Equivalent first-order optimality conditions over the Stiefel manifold

This post is a short summary of equivalent first-order optimality conditions, collected from several papers, for smooth optimization over the Stiefel manifold. Throughout this post, we consider the optimization problem \( \min f(X) \), s.t. \( X \in \mathbf{St}(n,r) \), where \( \mathbf{St}(n,r) = \{ X \in \mathbb{R}^{n \times r} : X^{\top}X = I_r \} \) and \( f \) is smooth. Let \( \nabla f(X) \) be the Euclidean gradient of \( f \) at \( X \). The following statements are equivalent:

1. \( X \) is a first-order stationary point of the problem.
2. \( \operatorname{grad} f(X) \triangleq \nabla f(X) - \frac{X}{2} (X^{\top}\nabla f(X) + \nabla f(X)^{\top}X) = 0 \) (by definition).
3. \( (I - XX^{\top}) \nabla f(X) = 0 \) and \( X^{\top} \nabla f(X) = \nabla f(X)^{\top}X \) (Gao et al. 2018).
4. \( \nabla f(X) - X \nabla f(X)^{\top}X = 0 \) (Wen and Yin 2013; Gao et al. 2018).
5. \( \nabla f(X) X^{\top} = X \nabla f(X)^{\top} \) (Wen and Yin 2013).
6. \( \nabla f(X) - X(2\rho \nabla f(X)^{\top} X + (1-2\rho) X^{\top} \nabla f(X) ) = 0 \), where \( \rho > 0 \) is a constant (Jiang and Dai 2015).
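
As a quick sanity check, the conditions above can be evaluated numerically. The sketch below is only an illustration under assumptions that are not part of the cited papers: it uses the linear objective \( f(X) = \operatorname{tr}(C^{\top}X) \), whose Euclidean gradient is the constant matrix \( C \), and for which \( X = -UV^{\top} \) (from the thin SVD \( C = U\Sigma V^{\top} \)) is a stationary point. The helper name `residuals`, the matrix `C`, and the choice `rho = 0.25` are hypothetical.

```python
import numpy as np

def residuals(X, G, rho=0.25):
    """Frobenius norms of the left-hand sides of the stationarity conditions.

    X is an n-by-r matrix with orthonormal columns, G is the Euclidean
    gradient at X, and rho is an illustrative positive constant.
    """
    n = X.shape[0]
    XtG = X.T @ G
    GtX = G.T @ X
    return {
        "riemannian_grad": np.linalg.norm(G - 0.5 * X @ (XtG + GtX)),
        "projection": np.linalg.norm((np.eye(n) - X @ X.T) @ G),
        "symmetry": np.linalg.norm(XtG - GtX),
        "wen_yin": np.linalg.norm(G - X @ GtX),
        "skew_form": np.linalg.norm(G @ X.T - X @ G.T),
        "jiang_dai": np.linalg.norm(
            G - X @ (2 * rho * GtX + (1 - 2 * rho) * XtG)
        ),
    }

rng = np.random.default_rng(0)
n, r = 6, 2

# Assumed test objective: f(X) = trace(C^T X), so the Euclidean gradient is C.
C = rng.standard_normal((n, r))

# Stationary point built from the thin SVD C = U diag(s) V^T.
U, s, Vt = np.linalg.svd(C, full_matrices=False)
X_star = -U @ Vt
res_star = residuals(X_star, C)

# A generic feasible point (orthonormal columns via reduced QR).
X_rand, _ = np.linalg.qr(rng.standard_normal((n, r)))
res_rand = residuals(X_rand, C)

for name in res_star:
    print(f"{name:16s} stationary: {res_star[name]:.2e}   generic: {res_rand[name]:.2e}")
```

At the stationary point every residual should vanish up to rounding error, while at a generic feasible point none of them is expected to.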

Note that each of the above conditions implies \( \nabla f(X)^{\top}X = X^{\top} \nabla f(X) \); that is, symmetry of \( X^{\top} \nabla f(X) \) is a necessary condition for stationarity. When \( n = r \) (the feasible region is the orthogonal group), we have \( XX^{\top} = I \), so this symmetry condition is also sufficient.
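
The reasoning behind this note can be sketched in two steps, writing \( G = \nabla f(X) \) as shorthand (a notation introduced here for brevity). First, left-multiplying \( G = X G^{\top} X \) by \( X^{\top} \) and using \( X^{\top}X = I_r \) gives the symmetry condition:

\[
G = X G^{\top} X \;\Longrightarrow\; X^{\top} G = (X^{\top}X)\, G^{\top} X = G^{\top} X .
\]

Second, if \( X^{\top} G = G^{\top} X \) holds, the Riemannian gradient reduces to

\[
\operatorname{grad} f(X) = G - \tfrac{X}{2}\,(X^{\top}G + G^{\top}X) = (I - XX^{\top})\, G ,
\]

which vanishes for every \( G \) when \( n = r \), because then \( XX^{\top} = I_n \). For \( r < n \) it vanishes only if the columns of \( G \) lie in the range of \( X \), which is why symmetry alone is not sufficient in general.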

References

Gao, Bin, Xin Liu, Xiaojun Chen, and Ya-xiang Yuan. 2018. “A New First-Order Algorithmic Framework for Optimization Problems with Orthogonality Constraints.” SIAM Journal on Optimization 28 (1): 302–32. https://doi.org/10.1137/16M1098759.

Jiang, Bo, and Yu-Hong Dai. 2015. “A Framework of Constraint Preserving Update Schemes for Optimization on Stiefel Manifold.” Mathematical Programming 153 (2): 535–75. https://doi.org/10.1007/s10107-014-0816-7.

Wen, Zaiwen, and Wotao Yin. 2013. “A Feasible Method for Optimization with Orthogonality Constraints.” Mathematical Programming 142 (1-2): 397–434. https://doi.org/10.1007/s10107-012-0584-1.