In this post, we will give some explicit formulas of the projection of a matrix \( Y \in \mathbb{R}^{n \times p} \) onto Stiefel manifold \( \St(p, n) \). Moreover, the Lipschitz continuity of the projection will be discussed.
Projection of a full column rank matrix
We firstly consider the case when \( Y \) is of full column rank. We will show the projection of \( Y \) is \( Y(Y^{\top}Y)^{-1/2} \).
Consider the optimization problem over the given Stiefel manifold \[ \min \| X - Y \|^2, \mbox{ s.t. } X^{\top}X = I_{p}. \] Let \( X_{*} \) be the projection of \( Y \). Then \( X_{*} \) is the global solution of this problem. So it satisfies First Optimality Condition over Stiefel Manifold: \( \nabla f(X_{*}) - X_{*} \nabla f(X_*)^{\top}X_{*} = 0 \). Note that \( \nabla f(X_{*}) = X_{*} - Y \). We have \[ 0 = \nabla f(X_{*}) - X_{*} \nabla f(X_{*})^{\top}X_{*} = X_{*} - Y - X_{*} (X_{*}^{\top} - Y^{\top})X_{*} = X_{*} - Y - X_{*} + X_{*}Y^{\top}X_{*}, \] which implies
\begin{equation}\label{eq-projection-rankp-relation-X-Y} Y = X_{*}Y^{\top}X_{*}. \end{equation}
Multiplying \( Y^{\top} \) on both sides, we have
\begin{equation}\label{eq-2} Y^{\top}Y = Y^{\top}X_{*} Y^{\top}X_{*} = (Y^{\top}X_{*})^2. \end{equation}
Multiplying \( X_{*}^{\top} \) on both sides, we see that \[ X_{*}^{\top}Y = X_{*}^{\top}X_{*}Y^{\top}X_{*} = Y^{\top}X_{*}. \] It follows that \( Y^{\top}X_{*} \) is symmetric. Thus, \( Y^{\top}X_{*} = (Y^{\top}Y)^{1/2} \) by \eqref{eq-2}. Now we plug it into \eqref{eq-projection-rankp-relation-X-Y} to obtain \[ Y = X_{*} (Y^{\top}X_{*}) = X_{*} (Y^{\top}Y)^{1/2}. \] Thus, \( X_{*} = Y(Y^{\top}Y)^{-1/2} \). The proof is complete.
Projection by SVD
Let \( Y = U\Lambda V^{\top} \) be the SVD decomposition of \( Y \), where \( U \in \mathbb{R}^{n \times p} \) and \(\Lambda, V \in \mathbb{R}^{p \times p}\). Then \( UV^{\top} \) is the projection of \( Y \) onto the Stiefel manifold. The proof is as follows. For any \( X \) in Stiefel manifold, \[ \langle Y, X \rangle= \langle U\Lambda V^{\top}, X \rangle = \langle \Lambda, U^{\top}XV \rangle. \] Note that \(\Lambda\) is diagonal with non-negative diagonal elememnts and \( U^{\top}XV \) is in Stiefel manifold. We have \[ \langle \Lambda, U^{\top}XV \rangle \leq \sum_{i=1}^n (\Lambda_{i,i}), \] where the equality holds when \( U^{\top}XV = I_p \). Thus, \[ \| Y - X \|^2 = \| Y \|^2 + \| X \|^2 - 2 \langle Y, X \rangle \geq \| Y \|^2 + p - 2 \sum_{i=1}^p \Lambda_{i,i}, \] where the equality holds when \( X = UV^{\top} \).
Projection by polar decomposition
Finally, we show that \( U \) is the projection of \( Y \) if the polar decomposition of \( Y \) is \( Y = UP \), where \( P \in \mathbb{R}^{p \times p} \) is positive semi-definite. It is sufficient to show for any other matrix \( V \) in the Stiefel manifold, we have \[ \| UP - U \|^2 \ge \| UP - V \|^2, \] which is equivalent to \[ \langle U - V, UP \rangle \geq 0. \] Actually, we can show that \[ \langle U - V, UP \rangle \geq \frac{1}{2} \sigma_{\min}(P) \| U - V \|^2. \] Let \( P = Q^{\top}\Lambda Q \) be the spectral decomposition of \( P \), where \(\Lambda = \operatorname{\lambda_1 , \ldots , \lambda_p}\). Then \[ \langle U - V, UP \rangle = \langle U - V, UQ^{\top} \Lambda Q \rangle = \langle UQ^{\top} - VQ^{\top}, UQ^{\top} \Lambda \rangle. \] Suppose \( UQ^{\top} = [u_1 , \ldots , u_p] \) and \( VQ^{\top} = [v_1 , \ldots , v_p] \). It follows that \[ \langle UQ^{\top} - VQ^{\top}, UQ^{\top}\Lambda \rangle = \sum_{i=1}^p \langle \lambda_i(u_i - v_i), u_i \rangle \geq \min_k \lambda_k \frac{1}{2}(\sum_{i=1}^p \| u_i - v_i \|^2 ), \] where the last inequality follows from \(\| u_i \| = \| v_i \| = 1, i=1 , \ldots , p\).
Moreover, from the inequality we can see that the projection and polar decomposition of \( Y \) are unique if \( Y \) is of full column rank since \(\sigma_{\min}(P) > 0\) in this case.
Also, it can be proved that the polar factor \( U \) is the projection of \( Y \) under any unitarily invariant norm. Check orthogonality - Projection onto the Stiefel manifold and the orthogonal Procr… for discussions about it.
Lipschitz continuity of projection
No global Lipschitz constant
The projection onto Stiefel manifold has no global Lipschitz constant. For example, consider two points \( x = \epsilon, y = -\epsilon \) where \( \epsilon > 0 \). Denote the projection map onto Stiefel manifold by \(\pi\). then \(\pi(x) = 1\) and \(\pi(y) = -1\). If there exists a constant \( L > 0 \) such that \[ \| \pi(x) - \pi(y) \| \leq L \| x - y \|, \] then \( L \geq 1 / \epsilon \). Since \(\epsilon\) can be arbitrarily close to \( 0 \), such constant \( L \) does not exist.
Local Lipschitz constant
Fortunately, Lipschitz continuity can be established by assuming the singular values of points have a positive lower bound. Let \( \| \cdot \| \) denote an unitary invariant norm. Especially, it can be Frobenius norm or spectral norm.
Square case
When \( n = p \), for any two nonsingualr matrices \( A, B \), we have \[ \| \pi(A) - \pi(B) \| \leq \frac{2}{\sigma_{\min}(A) + \sigma_{\min}(B)}\| A - B \|. \] See Theorem VII.5.1 in (Bhatia 1997) and Theorem 1 in (Li 1995) for more details.
Non-square case
When \( n > p \), there are multiple bounds; see Theorem 2 and Remark 4 in (Li 1995). Here we list some of them here.
- \( \frac{2}{\sigma_{\min}(A) + \sigma_{\min}(B)} + \frac{1}{\max\{ \sigma_{\min}(A), \sigma_{\min}(B) \}} \). In particular, this means there exists a uniform Lipschitz constant when \(\max\{ \sigma_{\min}(A), \sigma_{\min}(B) \}\) has a positive lower bound.
- \( \sqrt{(\frac{2}{\sigma_{\min}(A) + \sigma_{\min}(B)})^2 + (\frac{1}{\max\{ \sigma_{\min}(A), \sigma_{\min}(B) \}})^2} \) when \( \| \cdot \| \) denotes Frobenius norm.
- \( \frac{1}{\min\{ \sigma_{\min}(A), \sigma_{min}(B) \}} \).