3.4_奇异值分解

3.4 奇异值分解

奇异值分解 (Singular Value Decomposition, SVD) 分解是矩阵计算中非常有用的工具之一, 在图像处理、机器学习、数据科学等领域有着非常重要的应用.

3.4.1 奇异值,奇异向量和奇异值分解

ACm×nA \in \mathbb{C}^{m \times n} ( mnm \geq n ), 则 AACn×nA^{*}A \in \mathbb{C}^{n \times n}AACm×mAA^{*} \in \mathbb{C}^{m \times m} 都是Hermite半正定矩阵, 且它们具有相同的非零特征值 (都是正实数).

定理3.12 (SVD) [57] 设 ACm×nA \in \mathbb{C}^{m \times n} ( mnm \geq n ), 则存在酉矩阵 UCm×mU \in \mathbb{C}^{m \times m}VCn×nV \in \mathbb{C}^{n \times n} 使得

UAV=[Σ0]A=U[Σ0]V,(3.11)U ^ {*} A V = {\left[ \begin{array}{l} {\Sigma} \\ {0} \end{array} \right]} \quad {\text {或}} \quad A = U {\left[ \begin{array}{l} {\Sigma} \\ {0} \end{array} \right]} V ^ {*}, \qquad \qquad (3. 1 1)

其中 Σ=diag(σ1,σ2,,σn)Rn×n\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_n) \in \mathbb{R}^{n \times n} , 且 σ1σ2σn0\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n \geq 0 . 分解 (3.11) 称为 AA 的奇异值分解 (SVD), 而 σ1,σ2,,σn\sigma_1, \sigma_2, \dots, \sigma_n 则称为 AA 的奇异值. (板书)

证明. 首先假设 A0A \neq 0 , 否则只需令 Σ=0\Sigma = 0 即可, UUVV 可以是任意酉矩阵.

下面我们对 mmnn 用归纳法来证明

n=1,m1n = 1, m \geq 1 时, 我们取 Σ=A2,V=1,UCm×m\Sigma = \|A\|_2, V = 1, U \in \mathbb{C}^{m \times m} 是第一列为 u1=A/A2u_1 = A / \|A\|_2 的酉矩阵, 于是

A=U[Σ0]VA = U \left[ \begin{array}{c} \Sigma \\ 0 \end{array} \right] V ^ {*}

即为 AA 的奇异值分解

假设 C(m1)×(n1)\mathbb{C}^{(m - 1)\times (n - 1)} 中的矩阵都存在奇异值分解, 下面证明 ACm×nA\in \mathbb{C}^{m\times n} 也存在有奇异值分解.由2-范数的定义可知, 存在向量 vCnv\in \mathbb{C}^n 满足 v2=1\| v\| _2 = 1 使得 A2=Av2.\| A\| _2 = \| Av\| _2.

u=1σAvCm,其 中σ=A2,u = \frac {1}{\sigma} A v \in \mathbb {C} ^ {m}, \quad \text {其 中} \sigma = \| A \| _ {2},

u2=1\| u\| _2 = 1 .我们将 vvuu 都扩充成酉矩阵,即存在 U~Cm×(m1)\tilde{U}\in \mathbb{C}^{m\times (m - 1)}V~Cn×(n1)\tilde{V}\in \mathbb{C}^{n\times (n - 1)} ,使得 [u,U~]Cm×m[u,\tilde{U} ]\in \mathbb{C}^{m\times m}[v,V~]Cn×n[v,\tilde{V} ]\in \mathbb{C}^{n\times n} 都是酉矩阵.于是

U~Av=U~(σu)=0,uAv=u(σu)=σ.\tilde {U} ^ {*} A v = \tilde {U} ^ {*} (\sigma u) = 0, \quad u ^ {*} A v = u ^ {*} (\sigma u) = \sigma .

所以

A~[u,U~]A[v,V~]=[uAvuAV~U~AvU~AV~]=[σuAV~0U~AV~].\tilde {A} \triangleq \left[ u, \tilde {U} \right] ^ {*} A \left[ v, \tilde {V} \right] = \left[ \begin{array}{c c} u ^ {*} A v & u ^ {*} A \tilde {V} \\ \tilde {U} ^ {*} A v & \tilde {U} ^ {*} A \tilde {V} \end{array} \right] = \left[ \begin{array}{c c} \sigma & u ^ {*} A \tilde {V} \\ 0 & \tilde {U} ^ {*} A \tilde {V} \end{array} \right].

σ=A2=A~2=A~2A~e12=[σ,uAV~]2=σ2+uAV~22,\sigma = \| A \| _ {2} = \left\| \tilde {A} \right\| _ {2} = \left\| \tilde {A} ^ {*} \right\| _ {2} \geq \left\| \tilde {A} ^ {*} e _ {1} \right\| _ {2} = \left\| \left[ \sigma , u ^ {*} A \tilde {V} \right] ^ {*} \right\| _ {2} = \sqrt {\sigma^ {2} + \| u ^ {*} A \tilde {V} \| _ {2} ^ {2}},

所以 uAV~2=0\| u^{*}A\tilde{V}\|_{2} = 0 ,即 uAV~=0.u^{*}A\tilde{V} = 0. 于是

[u,U~]A[v,V~]=[σ00A1],\left[ u, \tilde {U} \right] ^ {*} A \left[ v, \tilde {V} \right] = \left[ \begin{array}{c c} \sigma & 0 \\ 0 & A _ {1} \end{array} \right],

其中 A1=U~AV~C(m1)×(n1)A_{1} = \tilde{U}^{*}A\tilde{V}\in \mathbb{C}^{(m - 1)\times (n - 1)} .由归纳假设可知, A1A_{1} 存在奇异值分解,设为

A1=U1[Σ10]V1,A _ {1} = U _ {1} \left[ \begin{array}{c} \Sigma_ {1} \\ 0 \end{array} \right] V _ {1} ^ {*},

其中 U1C(m1)×(m1)U_{1}\in \mathbb{C}^{(m - 1)\times (m - 1)}V1C(n1)×(n1)V_{1}\in \mathbb{C}^{(n - 1)\times (n - 1)} 都是酉矩阵, Σ1R(n1)×(n1)\Sigma_1\in \mathbb{R}^{(n - 1)\times (n - 1)} 是对角矩阵,且其对角线元素按降序排列.令

U=[u,U~][100U1],V=[v,V~][100V1],U = \left[ u, \tilde {U} \right] \left[ \begin{array}{c c} 1 & 0 \\ 0 & U _ {1} \end{array} \right], \quad V = \left[ v, \tilde {V} \right] \left[ \begin{array}{c c} 1 & 0 \\ 0 & V _ {1} \end{array} \right],

UCm×mU \in \mathbb{C}^{m \times m}VCn×nV \in \mathbb{C}^{n \times n} 都是酉矩阵, 且

UAV=[100U1][u,U~]A[v,V~][100V1]=[100U1][σ00A1][100V1]=[σ00U1A1V1]=[Σ0],(3.12)\begin{array}{l} U ^ {*} A V = \left[ \begin{array}{c c} 1 & 0 \\ 0 & U _ {1} \end{array} \right] ^ {*} \left[ \begin{array}{c} u, \tilde {U} \end{array} \right] ^ {*} A \left[ \begin{array}{c} v, \tilde {V} \end{array} \right] \left[ \begin{array}{c c} 1 & 0 \\ 0 & V _ {1} \end{array} \right] \\ = \left[ \begin{array}{c c} 1 & 0 \\ 0 & U _ {1} \end{array} \right] ^ {*} \left[ \begin{array}{c c} \sigma & 0 \\ 0 & A _ {1} \end{array} \right] \left[ \begin{array}{c c} 1 & 0 \\ 0 & V _ {1} \end{array} \right] \\ = \left[ \begin{array}{c c} \sigma & 0 \\ 0 & U _ {1} ^ {*} A _ {1} V _ {1} \end{array} \right] = \left[ \begin{array}{l} \Sigma \\ 0 \end{array} \right], \tag {3.12} \\ \end{array}

其中 Σ=[σ00Σ1]\Sigma = \begin{bmatrix} \sigma & 0 \\ 0 & \Sigma_1 \end{bmatrix} . 又

σ2=A22=UAV22=[Σ0]22=ρ([σ200Σ12]),\sigma^ {2} = \| A \| _ {2} ^ {2} = \| U ^ {*} A V \| _ {2} ^ {2} = \left\| \left[ \begin{array}{c} \Sigma \\ 0 \end{array} \right] \right\| _ {2} ^ {2} = \rho \left(\left[ \begin{array}{c c} \sigma^ {2} & 0 \\ 0 & \Sigma_ {1} ^ {2} \end{array} \right]\right),

所以 σ\sigma 不小于 Σ1\Sigma_{1} 中的所有对角线元素, 即 Σ\Sigma 的对角线元素也是按降序排列. 因此, (??) 这就是 AA 的奇异值分解.

由归纳法可知, 定理的结论成立.

该定理也可以通过Hermite半正定矩阵的特征值分解来证明
如果 ARm×nA\in \mathbb{R}^{m\times n} 是实矩阵,则 U,VU,V 也都可以是实矩阵[73].

由(3.11)可知,

AA=VΣΣV,AA=U[ΣΣ000]U.A ^ {*} A = V \Sigma^ {*} \Sigma V ^ {*}, \quad A A ^ {*} = U \left[ \begin{array}{c c} \Sigma \Sigma^ {*} & 0 \\ 0 & 0 \end{array} \right] U ^ {*}.

所以 σ12,σ22,,σn2\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2AAA^* AAAAA^* 的特征值. 因此, AA 的奇异值就是 AAA^* A 的特征值的平方根.

奇异向量

由SVD(3.11)可知

Avi=σiui,i=1,2,,n,A v _ {i} = \sigma_ {i} u _ {i}, \quad i = 1, 2, \dots , n,
Aui=σivi,i=1,2,,n,A ^ {*} u _ {i} = \sigma_ {i} v _ {i}, \quad i = 1, 2, \dots , n,
Aui=0,i=n+1,n+2,,m.A ^ {*} u _ {i} = 0, \quad i = n + 1, n + 2, \ldots , m.

我们称 U=[u1,u2,,um]U = [u_{1}, u_{2}, \ldots, u_{m}]V=[v1,v2,,vn]V = [v_{1}, v_{2}, \ldots, v_{n}] 的列向量分别称为 AA 的左奇异向量和右奇

异向量.

细SVD(降阶SVD)

由定理3.12可知

A=U[Σ0]V=σ1u1v1+σ2u2v2++σnunvn=i=1nσiuivi.A = U \left[ \begin{array}{c} \Sigma \\ 0 \end{array} \right] V ^ {*} = \sigma_ {1} u _ {1} v _ {1} ^ {*} + \sigma_ {2} u _ {2} v _ {2} ^ {*} + \dots + \sigma_ {n} u _ {n} v _ {n} ^ {*} = \sum_ {i = 1} ^ {n} \sigma_ {i} u _ {i} v _ {i} ^ {*}.

Un=[u1,u2,,un]Cm×nU_{n} = [u_{1},u_{2},\ldots ,u_{n}]\in \mathbb{C}^{m\times n} ,则 UnU_{n} 是单位列正交矩阵(即 UnUn=In×n)U_{n}^{*}U_{n} = I_{n\times n}) ,且

A=UnΣV.(3.13)A = U _ {n} \Sigma V ^ {*}. \tag {3.13}

这就是所谓的细SVD(或瘦SVD,thinSVD,skinnySVD)[57]或降阶SVD(reducedSVD)[125],有的文献将(3.12)称为奇异值分解.

压缩SVD

rrank(A)nr \triangleq \operatorname{rank}(A) \leq n , 则有

σ1σ2σr>0,σr+1==σn=0.\sigma_ {1} \geq \sigma_ {2} \geq \dots \geq \sigma_ {r} > 0, \quad \sigma_ {r + 1} = \dots = \sigma_ {n} = 0.

我们称

Ar=i=1rσiuiviA _ {r} = \sum_ {i = 1} ^ {r} \sigma_ {i} u _ {i} v _ {i} ^ {*}

AA 的压缩SVD (condensedSVD).

截断SVD

k<nk < n ,我们称

Ak=σ1u1v1+σ2u2v2++σkukvk=i=1kσiuiviA _ {k} = \sigma_ {1} u _ {1} v _ {1} ^ {*} + \sigma_ {2} u _ {2} v _ {2} ^ {*} + \dots + \sigma_ {k} u _ {k} v _ {k} ^ {*} = \sum_ {i = 1} ^ {k} \sigma_ {i} u _ {i} v _ {i} ^ {*}

AA 的截断SVD (truncated SVD).

由于压缩SVD和截断SVD在计算和存储方面存在优势,因此实际应用中经常使用的是压缩SVD或截断SVD.

奇异值的应用

SVD 可用来计算矩阵的低秩逼近, 矩阵行空间和列空间的一组正交基, 以及矩阵的秩.

  • 矩阵计算:矩阵范数,矩阵条件数,最小二乘问题,广义逆,矩阵和张量的低秩分解,等等.

  • 工程应用: 信号处理, 图像压缩, 机器学习, 主成分分析, 数据降维, 等等.

Six Great Theorems of Linear Algebra

  • Dimension Theorem: All bases for a vector space have the same number of vectors.

  • Counting Theorem: Dimension of column space + dimension of nullspace = number of columns.

  • Rank Theorem: Dimension of column space == dimension of rowspace. This is the rank.

  • Fundamental Theorem: The row space and nullspace of AA are orthogonal complements in Rn\mathbb{R}^n .

  • SVD: There are orthonormal bases ( vv 's and uu 's for the row and column spaces) so that Avi=σiuiAv_{i} = \sigma_{i}u_{i} .

  • Spectral Theorem: If AT=AA^{\mathsf{T}} = A there are orthonormal qq 's so that Aqi=λiqiAq_{i} = \lambda_{i}q_{i} and A=QΛQTA = Q\Lambda Q^{\mathsf{T}} .

— Introduction to Linear Algebra, 5th, G. Strang, 2016.

3.4.2 奇异值基本性质

下面是关于奇异值的一些基本性质:

定理3.13 设 A=U[Σ0]VA = U\left[ \begin{array}{c}\Sigma \\ 0 \end{array} \right]V^{*}ACm×nA\in \mathbb{C}^{m\times n} ( mnm\geq n ) 的奇异值分解, 则下面结论成立:

(1) AAA^{*}A 的特征值是 σi2\sigma_{i}^{2} , 对应的特征向量是 vi,i=1,2,,nv_{i}, i = 1,2,\ldots,n ;
(2) AAAA^{*} 的特征值是 σi2\sigma_{i}^{2}mnm - n 个零, 对应的特征向量是 ui,i=1,2,,mu_{i}, i = 1,2,\ldots,m ;
(3) A2=σ1\| A\| _2 = \sigma_1 AF=σ12+σ22++σn2;\| A\| _F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2};
(4) 若 rank(A)=rn\operatorname{rank}(A) = r \leq n , 则

Ran(A)=span{u1,u2,,ur},Ker(A)=span{vr+1,vr+2,,vn};\operatorname {R a n} (A) = \operatorname {s p a n} \left\{u _ {1}, u _ {2}, \dots , u _ {r} \right\}, \quad \operatorname {K e r} (A) = \operatorname {s p a n} \left\{v _ {r + 1}, v _ {r + 2}, \dots , v _ {n} \right\};

(5) 设 xCnx \in \mathbb{C}^nx2=1\| x \|_2 = 1 , 则 σnAx2σ1\sigma_n \leq \| Ax \|_2 \leq \sigma_1 ;
(事实上有 σ1=maxx2=1Ax2,σn=minx2=1Ax2)\sigma_{1} = \max_{\| x\|_{2} = 1}\| Ax\|_{2},\sigma_{n} = \min_{\| x\|_{2} = 1}\| Ax\|_{2})
(6) (酉不变性) 设 XCm×mX \in \mathbb{C}^{m \times m}YCn×nY \in \mathbb{C}^{n \times n} 是酉矩阵, 则 σi(XAY)=σi(A)\sigma_i(X^*AY) = \sigma_i(A) .

(留作练习)

性质 (4) 可以用来计算矩阵的像空间和零空间.

定理3.14 设 A=UΣVA = U\Sigma V^{*}ACn×nA\in \mathbb{C}^{n\times n} 的奇异值分解, 则下面结论成立:

(1) det(A)=σ1σ2σn;|\operatorname{det}(A)| = \sigma_1\sigma_2\cdots \sigma_n;
(2) 若 AA 非奇异, 则 A12=σn1,κ2(A)=σ1/σn\| A^{-1}\|_2 = \sigma_n^{-1}, \kappa_2(A) = \sigma_1 / \sigma_n ;
(3) 若 AA 是Hermite的, 且 A=UΛUA = U\Lambda U^{*}AA 的酉特征值分解, 即 UU=I,Λ=diag(λ1,λ2,,λn)U^{*}U = I, \Lambda = \mathrm{diag}(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{n}) . 设 λ1λ2λn|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n| , 则 A=UΣVA = U\Sigma VAA 的奇异值分解, 其中 σi=λi\sigma_{i} = |\lambda_{i}| , vi=sign(λi)uiv_{i} = \mathrm{sign}(\lambda_{i})u_{i} , 若 λi=0\lambda_{i} = 0 , 则取 vi=uiv_{i} = u_{i} ;
(4) 矩阵 H=[0AA0]H = \begin{bmatrix} 0 & A^{*}\\ A & 0 \end{bmatrix} 的特征值是 ±σi\pm \sigma_{i} , 对应的单位特征向量为 12[vi±ui]\frac{1}{\sqrt{2}}\left[ \begin{array}{l}v_{i}\\ \pm u_{i} \end{array} \right] . ( ACm×nA\in \mathbb{C}^{m\times n} 时也有类似结论)

(留作练习)

例3.5 由奇异值的性质可知, 矩阵的谱条件数取决于最大奇异值和最小奇异值的比值, 如果最大奇异值与最小奇异值比较接近, 则谱条件数就比较小. 那么, 谱条件数与特征值有没有直接关系? 我们知道, 如果矩阵 AA 对称正定, 则其谱条件数就是最大特征值与最小特征值的比值. 但是,对于一般的矩阵, 则没有这个性质. 如下面的矩阵:

A=[112121121]Rn×n,A = \left[ \begin{array}{c c c c} 1 & - \frac {1}{2} & \dots & - \frac {1}{2} \\ & 1 & \ddots & \vdots \\ & & \ddots & - \frac {1}{2} \\ & & & 1 \end{array} \right] \in \mathbb {R} ^ {n \times n},

即对角线全是1, 严格上三角部分全是 12-\frac{1}{2} , 其它为0. 易知 AA 的所有特征值都是1, 但其谱条件数却随着矩阵规模的增长而快速变大, 见下表:

另外,通过直接计算可知

A1=[1121.521.5221.5n221121.5211.522121.521121].A ^ {- 1} = \left[ \begin{array}{c c c c c c} 1 & \frac {1}{2} & \frac {1 . 5}{2} & \frac {1 . 5 ^ {2}}{2} & \dots & \frac {1 . 5 ^ {n - 2}}{2} \\ & 1 & \frac {1}{2} & \frac {1 . 5}{2} & \ddots & \vdots \\ & & 1 & \ddots & \ddots & \frac {1 . 5 ^ {2}}{2} \\ & & & \ddots & \frac {1}{2} & \frac {1 . 5}{2} \\ & & & & 1 & \frac {1}{2} \\ & & & & & 1 \end{array} \right].

因此, κ1(A)\kappa_{1}(A)κ(A)\kappa_{\infty}(A) 都是矩阵维数 nn 的幂函数

3.4.3 奇异值更多性质

下面是关于矩阵 AA 的一个低秩逼近

定理3.15 设 A=UnΣVA = U_{n}\Sigma V^{*}ACm×nA \in \mathbb{C}^{m \times n} 的细奇异值分解, 令 Ak=i=1kσiuiviA_{k} = \sum_{i=1}^{k} \sigma_{i}u_{i}v_{i}^{*} , 则 AkA_{k}

minBCm×n,rank(B)=kAB2(3.14)\min _ {B \in \mathbb {C} ^ {m \times n}, \operatorname {r a n k} (B) = k} \| A - B \| _ {2} \tag {3.14}

的一个解,且

AAk2=σk+1.\left\| A - A _ {k} \right\| _ {2} = \sigma_ {k + 1}.

此时,我们称 AkA_{k}AA 的一个秩 kk 逼近

(板书)

证明. 设 BCm×nB \in \mathbb{C}^{m \times n}rank(B)=k\operatorname{rank}(B) = k , 则

dim(Ker(B))+dim(span{Vk+1})=(nk)+(k+1)=n+1>n,\dim (\operatorname {K e r} (B)) + \dim (\operatorname {s p a n} \{V _ {k + 1} \}) = (n - k) + (k + 1) = n + 1 > n,

其中 Vk+1=[v1,v2,,vk+1]Cn×(k+1)V_{k + 1} = [v_1,v_2,\ldots ,v_{k + 1}]\in \mathbb{C}^{n\times (k + 1)} .所以 Ker(B)\operatorname {Ker}(B)span{Vk+1}\operatorname {span}\{V_{k + 1}\} 有非零公共元素.令 00\neq xKer(B)span{Vk+1}x\in \mathrm{Ker}(B)\cap \mathrm{span}\{V_{k + 1}\} ,不失一般性,我们假设 x2=1\| x\| _2 = 1 .故存在 yCk+1y\in \mathbb{C}^{k + 1} 满足 y2=1\| y\| _2 = 1 使得

x=Vk+1y.x = V_{k + 1}y. 于是

AB22(AB)x22=Ax22=UnΣVVk+1y22=Σ[Ik+10]y22=Σ[y0]22=i=1k+1σi2yi2σk+12,\begin{array}{l} \| A - B \| _ {2} ^ {2} \geq \| (A - B) x \| _ {2} ^ {2} = \| A x \| _ {2} ^ {2} = \| U _ {n} \Sigma V ^ {*} V _ {k + 1} y \| _ {2} ^ {2} \\ = \left\| \Sigma \left[ \begin{array}{c} I _ {k + 1} \\ 0 \end{array} \right] y \right\| _ {2} ^ {2} = \left\| \Sigma \left[ \begin{array}{c} y \\ 0 \end{array} \right] \right\| _ {2} ^ {2} = \sum_ {i = 1} ^ {k + 1} \sigma_ {i} ^ {2} | y _ {i} | ^ {2} \geq \sigma_ {k + 1} ^ {2}, \\ \end{array}

其中 Σ1=diag(σ1,σ2,,σk+1)\Sigma_{1} = \mathrm{diag}(\sigma_{1},\sigma_{2},\ldots ,\sigma_{k + 1}) .这里我们利用了性质 i=1k+1yi2=y22=1.\sum_{i = 1}^{k + 1}|y_i|^2 = \| y\| _2^2 = 1. 所以

minBCm×n,rank(B)=kAB2σk+1.\min_{\substack{B\in \mathbb{C}^{m\times n},\operatorname {rank}(B) = k}}\| A - B\|_{2}\geq \sigma_{k + 1}.

rank(Ak)=k\operatorname{rank}(A_k) = k , 且

AAk2=i=k+1nσiuivi2=Un[00σk+1σn]V2=σk+1,\| A - A _ {k} \| _ {2} = \left\| \sum_ {i = k + 1} ^ {n} \sigma_ {i} u _ {i} v _ {i} ^ {*} \right\| _ {2} = \left\| U _ {n} \left[ \begin{array}{c c c c c c} 0 & & & & & \\ & \ddots & & & & \\ & & 0 & & & \\ & & & \sigma_ {k + 1} & & \\ & & & & \ddots & \\ & & & & & \sigma_ {n} \end{array} \right] V ^ {*} \right\| _ {2} = \sigma_ {k + 1},

所以

minBCm×n,rank(B)=kAB2=σk+1,\min _ {B \in \mathbb {C} ^ {m \times n}, \operatorname {r a n k} (B) = k} \| A - B \| _ {2} = \sigma_ {k + 1},

AkA_{k} 是问题(3.13)的一个解

定理中的(3.13)式也可以改写为

minBCm×n,rank(B)kAB2(3.15)\min _ {B \in \mathbb {C} ^ {m \times n}, \operatorname {r a n k} (B) \leq k} \| A - B \| _ {2} \tag {3.15}

对于 Frobenius 范数, 我们有类似的结论, 见习题 3.13.

定理3.16 (Weyl) [119] 设 A,BCm×nA, B \in \mathbb{C}^{m \times n} ( mnm \geq n ), 且 rank(B)=k\operatorname{rank}(B) = k , 则有

maxxKer(B),x2=1Ax2σk+1(A),(3.16)\max _ {x \in \operatorname {K e r} (B), \| x \| _ {2} = 1} \| A x \| _ {2} \geq \sigma_ {k + 1} (A), \tag {3.16}

minxKer(B),x2=1Ax2σnk(A).(3.17)\min _ {x \in \operatorname {K e r} (B), \| x \| _ {2} = 1} \| A x \| _ {2} \leq \sigma_ {n - k} (A). \tag {3.17}

因此,

σ1(AB)σk+1(A),σn(AB)σnk(A).(3.18)\sigma_ {1} (A - B) \geq \sigma_ {k + 1} (A), \quad \sigma_ {n} (A - B) \leq \sigma_ {n - k} (A). \tag {3.18}

(板书)

证明. 不等式 (3.15) 的证明与定理 3.15 的证明类似, 不再赘述.

下面证明结论 (3.16). 令 V~k+1=[vnk,vnk+1,,vn]Cn×(k+1)\tilde{V}_{k+1} = [v_{n-k}, v_{n-k+1}, \ldots, v_n] \in \mathbb{C}^{n \times (k+1)} . 类似地, 存在向量 yCk+1y \in \mathbb{C}^{k+1} 满足 y2=1\|y\|_2 = 1 , 使得 x=V~k+1yKer(B)x = \tilde{V}_{k+1}y \in \operatorname{Ker}(B) . 于是

Ax22=UnΣVV~k+1y22=[0Ik+1]y22=Σ[0y]22=i=1k+1σnk1+i2(A)yi2σnk2(A).\| A x \| _ {2} ^ {2} = \| U _ {n} \Sigma V ^ {*} \tilde {V} _ {k + 1} y \| _ {2} ^ {2} = \left\| \left[ \begin{array}{c} 0 \\ I _ {k + 1} \end{array} \right] y \right\| _ {2} ^ {2} = \left\| \Sigma \left[ \begin{array}{c} 0 \\ y \end{array} \right] \right\| _ {2} ^ {2} = \sum_ {i = 1} ^ {k + 1} \sigma_ {n - k - 1 + i} ^ {2} (A) | y _ {i} | ^ {2} \leq \sigma_ {n - k} ^ {2} (A).

所以

minxKer(B),x2=1Ax2σnk(A).\min _ {x \in \operatorname {K e r} (B), \| x \| _ {2} = 1} \| A x \| _ {2} \leq \sigma_ {n - k} (A).

根据定理3.13中的性质5,以及(3.15)和(3.16)可得

σ1(AB)=maxx2=1(AB)x2maxxKer(B),x2=1(AB)x2=maxxKer(B),x2=1Ax2σk+1(A),\begin{array}{l} \sigma_ {1} (A - B) = \max _ {\| x \| _ {2} = 1} \| (A - B) x \| _ {2} \\ \geq \max _ {x \in \operatorname {K e r} (B), \| x \| _ {2} = 1} \| (A - B) x \| _ {2} = \max _ {x \in \operatorname {K e r} (B), \| x \| _ {2} = 1} \| A x \| _ {2} \geq \sigma_ {k + 1} (A), \\ \end{array}
σn(AB)minx2=1(AB)x2minxKer(B),x2=1(AB)x2=minxKer(B),x2=1Ax2σnk(A).\begin{array}{l} \sigma_ {n} (A - B) \leq \min _ {\| x \| _ {2} = 1} \| (A - B) x \| _ {2} \\ \leq \min _ {x \in \operatorname {K e r} (B), \| x \| _ {2} = 1} \| (A - B) x \| _ {2} = \min _ {x \in \operatorname {K e r} (B), \| x \| _ {2} = 1} \| A x \| _ {2} \leq \sigma_ {n - k} (A). \\ \end{array}

因此不等式 (3.17) 成立

不等式(3.15)和(3.16)也可以写为:设 L\mathcal{L}Cn\mathbb{C}^n 中的任意一个 nkn - k 维子空间,则有

maxxL,x2=1Ax2σk+1(A),minxL,x2=1Ax2σnk(A).\max _ {x \in \mathcal {L}, \| x \| _ {2} = 1} \| A x \| _ {2} \geq \sigma_ {k + 1} (A), \quad \min _ {x \in \mathcal {L}, \| x \| _ {2} = 1} \| A x \| _ {2} \leq \sigma_ {n - k} (A).

特别地, 当 k=0k = 0L=Rn\mathcal{L} = \mathbb{R}^n , 因此

maxx2=1Ax2σ1(A),minx2=1Ax2σn(A).\max _ {\| x \| _ {2} = 1} \| A x \| _ {2} \geq \sigma_ {1} (A), \quad \min _ {\| x \| _ {2} = 1} \| A x \| _ {2} \leq \sigma_ {n} (A).

事实上, 我们有 (第一个等式就是 2-范数的定义, 第二个等式留作自行练习)

maxx2=1Ax2=σ1(A),minx2=1Ax2=σn(A).\max _ {\| x \| _ {2} = 1} \| A x \| _ {2} = \sigma_ {1} (A), \quad \min _ {\| x \| _ {2} = 1} \| A x \| _ {2} = \sigma_ {n} (A).

从该结论可知, 如果通过对 AA 进行秩 k(1kn2)k (1 \leq k \leq \frac{n}{2}) 修正来改善其谱条件数, 则最佳情况是降低到 σnk(A)σk+1\frac{\sigma_{n-k}(A)}{\sigma_{k+1}} .

定理3.17 [119] 设 A,BCm×nA, B \in \mathbb{C}^{m \times n} ( mnm \geq n ), 则有

σi+j1(A+B)σi(A)+σj(B),i=1,2,,n,j=1,2,,ni+1.(3.19)\sigma_ {i + j - 1} (A + B) \leq \sigma_ {i} (A) + \sigma_ {j} (B), \quad i = 1, 2, \dots , n, j = 1, 2, \dots , n - i + 1. \tag {3.19}

(板书)

证明. 首先证明 i=j=1i = j = 1 时结论成立. 事实上, 我们有

σ1(A+B)=A+B2A2+B2=σ1(A)+σ1(B).\sigma_ {1} (A + B) = \| A + B \| _ {2} \leq \| A \| _ {2} + \| B \| _ {2} = \sigma_ {1} (A) + \sigma_ {1} (B).

Ai1A_{i-1}Bj1B_{j-1} 分别是 AABB 的秩 i1i-1 和秩 j1j-1 逼近, 则由定理3.15可知

AAi12=σi(A),BBj12=σj(B).\| A - A _ {i - 1} \| _ {2} = \sigma_ {i} (A), \quad \| B - B _ {j - 1} \| _ {2} = \sigma_ {j} (B).

所以

σi(A)+σj(B)=AAi12+BBj12A+B(Ai1+Bj1)2.\sigma_ {i} (A) + \sigma_ {j} (B) = \| A - A _ {i - 1} \| _ {2} + \| B - B _ {j - 1} \| _ {2} \geq \| A + B - (A _ {i - 1} + B _ {j - 1}) \| _ {2}.

rank(Ai1+Bj1)i+j2\operatorname{rank}(A_{i-1} + B_{j-1}) \leq i + j - 2 , 所以由不等式 (3.17) 可知

σi(A)+σj(B)A+B(Ai1+Bj1)2σi+j1(A+B).\sigma_ {i} (A) + \sigma_ {j} (B) \geq \| A + B - (A _ {i - 1} + B _ {j - 1}) \| _ {2} \geq \sigma_ {i + j - 1} (A + B).

A=XY,B=YA = X - Y, B = Y ,则该结论也可以描述为

σi(XY)σi+j1(X)σj(Y),i=1,2,,n,j=1,2,,ni+1.\sigma_ {i} (X - Y) \geq \sigma_ {i + j - 1} (X) - \sigma_ {j} (Y), \quad i = 1, 2, \ldots , n, j = 1, 2, \ldots , n - i + 1.

根据定理3.16, 我们可以得到矩阵奇异值的最小最大定理

定理3.18 设 σ1σ2σn0\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n \geq 0 是矩阵 ACm×nA \in \mathbb{C}^{m \times n} ( mnm \geq n ) 的奇异值, 则

σk(A)=mindim(L)=nk+1maxxL,x2=1Ax2,k=1,2,,n\sigma_ {k} (A) = \min _ {\dim (\mathcal {L}) = n - k + 1} \max _ {x \in \mathcal {L}, \| x \| _ {2} = 1} \| A x \| _ {2}, \quad k = 1, 2, \dots , n

σk(A)=maxdim(L)=kminxL,x2=1Ax2,k=1,2,,n,\sigma_{k}(A) = \max_{\dim (\mathcal{L}) = k}\min_{x\in \mathcal{L}, \| x\|_{2} = 1}\| Ax\|_{2},\quad k = 1,2,\ldots ,n,

其中 L\mathcal{L} 表示 Cn\mathbb{C}^n 的一个子空间

(留作练习)

引理3.19(交错不等式)[74]设 ACm×nA\in \mathbb{C}^{m\times n}ArA_{r} 是由 AA 去除 rr 列 (或 rr 行)后得到的子矩阵,则

σi(A)σi(Ar)σi+r(A),i=1,2,,min{m,n}.\sigma_ {i} (A) \geq \sigma_ {i} \left(A _ {r}\right) \geq \sigma_ {i + r} (A), \quad i = 1, 2, \dots , \min \{m, n \}.

这里, 当下标 kk 大于矩阵的维数时, 我们令 σk=0\sigma_{k} = 0 . 进一步, 如果 BC(mr)×(ns)B \in \mathbb{C}^{(m - r) \times (n - s)}AA 的子矩阵, 则

σi(A)σi(B)σi+r+s(A).\sigma_ {i} (A) \geq \sigma_ {i} (B) \geq \sigma_ {i + r + s} (A).

(留作课外自习)

证明. 只需证明 r=1r = 1 的情形, 其余情形可以通过递推实现

假设 A1A_{1} 是由 AA 中去除第 kk 列后的子矩阵. 令 ekCne_k \in \mathbb{C}^n 为单位矩阵的第 kk 列, 即第 kk 个元素是 1, 其它均为 0. 由定理 3.18 可知

σk(A)=mindim(L)=nk+1maxxL,x2=1Ax2mindim(L)=nk+1maxxL,x2=1,xekAx2.\sigma_ {k} (A) = \min _ {\dim (\mathcal {L}) = n - k + 1} \max _ {x \in \mathcal {L}, \| x \| _ {2} = 1} \| A x \| _ {2} \geq \min _ {\dim (\mathcal {L}) = n - k + 1} \max _ {x \in \mathcal {L}, \| x \| _ {2} = 1, x \perp e _ {k}} \| A x \| _ {2}.

如果 xekx \perp e_k , 则 xk=0x_k = 0 , 所以

Ax=A1x~,A x = A _ {1} \tilde {x},

其中 x~=[x1,,xk1,xk+1,,xn]Cn1\tilde{x} = [x_1,\ldots ,x_{k - 1},x_{k + 1},\ldots ,x_n]^{\top}\in \mathbb{C}^{n - 1} 故条件“ xLx\in \mathcal{L}x2=1\| x\| _2 = 1xekx\bot e_k ”就等价于 xL~,x~2=1"x\in \tilde{\mathcal{L}},\|\tilde{x}\|_2 = 1" ,其中 L~Cn1\tilde{\mathcal{L}}\subset \mathbb{C}^{n - 1} 是由 L\mathcal{L} 中的向量去除第 kk 个分量后组成的集合.因此 dim(L~)=dim(L)=nk+1\dim (\tilde{\mathcal{L}}) = \dim (\mathcal{L}) = n - k + 1dim(L~)=dim(L)1=nk.\dim (\tilde{\mathcal{L}}) = \dim (\mathcal{L}) - 1 = n - k. 于是

mindim(L)=nk+1maxxL,x2=1,xekAx2mindim(L~)=nk+1dim(L~)=nkmaxx~L~,x~2=1A1x~2.\min _ {\dim (\mathcal {L}) = n - k + 1} \max _ {x \in \mathcal {L}, \| x \| _ {2} = 1, x \perp e _ {k}} \| A x \| _ {2} \geq \min _ {\dim (\tilde {\mathcal {L}}) = n - k + 1 \text {或} \dim (\tilde {\mathcal {L}}) = n - k} \max _ {\tilde {x} \in \tilde {\mathcal {L}}, \| \tilde {x} \| _ {2} = 1} \| A _ {1} \tilde {x} \| _ {2}.

mindim(L~)=nk+1maxx~L~,x~2=1A1x~2mindim(L~)=nkmaxx~L~,x~2=1A1x~2,\min _ {\dim (\tilde {\mathcal {L}}) = n - k + 1} \max _ {\tilde {x} \in \tilde {\mathcal {L}}, \| \tilde {x} \| _ {2} = 1} \| A _ {1} \tilde {x} \| _ {2} \geq \min _ {\dim (\tilde {\mathcal {L}}) = n - k} \max _ {\tilde {x} \in \tilde {\mathcal {L}}, \| \tilde {x} \| _ {2} = 1} \| A _ {1} \tilde {x} \| _ {2},

A1Cm×(n1)A_{1}\in \mathbb{C}^{m\times (n - 1)} ,所以

σk(A)mindim(L)=nk+1maxxL,x2=1,xekAx2mindim(L~)=(n1)k+1maxx~L~,x~2=1A1x~2=σk(A1).\begin{array}{l} \sigma_ {k} (A) \geq \min _ {\dim (\mathcal {L}) = n - k + 1} \max _ {x \in \mathcal {L}, \| x \| _ {2} = 1, x \perp e _ {k}} \| A x \| _ {2} \\ \geq \min_{\dim (\tilde{\mathcal{L}}) = (n - 1) - k + 1}\max_{\tilde{x}\in \tilde{\mathcal{L}},\| \tilde{x}\|_{2} = 1}\| A_{1}\tilde{x}\|_{2} \\ = \sigma_ {k} (A _ {1}). \\ \end{array}

同理可得

σk+1(A)=maxdim(L)=k+1minxL,x2=1Ax2maxdim(L)=k+1minxL,x2=1,xekAx2maxdim(L~)=kminx~L~,x~2=1A1x~2=σk(A1).\begin{array}{l} \sigma_ {k + 1} (A) = \max _ {\dim (\mathcal {L}) = k + 1} \min _ {x \in \mathcal {L}, \| x \| _ {2} = 1} \| A x \| _ {2} \\ \leq \max _ {\dim (\mathcal {L}) = k + 1} \min _ {x \in \mathcal {L}, \| x \| _ {2} = 1, x \perp e _ {k}} \| A x \| _ {2} \\ \leq \max _ {\dim (\tilde {\mathcal {L}}) = k} \min _ {\tilde {x} \in \tilde {\mathcal {L}}, \| \tilde {x} \| _ {2} = 1} \| A _ {1} \tilde {x} \| _ {2} \\ = \sigma_ {k} (A _ {1}). \\ \end{array}

我们注意到, 在前面的证明中, 并没有要求 mnm \geq n . 因此对于 m<nm < n 的情形, 上面的结论仍然成立.

如果 A1A_{1} 是由 AA 中去除第 kk 行后的子矩阵. 由于 AA^{*}AA 具有相同的奇异值, 因此只需将上面的讨论运用到 AA^{*} 上即可. \square

引理3.20 [74] 设 ACn×n,1kn,A \in \mathbb{C}^{n \times n}, 1 \leq k \leq n, 则对任意的单位列正交矩阵 UkCn×kU_k \in \mathbb{C}^{n \times k}VkCn×kV_k \in \mathbb{C}^{n \times k} , 有

σi(UkAVk)σi(A),i=1,2,,k.(3.20)\sigma_ {i} \left(U _ {k} ^ {*} A V _ {k}\right) \leq \sigma_ {i} (A), \quad i = 1, 2, \dots , k. \tag {3.20}

因此,

det(UkAVk)σ1(A)σ2(A)σk(A).(3.21)\left| \det \left(U _ {k} ^ {*} A V _ {k}\right) \right| \leq \sigma_ {1} (A) \sigma_ {2} (A) \dots \sigma_ {k} (A). \tag {3.21}

(板书)

证明. 我们将 UkU_{k}VkV_{k} 都扩充成酉矩阵 UCm×mU \in \mathbb{C}^{m \times m}VCn×nV \in \mathbb{C}^{n \times n} , 则 UkAVkU_{k}^{*}AV_{k}UAVU^{*}AV 的子矩阵. 由引理3.19可知,

σi(UkAVk)σi(UAV).\sigma_ {i} \left(U _ {k} ^ {*} A V _ {k}\right) \leq \sigma_ {i} \left(U ^ {*} A V\right).

UAVU^{*}AVAA 有相同的奇异值, 故结论 (3.19) 成立

利用定理3.14中的性质1,即可得到不等式(3.20).

定理3.21 (Weyl不等式, 1949) [74] 设 ACn×nA \in \mathbb{C}^{n \times n} , 其奇异值和特征值分别为 σ1σ2σn0\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n \geq 0λ1λ2λn|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n| , 则

λ1λ2λkσ1σ2σk,k=1,2,,n\left| \lambda_ {1} \lambda_ {2} \dots \lambda_ {k} \right| \leq \sigma_ {1} \sigma_ {2} \dots \sigma_ {k}, \quad k = 1, 2, \dots , n

且当 k=nk = n 时,等号成立.

(板书)

证明. 设 A=URUA = U R U^{*}AA 的 Schur 分解, 其中 UCn×nU \in \mathbb{C}^{n \times n} 是酉矩阵, RCn×nR \in \mathbb{C}^{n \times n} 是上三角矩阵, 且其对角线元素依次为 λ1,λ2,,λn\lambda_{1}, \lambda_{2}, \ldots, \lambda_{n} . 取 UU 的前 kk 列组成一个单位列正交矩阵 UkU_{k} , 则 RkUkAUkR_{k} \triangleq U_{k}^{*} A U_{k}UAU=RU^{*} A U = Rkk 阶顺序主子矩阵, 即 RkR_{k} 也是上三角矩阵, 且其对角线元素依次为 λ1,λ2,,λk\lambda_{1}, \lambda_{2}, \ldots, \lambda_{k} . 所以由引理 3.20 可得

λ1λ2λk=det(Rk)=det(UkAUk)σ1σ2σk.| \lambda_ {1} \lambda_ {2} \dots \lambda_ {k} | = | \det (R _ {k}) | = | \det (U _ {k} ^ {*} A U _ {k}) | \leq \sigma_ {1} \sigma_ {2} \dots \sigma_ {k}.

k=nk = n 时,

λ1λ2λn=det(A)=σ1σ2σn.\left| \lambda_ {1} \lambda_ {2} \dots \lambda_ {n} \right| = | \det (A) | = \sigma_ {1} \sigma_ {2} \dots \sigma_ {n}.

3.4.4 奇异值扰动分析

下面的定理是关于奇异值的一个扰动性质.

定理3.22 [119] 设 A,ECm×n(mn)A, E \in \mathbb{C}^{m \times n} (m \geq n) , 则

σi(A+E)σi(A)E2,i=1,2,,n.\left| \sigma_ {i} (A + E) - \sigma_ {i} (A) \right| \leq \| E \| _ {2}, \quad i = 1, 2, \dots , n.

(留作练习, 可直接利用不等式 (3.18))