5.7_扰动分析

5.7 扰动分析

ARn×nA \in \mathbb{R}^{n \times n} 是对称矩阵, 则有下面的谱分解

定理5.15 设 ARn×nA \in \mathbb{R}^{n \times n} 是对称矩阵. 则存在一个正交矩阵 QQ 使得

A=QΛQT,A = Q \Lambda Q ^ {\mathsf {T}},

其中 Λ=diag(λ1,λ2,,λn)\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) 是一个实对角矩阵.

这里的 λi\lambda_{i} 就是 AA 的特征值, 我们假设 λ1λ2λn\lambda_{1} \geq \lambda_{2} \geq \dots \geq \lambda_{n} . 令 Q=[q1,q2,,qn]Q = [q_{1}, q_{2}, \ldots, q_{n}] , 则 qiq_{i} 就是 λi\lambda_{i} 对应的单位正交特征向量.

关于对称矩阵特征值问题的扰动理论, 这里只做一些简单介绍, 若要深入了解这方面的信息, 可以参考 [73, 74, 122, 148].

5.7.1 特征值与Rayleigh商

定义5.2设 ARn×nA\in \mathbb{R}^{n\times n} 是对称矩阵,向量 xRnx\in \mathbb{R}^n 非零,则 xx 关于 AA 的Rayleigh商定义为

ρ(x,A)=xAxxx.(5.13)\rho (x, A) = \frac {x ^ {\top} A x}{x ^ {\top} x}. \tag {5.13}

有时简记为 ρ(x)\rho (x)

下面是关于实对称矩阵 Rayleigh 商的一些基本性质:

(1) ρ(αx)=ρ(x),αR,α0;\rho (\alpha x) = \rho (x),\forall \alpha \in \mathbb{R},\alpha \neq 0;
(2) ρ(qi)=λi,i=1,2,,n\rho(q_i) = \lambda_i, i = 1, 2, \ldots, n ;
(3) 设 x=α1q1+α2q2++αnqnx = \alpha_{1}q_{1} + \alpha_{2}q_{2} + \dots +\alpha_{n}q_{n} ,则

ρ(x)=α12λ1+α22λ2++αn2λnα12+α22++αn2;\rho (x) = \frac {\alpha_ {1} ^ {2} \lambda_ {1} + \alpha_ {2} ^ {2} \lambda_ {2} + \cdots + \alpha_ {n} ^ {2} \lambda_ {n}}{\alpha_ {1} ^ {2} + \alpha_ {2} ^ {2} + \cdots + \alpha_ {n} ^ {2}};

(4) λnρ(x)λ1,ρ(x)A2.\lambda_{n}\leq \rho (x)\leq \lambda_{1},|\rho (x)|\leq \| A\|_{2}.

实对称矩阵的特征值与 Rayleigh 商之间的一个重要性质是 Courant-Fischer 极小极大定理.

定理5.16 (Courant-Fischer) 设 ARn×nA \in \mathbb{R}^{n \times n} 是对称矩阵, 其特征值为 λ1λ2λn\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n , 则有

λk=maxUSknminxU,x0xTAxxTx=minVSnk+1nmaxxV,x0xTAxxTx,\lambda_{k} = \max_{\mathbb{U}\in \mathbb{S}_{k}^{n}}\min_{x\in \mathbb{U}, x\neq 0}\frac{x^{\mathsf{T}}Ax}{x^{\mathsf{T}}x} = \min_{\mathbb{V}\in \mathbb{S}_{n - k + 1}^{n}}\max_{x\in \mathbb{V}, x\neq 0}\frac{x^{\mathsf{T}}Ax}{x^{\mathsf{T}}x},

其中 Sin\mathbb{S}_i^n 表示 Rn\mathbb{R}^n 中所有 ii 维子空间构成的集合. 当

U=span{q1,,qk},V=span{qk,,qn},x=qk\mathbb {U} = \operatorname {s p a n} \left\{q _ {1}, \dots , q _ {k} \right\}, \quad \mathbb {V} = \operatorname {s p a n} \left\{q _ {k}, \dots , q _ {n} \right\}, \quad x = q _ {k}

时, 上式中的等号成立.

(板书)

证明. 设 USkn\mathbb{U} \in \mathbb{S}_k^nVSnk+1n\mathbb{V} \in \mathbb{S}_{n-k+1}^n 分别为 Rn\mathbb{R}^n 中任意的 kknk+1n-k+1 维子空间. 由于

dimU+dimV=n+1>n,\dim \mathbb {U} + \dim \mathbb {V} = n + 1 > n,

可得

UV{0}.\mathbb {U} \cap \mathbb {V} \neq \{0 \}.

故存在非零向量 x~UV.\tilde{x}\in \mathbb{U}\cap \mathbb{V}. 所以有

minxU,x0ρ(x)ρ(x~)maxxV,x0ρ(x).\min _ {x \in \mathbb {U}, x \neq 0} \rho (x) \leq \rho (\tilde {x}) \leq \max _ {x \in \mathbb {V}, x \neq 0} \rho (x).

U\mathbb{U}V\mathbb{V} 的任意性可知,

maxUSknminxU,x0ρ(x)minVSnk+1nmaxxV,x0ρ(x).(5.14)\max _ {\mathbb {U} \in \mathbb {S} _ {k} ^ {n}} \min _ {x \in \mathbb {U}, x \neq 0} \rho (x) \leq \min _ {\mathbb {V} \in \mathbb {S} _ {n - k + 1} ^ {n}} \max _ {x \in \mathbb {V}, x \neq 0} \rho (x). \tag {5.14}

U=span{q1,,qk}\mathbb{U} = \operatorname{span}\{q_1, \ldots, q_k\} , 则 U\mathbb{U} 中的任意向量都可写成 x=α1q1++αkqkx = \alpha_{1}q_{1} + \dots + \alpha_{k}q_{k} , 此时

ρ(x)=xTAxxTx=α12λ1++αk2λkα12++αk2i=1kαi2λki=1kαi2=λk,\rho (x) = \frac {x ^ {\mathsf {T}} A x}{x ^ {\mathsf {T}} x} = \frac {\alpha_ {1} ^ {2} \lambda_ {1} + \cdots + \alpha_ {k} ^ {2} \lambda_ {k}}{\alpha_ {1} ^ {2} + \cdots + \alpha_ {k} ^ {2}} \geq \frac {\sum_ {i = 1} ^ {k} \alpha_ {i} ^ {2} \lambda_ {k}}{\sum_ {i = 1} ^ {k} \alpha_ {i} ^ {2}} = \lambda_ {k},

maxUSknminxU,x0ρ(x)λk.(5.15)\max _ {\mathbb {U} \in \mathbb {S} _ {k} ^ {n}} \min _ {x \in \mathbb {U}, x \neq 0} \rho (x) \geq \lambda_ {k}. \tag {5.15}

同理, 取 V=span{qk,,qn}\mathbb{V} = \operatorname{span}\{q_k, \ldots, q_n\} , 则 V\mathbb{V} 中的任意向量都可写成 x=αkqk++αnqnx = \alpha_k q_k + \dots + \alpha_n q_n , 此时

ρ(x)=xTAxxTx=αk2λk++αn2λnαk2++αn2i=knαi2λki=knαi2=λk,\rho (x) = \frac {x ^ {\mathsf {T}} A x}{x ^ {\mathsf {T}} x} = \frac {\alpha_ {k} ^ {2} \lambda_ {k} + \cdots + \alpha_ {n} ^ {2} \lambda_ {n}}{\alpha_ {k} ^ {2} + \cdots + \alpha_ {n} ^ {2}} \leq \frac {\sum_ {i = k} ^ {n} \alpha_ {i} ^ {2} \lambda_ {k}}{\sum_ {i = k} ^ {n} \alpha_ {i} ^ {2}} = \lambda_ {k},

minVSnk+1nmaxxV,x0ρ(x)λk.(5.16)\min _ {\mathbb {V} \in \mathbb {S} _ {n - k + 1} ^ {n}} \max _ {x \in \mathbb {V}, x \neq 0} \rho (x) \leq \lambda_ {k}. \tag {5.16}

(\refeq:1)(\ref{eq:1})(\refeq:2)(\ref{eq:2})(\refeq:3)(\ref{eq:3}) 可知,定理结论成立

该结论在复数域中也成立[73].

k=1k = 1k=nk = n 时, 就可以得到下面的定理

定理5.17 (Rayleigh-Ritz) 设 ARn×nA \in \mathbb{R}^{n \times n} 是对称矩阵, 其特征值为 λ1λ2λn\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n , 则有

λ1=maxxRn,x0xTAxxTx,λn=minxRn,x0xTAxxTx.\lambda_ {1} = \max _ {x \in \mathbb {R} ^ {n}, x \neq 0} \frac {x ^ {\textsf {T}} A x}{x ^ {\textsf {T}} x}, \quad \lambda_ {n} = \min _ {x \in \mathbb {R} ^ {n}, x \neq 0} \frac {x ^ {\textsf {T}} A x}{x ^ {\textsf {T}} x}.

由极小极大定理, 我们可以得到下面的特征值分隔定理

定理5.18(分隔定理)设 ARn×nA\in \mathbb{R}^{n\times n} 是对称矩阵, B=QTAQB = Q^{\mathrm{T}}AQ ,其中 QRn×(n1)Q\in \mathbb{R}^{n\times (n - 1)} 满足 QTQ=In1Q^{\mathrm{T}}Q = I_{n - 1} .再设 AABB 的特征值分别为

λ1λ2λnλ~1λ~2λ~n1,\lambda_ {1} \geq \lambda_ {2} \geq \dots \geq \lambda_ {n} \quad {\text {和}} \quad \tilde {\lambda} _ {1} \geq \tilde {\lambda} _ {2} \geq \dots \geq \tilde {\lambda} _ {n - 1},

则有

λ1λ~1λ2λ~2λ~n1λn.\lambda_ {1} \geq \tilde {\lambda} _ {1} \geq \lambda_ {2} \geq \tilde {\lambda} _ {2} \dots \geq \tilde {\lambda} _ {n - 1} \geq \lambda_ {n}.

特别地, 在定理 5.18 中, 取 Q=[e1,,ei1,ei+1,,en]Q = [e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_n] , 则可以得到下面的结论.

推论5.19设 ARn×nA\in \mathbb{R}^{n\times n} 是对称矩阵, A~\tilde{A}AA 的一个 n1n - 1 阶主子矩阵, AAA~\tilde{A} 的特征值分别为

λ1λ2λnλ~1λ~2λ~n1,\lambda_ {1} \geq \lambda_ {2} \geq \dots \geq \lambda_ {n} \quad {\text {和}} \quad \tilde {\lambda} _ {1} \geq \tilde {\lambda} _ {2} \geq \dots \geq \tilde {\lambda} _ {n - 1},

则有

λ1λ~1λ2λ~2λ~n1λn.\lambda_ {1} \geq \tilde {\lambda} _ {1} \geq \lambda_ {2} \geq \tilde {\lambda} _ {2} \dots \geq \tilde {\lambda} _ {n - 1} \geq \lambda_ {n}.

反复应用上面的推论, 即可得到下面的结论

推论5.20 设 ARn×nA \in \mathbb{R}^{n \times n} 是对称矩阵, A~\tilde{A}AA 的一个 kk 阶主子矩阵 (1kn1)(1 \leq k \leq n-1) , AAA~\tilde{A} 的特征值分别为

λ1λ2λnλ~1λ~2λ~k,\lambda_ {1} \geq \lambda_ {2} \geq \dots \geq \lambda_ {n} \quad {\text {和}} \quad \tilde {\lambda} _ {1} \geq \tilde {\lambda} _ {2} \geq \dots \geq \tilde {\lambda} _ {k},

则有

λiλ~iλnk+i,i=1,2,,k.\lambda_ {i} \geq \tilde {\lambda} _ {i} \geq \lambda_ {n - k + i}, \quad i = 1, 2, \ldots , k.

5.7.2 对称矩阵特征值的扰动分析

ARn×nA \in \mathbb{R}^{n \times n} 是对称矩阵, 扰动矩阵 ERn×nE \in \mathbb{R}^{n \times n} 也是对称矩阵, 下面讨论 A+EA + E 的特征值与 AA 的特征值之间的关系.

由极小极大定理, 我们可以证明下面的性质.

定理5.21 设 ARn×nA \in \mathbb{R}^{n \times n}B=A+ERn×nB = A + E \in \mathbb{R}^{n \times n} 都是对称矩阵, 其特征值分别为

λ1λ2λnλ~1λ~2λ~n.\lambda_ {1} \geq \lambda_ {2} \geq \dots \geq \lambda_ {n} \quad {\text {和}} \quad \tilde {\lambda} _ {1} \geq \tilde {\lambda} _ {2} \geq \dots \geq \tilde {\lambda} _ {n}.

假定 EE 的最大和最小特征值分别为 μ1\mu_{1}μn\mu_{n} , 则有

λi+μ1λ~iλi+μn,i=1,2,,n.\lambda_ {i} + \mu_ {1} \geq \tilde {\lambda} _ {i} \geq \lambda_ {i} + \mu_ {n}, \quad i = 1, 2, \dots , n.

(板书)

证明. 由 Courant-Fischer 定理 5.16 和 Rayleigh-Ritz 定理 5.17 可知

λ~i=minVSni+1nmaxxV,x0xTBxxTx=minVSni+1nmaxxV,x0(xTAxxTx+xTExxTx)minVSni+1nmaxxV,x0(xAxxx+μ1)=minVSni+1nmaxxV,x0xTAxxTx+μ1=λi+μ1.\begin{array}{l} \tilde{\lambda}_{i} = \min_{\mathbb{V}\in \mathbb{S}_{n - i + 1}^{n}}\max_{x\in \mathbb{V}, x\neq 0}\frac{x^{\mathsf{T}}Bx}{x^{\mathsf{T}}x} \\ = \min _ {\mathbb {V} \in \mathbb {S} _ {n - i + 1} ^ {n}} \max _ {x \in \mathbb {V}, x \neq 0} \left(\frac {x ^ {\mathsf {T}} A x}{x ^ {\mathsf {T}} x} + \frac {x ^ {\mathsf {T}} E x}{x ^ {\mathsf {T}} x}\right) \\ \leq \min _ {\mathbb {V} \in \mathbb {S} _ {n - i + 1} ^ {n}} \max _ {x \in \mathbb {V}, x \neq 0} \left(\frac {x ^ {\top} A x}{x ^ {\top} x} + \mu_ {1}\right) \\ = \min _ {\mathbb {V} \in \mathbb {S} _ {n - i + 1} ^ {n}} \max _ {x \in \mathbb {V}, x \neq 0} \frac {x ^ {\mathsf {T}} A x}{x ^ {\mathsf {T}} x} + \mu_ {1} \\ = \lambda_ {i} + \mu_ {1}. \\ \end{array}

同理可得

λ~i=maxUSinminxU,x0xTBxxTx=maxUSinminxU,x0(xTAxxTx+xTExxTx)maxUSinminxU,x0(xAxxx+μn)\begin{array}{l} \tilde {\lambda} _ {i} = \max _ {\mathbb {U} \in \mathbb {S} _ {i} ^ {n}} \min _ {x \in \mathbb {U}, x \neq 0} \frac {x ^ {\mathsf {T}} B x}{x ^ {\mathsf {T}} x} \\ = \max _ {\mathbb {U} \in \mathbb {S} _ {i} ^ {n}} \min _ {x \in \mathbb {U}, x \neq 0} \left(\frac {x ^ {\mathsf {T}} A x}{x ^ {\mathsf {T}} x} + \frac {x ^ {\mathsf {T}} E x}{x ^ {\mathsf {T}} x}\right) \\ \geq \max _ {\mathbb {U} \in \mathbb {S} _ {i} ^ {n}} \min _ {x \in \mathbb {U}, x \neq 0} \left(\frac {x ^ {\top} A x}{x ^ {\top} x} + \mu_ {n}\right) \\ \end{array}
=maxUSinminxU,x0xTAxxTx+μn=λi+μn.\begin{array}{l} = \max _ {\mathbb {U} \in \mathbb {S} _ {i} ^ {n}} \min _ {x \in \mathbb {U}, x \neq 0} \frac {x ^ {\mathsf {T}} A x}{x ^ {\mathsf {T}} x} + \mu_ {n} \\ = \lambda_ {i} + \mu_ {n}. \\ \end{array}

所以定理结论成立.

根据这个定理, 我们立即可以得到下面的 Weyl 定理

定理5.22 (Weyl) 设 ARn×nA \in \mathbb{R}^{n \times n}B=A+ERn×nB = A + E \in \mathbb{R}^{n \times n} 都是对称矩阵, 其特征值分别为 λ1λ2λn\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_nλ~1λ~2λ~n\tilde{\lambda}_1 \geq \tilde{\lambda}_2 \geq \dots \geq \tilde{\lambda}_n , 则

λ~jλjE2,j=1,2,,n.\left| \tilde {\lambda} _ {j} - \lambda_ {j} \right| \leq \| E \| _ {2}, \quad j = 1, 2, \dots , n.

该定理的结论可以推广到奇异值情形. 我们首先给出下面的引理

引理5.23 设 ARm×nA \in \mathbb{R}^{m \times n} ( mnm \geq n ) 的奇异值分解为 A=UΣVA = U\Sigma V ,其中 U=[u1,,un]Rm×nU = [u_1, \ldots, u_n] \in \mathbb{R}^{m \times n} 为列正交矩阵, V=[v1,,vn]Rn×nV = [v_1, \ldots, v_n] \in \mathbb{R}^{n \times n} 为正交矩阵, Σ=diag(σ1,,σn)\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) . 将 UU 扩展成 n×nn \times n 的正交矩阵 [U,U~]=[u1,,un,u~1,,u~mn][U, \tilde{U}] = [u_1, \ldots, u_n, \tilde{u}_1, \ldots, \tilde{u}_{m-n}] ,令

H=[0ATA0]R(m+n)×(m+n),H = \left[ \begin{array}{c c} 0 & A ^ {\mathsf {T}} \\ A & 0 \end{array} \right] \in \mathbb {R} ^ {(m + n) \times (m + n)},

HH 对称, 且特征值为 ±σi\pm \sigma_{i} 和0(其中0至少为 mnm - n 重特征值), 对应的特征向量分别为 22[vi±ui],i=1,2,,n,\frac{\sqrt{2}}{2}\left[ \begin{array}{l}v_{i}\\ \pm u_{i} \end{array} \right],i = 1,2,\ldots ,n,[0u~j],j=1,2,,mn.\left[ \begin{array}{l}0\\ \tilde{u}_j \end{array} \right],j = 1,2,\ldots ,m - n.

(留作课外自习, 直接代入验证即可)

由上面的引理和Weyl定理5.22立即可得

定理5.24 设 A,BRm×nA, B \in \mathbb{R}^{m \times n} ( mnm \geq n ), 它们的奇异值分别为 σ1σ2σn\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_nσ~1σ~2σ~n\tilde{\sigma}_1 \geq \tilde{\sigma}_2 \geq \dots \geq \tilde{\sigma}_n . 则

σ~jσjBA2,j=1,2,,n.\left| \tilde {\sigma} _ {j} - \sigma_ {j} \right| \leq \| B - A \| _ {2}, \quad j = 1, 2, \dots , n.

最后给出一个基于F-范数的扰动性质[73].

定理5.25 设 A,ECn×nA, E \in \mathbb{C}^{n \times n}AA 是Hermite的, A+EA + E 是正规矩阵. 并设 AA 的特征值满足

λ1λ2λn,\lambda_ {1} \geq \lambda_ {2} \geq \dots \geq \lambda_ {n},

A+EA + E 的特征值 λ~1,λ~2,,λ~n\tilde{\lambda}_1, \tilde{\lambda}_2, \ldots, \tilde{\lambda}_n 满足

Re(λ~1)Re(λ~2)Re(λ~n).\operatorname {R e} \left(\tilde {\lambda} _ {1}\right) \geq \operatorname {R e} \left(\tilde {\lambda} _ {2}\right) \geq \dots \geq \operatorname {R e} \left(\tilde {\lambda} _ {n}\right).

i=1nλ~iλi2EF2.\sum_ {i = 1} ^ {n} \left| \tilde {\lambda} _ {i} - \lambda_ {i} \right| ^ {2} \leq \| E \| _ {F} ^ {2}.

5.7.3 对称矩阵特征向量的扰动

定义5.3 设 ARn×nA \in \mathbb{R}^{n \times n} 的特征值为 λ1λ2λn\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n , 则 λi\lambda_i 与其余特征值之间的间隙 (gap) 定义为

gap(λi,A)=minjiλjλi.\operatorname {gap}(\lambda_{i},A) = \min_{j\neq i}|\lambda_{j} - \lambda_{i}|.

有时简记为 gap(λi)\mathrm{gap}(\lambda_i)

特征向量的敏感性依赖于其对应的特征值的 gap, 一般来说, gap 越小, 特征向量越敏感.

例5.2 设

A=[1+g1],E=[0εε0],(0<ε<g)A = \left[ \begin{array}{c c} 1 + g & \\ & 1 \end{array} \right], \quad E = \left[ \begin{array}{c c} 0 & \varepsilon \\ \varepsilon & 0 \end{array} \right], \quad (0 < \varepsilon < g)

AA 的特征值为 λ1=1+g,λ2=1\lambda_1 = 1 + g, \lambda_2 = 1 , 对应的单位特征向量为 q1=e1,q2=e2q_1 = e_1, q_2 = e_2 . 当 ε\varepsilon 充分小时, A+EA + E 的特征值为 λ^1,2=1+(g±g2+4ε2)/2\hat{\lambda}_{1,2} = 1 + (g \pm \sqrt{g^2 + 4\varepsilon^2}) / 2 , 对应的单位特征向量为

q^1=β1[11+4ε2/g21]=β1[1(1+2ε2/g2)24(ε/g)41]β1[1(1+2ε2/g2)12ε/g]=11+ε2/g2[1ε/g],\begin{array}{l} \hat {q} _ {1} = \beta_ {1} \cdot \left[ \frac {1}{\sqrt {1 + 4 \varepsilon^ {2} / g ^ {2}} - 1} \right] = \beta_ {1} \cdot \left[ \frac {1}{\sqrt {(1 + 2 \varepsilon^ {2} / g ^ {2}) ^ {2} - 4 (\varepsilon / g) ^ {4}} - 1} \right] \\ \approx \beta_ {1} \cdot \left[ \frac {1}{\frac {(1 + 2 \varepsilon^ {2} / g ^ {2}) - 1}{2 \varepsilon / g}} \right] \\ = \frac {1}{\sqrt {1 + \varepsilon^ {2} / g ^ {2}}} \left[ \begin{array}{c} 1 \\ \varepsilon / g \end{array} \right], \\ \end{array}
q^2=β2[11+4ε2/g212ε/g]11+ε2/g2[ε/g1],\hat {q} _ {2} = \beta_ {2} \cdot \left[ \frac {1}{\frac {- \sqrt {1 + 4 \varepsilon^ {2} / g ^ {2}} - 1}{2 \varepsilon / g}} \right] \approx \frac {1}{\sqrt {1 + \varepsilon^ {2} / g ^ {2}}} \left[ \begin{array}{c} - \varepsilon / g \\ 1 \end{array} \right],

其中 β1,β2\beta_{1},\beta_{2} 为规范化因子.故特征向量的扰动约为 ε/g\varepsilon /g ,与特征值的间隙 gap(λi,A)=g\mathrm{gap}(\lambda_i,A) = g 成反比

定理5.26 设 A=QΛQTA = Q\Lambda Q^{\mathsf{T}}A+E=Q~Λ~Q~TA + E = \tilde{Q}\tilde{\Lambda}\tilde{Q}^{\mathsf{T}} 分别为对称矩阵 ARn×nA\in \mathbb{R}^{n\times n}A+ERn×nA + E\in \mathbb{R}^{n\times n} 的特征值分解, 其中 Q=[q1,q2,,qn]Q = [q_{1},q_{2},\dots ,q_{n}]Q~=[q~1,q~2,,q~n]\tilde{Q} = [\tilde{q}_1,\tilde{q}_2,\dots ,\tilde{q}_n] 均为正交矩阵, 且 q~i\tilde{q}_iqiq_{i} 对应的扰动特征向量. 用 θi\theta_{i} 表示 qiq_{i}q~i\tilde{q}_i 之间的锐角, 则当 gap(λi,A)>0\mathrm{gap}(\lambda_i,A) > 0

12sin2θiE2gap(λi,A).\frac {1}{2} \sin 2 \theta_ {i} \leq \frac {\| E \| _ {2}}{\operatorname {g a p} (\lambda_ {i} , A)}.

类似地, 当 gap(λ~i,A+E)>0\operatorname{gap}(\tilde{\lambda}_i, A + E) > 0

12sin2θiE2gap(λ~i,A+E).\frac {1}{2} \sin 2 \theta_ {i} \leq \frac {\| E \| _ {2}}{\operatorname {g a p} (\tilde {\lambda} _ {i} , A + E)}.

(留作课外自习)

证明. 如右图所示, 令 d=q~i/cosθiqid = \tilde{q}_i / \cos \theta_i - q_i , 即 q~i=(qi+d)cosθi\tilde{q}_i = (q_i + d) \cos \theta_i . 则

dTqi=0,tanθi=d2,secθi=qi+d2.d ^ {\mathsf {T}} q _ {i} = 0, \quad \tan \theta_ {i} = \| d \| _ {2}, \quad \sec \theta_ {i} = \| q _ {i} + d \| _ {2}.

η=λ~iλi\eta = \tilde{\lambda}_i - \lambda_i ,由 (A+E)q~i=λ~iq~i(A + E)\tilde{q}_i = \tilde{\lambda}_i\tilde{q}_i 可得

(A+E)(qi+d)=(η+λi)(qi+d),(A + E) (q _ {i} + d) = (\eta + \lambda_ {i}) (q _ {i} + d),

Aqi=λiqiAq_{i} = \lambda_{i}q_{i} 代入后整理可得

(ηIE)(qi+d)=(AλiI)d.(\eta I - E) (q _ {i} + d) = (A - \lambda_ {i} I) d.

qi(AλiI)=((AλiI)qi)=0,q_i^\top (A - \lambda_iI) = ((A - \lambda_iI)q_i)^\top = 0, 故用 qiq_{i}^{\top} 左乘上式两边可得

qiT(ηIE)(qi+d)=qiT(AλiI)d=0,(5.17)q _ {i} ^ {\mathsf {T}} (\eta I - E) \left(q _ {i} + d\right) = q _ {i} ^ {\mathsf {T}} (A - \lambda_ {i} I) d = 0, \tag {5.17}

(ηIE)(qi+d)span{qi}=span{q1,,qi1,qi+1,,qn}(\eta I - E)(q_i + d) \in \operatorname{span}\{q_i\}^\perp = \operatorname{span}\{q_1, \ldots, q_{i-1}, q_{i+1}, \ldots, q_n\} . 所以可设 (ηIE)(qi+d)=jiαjqj(\eta I - E)(q_i + d) = \sum_{j \neq i} \alpha_j q_j . 又 qid=0q_i^\top d = 0 , 故可设 d=jiδjqjd = \sum_{j \neq i} \delta_j q_j . 所以

jiαjqj=(ηIE)(qi+d)=(AλiI)d=(AλiI)jiδjqj=jiδj(AλiI)qj=jiδj(λjλi)qj.\begin{array}{l} \sum_ {j \neq i} \alpha_ {j} q _ {j} = (\eta I - E) (q _ {i} + d) = (A - \lambda_ {i} I) d \\ = (A - \lambda_ {i} I) \sum_ {j \neq i} \delta_ {j} q _ {j} = \sum_ {j \neq i} \delta_ {j} (A - \lambda_ {i} I) q _ {j} = \sum_ {j \neq i} \delta_ {j} (\lambda_ {j} - \lambda_ {i}) q _ {j}. \\ \end{array}

由于 q1,q2,,qnq_{1}, q_{2}, \ldots, q_{n} 线性无关,故可得 δj(λjλi)=αj\delta_{j}(\lambda_{j} - \lambda_{i}) = \alpha_{j} . 又 gap(λi,A)>0\operatorname{gap}(\lambda_i, A) > 0 ,即 jij \neq iλjλi\lambda_{j} \neq \lambda_{i} ,所以 δi=αiλjλi\delta_{i} = \frac{\alpha_{i}}{\lambda_{j} - \lambda_{i}} ,因此

d=jiαjλjλiqj.d = \sum_ {j \neq i} \frac {\alpha_ {j}}{\lambda_ {j} - \lambda_ {i}} q _ {j}.

注意到 qid=0q_i^\top d = 0qiqi=1q_i^\top q_i = 1 , 所以由 (\refeq:2)(\ref{eq:2}) 可得

η=qiTE(qi+d).\eta = q _ {i} ^ {\mathsf {T}} E (q _ {i} + d).

(ηIE)(qi+d)=(qi+d)ηE(qi+d)=(qi+d)qiE(qi+d)E(qi+d)=((qi+d)qiTI)E(qi+d).\begin{array}{l} (\eta I - E) (q _ {i} + d) = (q _ {i} + d) \eta - E (q _ {i} + d) \\ = (q _ {i} + d) q _ {i} ^ {\top} E (q _ {i} + d) - E (q _ {i} + d) \\ = \left(\left(q _ {i} + d\right) q _ {i} ^ {\mathsf {T}} - I\right) E \left(q _ {i} + d\right). \\ \end{array}

由习题5.5可知 (qi+d)qiI2=qi+d2\| (q_i + d)q_i^\top -I\| _2 = \| q_i + d\| _2 ,故

tanθi=d2=(jiαjλjλi2)1/2(jiαjgap(λi,A)2)1/2=1gap(λi,A)(jiαj2)1/2=1gap(λi,A)jiαjqj2=1gap(λi,A)(ηIE)(qi+d)2\begin{array}{l} \tan \theta_ {i} = \| d \| _ {2} = \left(\sum_ {j \neq i} \left| \frac {\alpha_ {j}}{\lambda_ {j} - \lambda_ {i}} \right| ^ {2}\right) ^ {1 / 2} \\ \leq \left(\sum_ {j \neq i} \left| \frac {\alpha_ {j}}{\operatorname {g a p} (\lambda_ {i} , A)} \right| ^ {2}\right) ^ {1 / 2} \\ = \frac {1}{\operatorname {g a p} (\lambda_ {i} , A)} \left(\sum_ {j \neq i} \alpha_ {j} ^ {2}\right) ^ {1 / 2} \\ = \frac {1}{\operatorname {g a p} (\lambda_ {i} , A)} \left\| \sum_ {j \neq i} \alpha_ {j} q _ {j} \right\| _ {2} \\ = \frac {1}{\operatorname {g a p} (\lambda_ {i} , A)} \| (\eta I - E) (q _ {i} + d) \| _ {2} \\ \end{array}
1gap(λi,A)(qi+d)qiI2E2(qi+d)2=1gap(λi,A)(qi+d)22E2=1gap(λi,A)1cos2θiE2.\begin{array}{l} \leq \frac {1}{\operatorname {g a p} (\lambda_ {i} , A)} \| (q _ {i} + d) q _ {i} ^ {\top} - I \| _ {2} \cdot \| E \| _ {2} \cdot \| (q _ {i} + d) \| _ {2} \\ = \frac {1}{\operatorname {g a p} (\lambda_ {i} , A)} \| (q _ {i} + d) \| _ {2} ^ {2} \cdot \| E \| _ {2} \\ = \frac {1}{\operatorname {g a p} \left(\lambda_ {i} , A\right)} \cdot \frac {1}{\cos^ {2} \theta_ {i}} \| E \| _ {2}. \\ \end{array}

12sin2θi=sinθicosθi=tanθicosθi21gap(λi,A)E2.\frac {1}{2} \sin 2 \theta_ {i} = \sin \theta_ {i} \cos \theta_ {i} = \tan \theta_ {i} \cos \theta_ {i} ^ {2} \leq \frac {1}{\operatorname* {g a p} (\lambda_ {i} , A)} \| E \| _ {2}.

A+EA + E 看作原矩阵, (A+E)E(A + E) - E 看作是扰动矩阵, 则可证明第二个结论.

θi1\theta_{i}\ll 1 时, 12sin2θiθisinθi;\frac{1}{2}\sin 2\theta_{i}\approx \theta_{i}\approx \sin \theta_{i};
E212gap(λi,A)\| E\| _2\geq \frac{1}{2}\mathrm{gap}(\lambda_i,A) ,则定理中给出的上界就失去了实际意义;
在该定理中, 没有对特征值进行排序;
在实际计算中, 我们通常所知道的是 gap(λ~i,A+E)\operatorname{gap}(\tilde{\lambda}_i, A + E) .

5.7.4 Rayleigh商逼近

定理5.27 设对称矩阵 ARn×nA \in \mathbb{R}^{n \times n} 的特征值为 λ1,λ2,,λn\lambda_1, \lambda_2, \ldots, \lambda_n

(1) 若 xRnx \in \mathbb{R}^n 是单位向量, βR\beta \in \mathbb{R} , 则

min1inλiβAxβx2;(5.18)\min _ {1 \leq i \leq n} | \lambda_ {i} - \beta | \leq \| A x - \beta x \| _ {2}; \tag {5.18}

(2) 对于给定的非零向量 xRnx \in \mathbb{R}^n , 当 β=ρ(x)\beta = \rho(x) 时, Axβx2\|Ax - \beta x\|_2 达到最小, 即

minβRAxβx2=Axρ(x)x2;(5.19)\min _ {\beta \in \mathbb {R}} \| A x - \beta x \| _ {2} = \| A x - \rho (x) x \| _ {2}; \tag {5.19}

(3) 令 r=Axρ(x)xr = Ax - \rho(x)x , 其中 xRnx \in \mathbb{R}^n 为单位向量. 设 λi\lambda_i 是距离 ρ(x)\rho(x) 最近的特征值, gap=minjiλjρ(x)\mathrm{gap}' = \min_{j \neq i} |\lambda_j - \rho(x)| , θ\thetaxxqiq_i 之间的锐角, 其中 qiq_iλi\lambda_i 对应的单位特征向量, 则

sinθr2gapλiρ(x)r22gap.(5.20)\sin \theta \leq \frac {\| r \| _ {2}}{\mathrm {g a p} ^ {\prime}} \quad \text {且} \quad | \lambda_ {i} - \rho (x) | \leq \frac {\| r \| _ {2} ^ {2}}{\mathrm {g a p} ^ {\prime}}. \tag {5.20}

(留作课外自习)

证明. (1) 若 β\betaAA 的特征值, 则结论显然成立

β\beta 不是 AA 的特征值, 则 AβIA - \beta I 非奇异, 故

1=x2=(AβI)1(AβI)x2(AβI)12(AβI)x2.(5.21)1 = \| x \| _ {2} = \left\| (A - \beta I) ^ {- 1} (A - \beta I) x \right\| _ {2} \leq \left\| (A - \beta I) ^ {- 1} \right\| _ {2} \cdot \left\| (A - \beta I) x \right\| _ {2}. \tag {5.21}

由于 AβIA - \beta I 对称,且特征值为 λiβ\lambda_{i} - \beta ,故

(AβI)12=1min1inλiβ.\| (A - \beta I)^{-1}\|_{2} = \frac{1}{\min_{1\leq i\leq n}|\lambda_{i} - \beta|}.

代入 (\refeq:1)(\ref{eq:1}) 即可知结论成立

(2) 由于

xT(Axρ(x)x)=xTAxxTAxxTxxTx=0,x ^ {\mathsf {T}} (A x - \rho (x) x) = x ^ {\mathsf {T}} A x - \frac {x ^ {\mathsf {T}} A x}{x ^ {\mathsf {T}} x} x ^ {\mathsf {T}} x = 0,

x(Axρ(x)x)x \perp (Ax - \rho(x)x) . 所以

Axβx22=(Aρ(x))x+(ρ(x)β)x22=Axρ(x)x22+(ρ(x)β)x22Axρ(x)x22,\begin{array}{l} \left\| A x - \beta x \right\| _ {2} ^ {2} = \left\| (A - \rho (x)) x + (\rho (x) - \beta) x \right\| _ {2} ^ {2} \\ = \| A x - \rho (x) x \| _ {2} ^ {2} + \left\| (\rho (x) - \beta) x \right\| _ {2} ^ {2} \\ \geq \| A x - \rho (x) x \| _ {2} ^ {2}, \\ \end{array}

所以当 β=ρ(x)\beta = \rho (x) 时, Axβx2\| Ax - \beta x\| _2 达到最小

(3) 略

由 (5.11) 可知, 在幂迭代和反迭代中可以使用残量 Axλ~x2<tol\|Ax - \tilde{\lambda}x\|_2 < tol 作为停机准则, 这里 λ~\tilde{\lambda} 是迭代过程中计算得到的近似特征值. 等式 (5.12) 则解释了为什么用 Rayleigh 商来近似特征值.
不等式(5.13)表明 λiρ(x)|\lambda_i - \rho (x)| 的值与残量范数 r2\| r\| _2 的平方成正比,这个结论是Rayleigh商迭代局部三次收敛的基础.

5.7.5 相对扰动分析*

这里主要讨论 AAXTAXX^{\mathsf{T}}AX 的特征值和特征向量之间的扰动关系, 其中 XX 非奇异且满足 XTXI2=ε\| X^{\mathsf{T}}X - I\|_{2} = \varepsilon . 这是因为在计算特征向量时, 由于舍入误差的原因, 最后得到的正交矩阵 QQ 会带有误差, 从而失去正交性.

定理5.28 (相对Weyl定理) 设对称矩阵 AAXTAXX^{\mathsf{T}}AX 的特征值分别为 λ1λ2λn\lambda_1\geq \lambda_2\geq \dots \geq \lambda_nλ~1λ~2λ~n,\tilde{\lambda}_1\geq \tilde{\lambda}_2\geq \dots \geq \tilde{\lambda}_n,ε=XTXI2\varepsilon = \| X^{\mathsf{T}}X - I\|_{2} ,则

λ~iλiελiλ~iλiλiε(ifλi0).| \tilde {\lambda} _ {i} - \lambda_ {i} | \leq \varepsilon | \lambda_ {i} | \quad \text {或} \quad \frac {| \tilde {\lambda} _ {i} - \lambda_ {i} |}{| \lambda_ {i} |} \leq \varepsilon \quad (\mathrm {i f} \lambda_ {i} \neq 0).

(留作课外自习)

证明. 因为 AλiIA - \lambda_{i}I 的第 ii 个特征值为0, 故由Sylvester惯性定理5.11可知

XT(AλiI)X=(XTAXλiI)+λi(IXTX)X ^ {\mathsf {T}} (A - \lambda_ {i} I) X = (X ^ {\mathsf {T}} A X - \lambda_ {i} I) + \lambda_ {i} (I - X ^ {\mathsf {T}} X)

的第 ii 个特征值也为0.由Weyl定理5.22可知

(λ~iλi)0λi(IXTX)2=ελi,\left| \left(\tilde {\lambda} _ {i} - \lambda_ {i}\right) - 0 \right\lVert \leq \left\| \lambda_ {i} (I - X ^ {\mathsf {T}} X) \right\| _ {2} = \varepsilon | \lambda_ {i} |,

即定理结论成立

XX 正交时, ε=0\varepsilon = 0 , 故 XTAXX^{\mathsf{T}}AXAA 有相同的特征值. 当 XX 几乎正交时, ε\varepsilon 很小, 此时 XTAXX^{\mathsf{T}}AXAA 的特征值几乎相同.

推论5.29设 GGYTGXY^{\mathsf{T}}GX 的奇异值分别为 σ1σ2σn\sigma_{1}\geq \sigma_{2}\geq \dots \geq \sigma_{n}σ~1σ~2σ~n,\tilde{\sigma}_1\geq \tilde{\sigma}_2\geq \dots \geq \tilde{\sigma}_n,

ε=max{XTXI2,YTYI2},\varepsilon = \max \left\{\| X ^ {\mathsf {T}} X - I \| _ {2}, \| Y ^ {\mathsf {T}} Y - I \| _ {2} \right\}, \text {则}
σ~iσiεσiσ~iσiσiε(ifσi0).| \tilde {\sigma} _ {i} - \sigma_ {i} | \leq \varepsilon | \sigma_ {i} | \quad \text {或} \quad \frac {| \tilde {\sigma} _ {i} - \sigma_ {i} |}{| \sigma_ {i} |} \leq \varepsilon \quad (\mathrm {i f} \sigma_ {i} \neq 0).

下面给出特征向量的相对扰动性质

定义5.4设 ARn×nA\in \mathbb{R}^{n\times n} 的特征值为 λ1,λ2,,λn\lambda_1,\lambda_2,\dots ,\lambda_n ,若 λi0\lambda_{i}\neq 0 ,则 λi\lambda_{i} 与其余特征值之间的相对间隙(relative gap)定义为

relgap(λi,A)=minjiλjλiλi.\operatorname {r e l g a p} (\lambda_ {i}, A) = \min _ {j \neq i} \frac {| \lambda_ {j} - \lambda_ {i} |}{| \lambda_ {i} |}.

定理5.30 设 ARn×nA \in \mathbb{R}^{n \times n}XAXRn×nX^{\top}AX \in \mathbb{R}^{n \times n} 的特征值分解分别为 A=QΛQA = Q\Lambda Q^{\top}XAX=Q~Λ~Q~X^{\top}AX = \tilde{Q}\tilde{\Lambda}\tilde{Q}^{\top} , 其中 Q=[q1,q2,,qn]Q = [q_1, q_2, \ldots, q_n]Q~=[q~1,q~2,,q~n]\tilde{Q} = [\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_n] 均为正交矩阵, Λ=diag(λ1,λ2,,λn)\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) , Λ~=diag(λ~1,λ~2,,λ~n)\tilde{\Lambda} = \mathrm{diag}(\tilde{\lambda}_1, \tilde{\lambda}_2, \ldots, \tilde{\lambda}_n)λ1λ2λn\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n , λ~1λ~2λ~n\tilde{\lambda}_1 \geq \tilde{\lambda}_2 \geq \dots \geq \tilde{\lambda}_n . 设 θi\theta_i 表示 qiq_iq~i\tilde{q}_i 之间的锐角, 令 ε1=IXTX12\varepsilon_1 = \|I - X^{-T}X^{-1}\|_2 , ε2=XI2\varepsilon_2 = \|X - I\|_2 , 若 ε1<1\varepsilon_1 < 1relgap(λ~i,XAX)>0\operatorname{relgap}(\tilde{\lambda}_i, X^{\top}AX) > 0 , 则

12sin2θiε11ε11relgap(λ~i,XAX)+ε2.\frac {1}{2} \sin 2 \theta_ {i} \leq \frac {\varepsilon_ {1}}{1 - \varepsilon_ {1}} \cdot \frac {1}{\operatorname {r e l g a p} (\tilde {\lambda} _ {i} , X ^ {\top} A X)} + \varepsilon_ {2}.

(留作课外自习)

证明.设 η=λ~iλi,H=Aλ~iI,F=λ~i(IXTX1)\eta = \tilde{\lambda}_i - \lambda_i,H = A - \tilde{\lambda}_iI,F = \tilde{\lambda}_i(I - X^{-T}X^{-1}) ,则 Hqi=Aqiλ~iqi=ηqiHq_{i} = Aq_{i} - \tilde{\lambda}_{i}q_{i} = -\eta q_{i}

H+F=Aλ~iXTX1=XT(XAXλ~iI)X1.H + F = A - \tilde {\lambda} _ {i} X ^ {- T} X ^ {- 1} = X ^ {- T} (X ^ {\top} A X - \tilde {\lambda} _ {i} I) X ^ {- 1}.

(H+F)(Xq~i)=0.(H + F)(X\tilde{q}_i) = 0.Xq~iX\tilde{q}_iH+FH + F 的第 ii 个特征值 λ~i=0\tilde{\lambda}_i = 0 的一个特征向量.设 θ1\theta_{1}qiq_{i}Xq~iX\tilde{q}_i 之间的锐角,由定理5.26可知

12sin2θ1F2gap(λi,H+F)=ε1λ~igap(λi,H+F).(5.22)\frac {1}{2} \sin 2 \theta_ {1} \leq \frac {\| F \| _ {2}}{\operatorname {g a p} \left(\lambda_ {i} , H + F\right)} = \frac {\varepsilon_ {1} | \tilde {\lambda} _ {i} |}{\operatorname {g a p} \left(\lambda_ {i} , H + F\right)}. \tag {5.22}

由于 λ~i=0\tilde{\lambda}_i = 0 ,故 gap(λi,H+F)\mathrm{gap}(\lambda_i,H + F) 即为 H+FH + F 的最小非零特征值的绝对值.又 XT(H+F)X=XTAXλ~iIX^{\mathsf{T}}(H + F)X = X^{\mathsf{T}}AX - \tilde{\lambda}_iI 的特征值为 λ~jλ~i,j=1,2,,n,\tilde{\lambda}_j - \tilde{\lambda}_i,j = 1,2,\ldots ,n,

λ~1λ~iλ~2λ~iλ~nλ~i,\tilde {\lambda} _ {1} - \tilde {\lambda} _ {i} \geq \tilde {\lambda} _ {2} - \tilde {\lambda} _ {i} \geq \dots \geq \tilde {\lambda} _ {n} - \tilde {\lambda} _ {i},

所以由相对Weyl定理5.28可知

λ^j(λ~jλ~i)ε1λ~jλ~i.\left| \hat {\lambda} _ {j} - \left(\tilde {\lambda} _ {j} - \tilde {\lambda} _ {i}\right) \right| \leq \varepsilon_ {1} \left| \tilde {\lambda} _ {j} - \tilde {\lambda} _ {i} \right|.

这里 λ^j\hat{\lambda}_j 表示 H+FH + F 的第 jj 个特征值(按降序排列).因此 λ^j(1ε1)λ~jλ~i|\hat{\lambda}_j|\geq (1 - \varepsilon_1)|\tilde{\lambda}_j - \tilde{\lambda}_i| ,故

gap(λ^i,H+F)(1ε1)gap(λ~i,XAX).\operatorname {g a p} \left(\hat {\lambda} _ {i}, H + F\right) \geq \left(1 - \varepsilon_ {1}\right) \operatorname {g a p} \left(\tilde {\lambda} _ {i}, X ^ {\top} A X\right).

代入 ()(\text{出}) 可得

12sin2θ1ε1λ~i(1ε1)gap(λ~i,XAX)=ε11ε11relgap(λ~i,XAX).\frac {1}{2} \sin 2 \theta_ {1} \leq \frac {\varepsilon_ {1} | \tilde {\lambda} _ {i} |}{(1 - \varepsilon_ {1}) \operatorname {g a p} (\tilde {\lambda} _ {i} , X ^ {\top} A X)} = \frac {\varepsilon_ {1}}{1 - \varepsilon_ {1}} \cdot \frac {1}{\operatorname {r e l g a p} (\tilde {\lambda} _ {i} , X ^ {\top} A X)}.

θ2\theta_{2}Xq~iX\tilde{q}_iq~i\tilde{q}_i 之间的锐角,则由右图可知

sinθ2=q~i2sinθ2Xq~iq~i2XI2q~i2=ε2.\begin{array}{l} \sin \theta_ {2} = \| \tilde {q} _ {i} \| _ {2} \sin \theta_ {2} \leq \| X \tilde {q} _ {i} - \tilde {q} _ {i} \| _ {2} \\ \leq \| X - I \| _ {2} \cdot \| \tilde {q} _ {i} \| _ {2} \\ = \varepsilon_ {2}. \\ \end{array}

θiθ1+θ2\theta_{i}\leq \theta_{1} + \theta_{2} ,故

12sin2θi12sin2θ1+12sin2θ212sin2θ1+sinθ2ε11ε11relgap(λ~i,XAX)+ε2,\begin{array}{l} \frac {1}{2} \sin 2 \theta_ {i} \leq \frac {1}{2} \sin 2 \theta_ {1} + \frac {1}{2} \sin 2 \theta_ {2} \\ \leq \frac {1}{2} \sin 2 \theta_ {1} + \sin \theta_ {2} \\ \leq \frac {\varepsilon_ {1}}{1 - \varepsilon_ {1}} \cdot \frac {1}{\operatorname {r e l g a p} (\tilde {\lambda} _ {i} , X ^ {\top} A X)} + \varepsilon_ {2}, \\ \end{array}

即定理结论成立.