5.2_Rayleigh商迭代法

5.2 Rayleigh商迭代法

在反迭代算法中, x(k)x^{(k)} 的Rayleigh商是特征值 λk\lambda_{k} 的近似, 因此我们可以把它作为位移, 于是就得到下面的Rayleigh商迭代法.

算法 5.4. Rayleigh 商迭代算法 (RQI, Rayleigh Quotient Iterations)
1: Given an initial guess x(0)x^{(0)} with x(0)2=1\| x^{(0)}\|_2 = 1
2: compute the Rayleigh quotient ρ0=(x(0),Ax(0))\rho_0 = (x^{(0)}, Ax^{(0)})
3: set k=1k = 1
4: while not converge do
5: σ=ρk1\sigma = \rho_{k-1}
6: y(k)=(AσI)1x(k1)y^{(k)} = (A - \sigma I)^{-1} x^{(k-1)}
7: x(k)=y(k)/y(k)2x^{(k)} = y^{(k)} / \| y^{(k)}\|_2
8: ρk=(x(k),Ax(k))\rho_k = (x^{(k)}, Ax^{(k)})
9: k=k+1k = k + 1
10: end while

关于Rayleigh商迭代的收敛性, 我们有下面的结论

定理5.5如果特征值是单重的,则当误差足够小时,Rayleigh商迭代法中每步迭代所得的正确数字的位数增至三倍,即Rayleigh商迭代是局部三次收敛的. (板书)

证明. 设 A=QΛQTA = Q\Lambda Q^{\mathsf{T}} ,令 x^(k)=QTx(k)\hat{x}^{(k)} = Q^{\mathsf{T}}x^{(k)} ,则在Rayleigh商迭代算法中

ρk=(x(k))TAx(k)=(x^(k))TQTAQx^(k)=(x^(k))TΛx^(k).\rho_ {k} = (x ^ {(k)}) ^ {\mathsf {T}} A x ^ {(k)} = (\hat {x} ^ {(k)}) ^ {\mathsf {T}} Q ^ {\mathsf {T}} A Q \hat {x} ^ {(k)} = (\hat {x} ^ {(k)}) ^ {\mathsf {T}} \Lambda \hat {x} ^ {(k)}.

y^(k)=QTy(k)\hat{y}^{(k)} = Q^{\mathsf{T}}y^{(k)} ,则

y^(k)=QT(Aρk1I)1x(k)=(QTAQρk1I)1x^(k1)=(Λρk1I)1x^(k1),\hat {y} ^ {(k)} = Q ^ {\mathsf {T}} (A - \rho_ {k - 1} I) ^ {- 1} x ^ {(k)} = (Q ^ {\mathsf {T}} A Q - \rho_ {k - 1} I) ^ {- 1} \hat {x} ^ {(k - 1)} = (\Lambda - \rho_ {k - 1} I) ^ {- 1} \hat {x} ^ {(k - 1)},

即,“以初始向量 x(0)x^{(0)}AA 做Rayleigh商迭代”等价于“以初始向量 x^(0)\hat{x}^{(0)}Λ\Lambda 做Rayleigh商迭代”,即它们有相同的收敛性.因此,不失一般性,我们可以假定 A=ΛA = \Lambda 为对角阵,此时 AA 的特征向量为 ei,i=1,2,,n.e_i,i = 1,2,\ldots ,n.

我们假定 x(k)x^{(k)} 收敛到 e1e_1 . 令 dk=x(k)e1d_k = x^{(k)} - e_1 , 则 dk20\| d_k \|_2 \to 0 . 为了证明算法具有局部三次收敛, 我们需要证明: 当 εk=dk2\varepsilon_k = \| d_k \|_2 充分小时, 有 εk+1=dk+12=x(k+1)e12=O(εk3)\varepsilon_{k+1} = \| d_{k+1} \|_2 = \| x^{(k+1)} - e_1 \|_2 = \mathcal{O}(\varepsilon_k^3) .

我们注意到

1=(x(k))Tx(k)=(e1+dk)T(e1+dk)=1+2dk(1)+dkTdk=1+2dk(1)+εk2,1 = (x ^ {(k)}) ^ {\mathsf {T}} x ^ {(k)} = (e _ {1} + d _ {k}) ^ {\mathsf {T}} (e _ {1} + d _ {k}) = 1 + 2 d _ {k} (1) + d _ {k} ^ {\mathsf {T}} d _ {k} = 1 + 2 d _ {k} (1) + \varepsilon_ {k} ^ {2},

其中 dk(1)d_k(1) 表示 dkd_k 的第一个元素. 故 dk(1)=εk2/2d_k(1) = -\varepsilon_k^2 / 2 . 所以

ρk=(x(k))TΛx(k)=(e1+dk)TΛ(e1+dk)=ekTΛe1+2e1TΛdk+dkTΛdkλ1η,\rho_ {k} = (x ^ {(k)}) ^ {\mathsf {T}} \Lambda x ^ {(k)} = (e _ {1} + d _ {k}) ^ {\mathsf {T}} \Lambda (e _ {1} + d _ {k}) = e _ {k} ^ {\mathsf {T}} \Lambda e _ {1} + 2 e _ {1} ^ {\mathsf {T}} \Lambda d _ {k} + d _ {k} ^ {\mathsf {T}} \Lambda d _ {k} \triangleq \lambda_ {1} - \eta ,

其中 η=(2e1Λdk+dkΛdk)=2λ1dk(1)dkΛdk=λ1εk2dkΛdk.\eta = -(2e_1^\top \Lambda d_k + d_k^\top \Lambda d_k) = -2\lambda_1d_k(1) - d_k^\top \Lambda d_k = \lambda_1\varepsilon_k^2 -d_k^\top \Lambda d_k. 于是

ηλ1εk2+Λ2dk222Λ2εk2.| \eta | \leq | \lambda_ {1} | \varepsilon_ {k} ^ {2} + \| \Lambda \| _ {2} \cdot \| d _ {k} \| _ {2} ^ {2} \leq 2 \| \Lambda \| _ {2} \varepsilon_ {k} ^ {2}.

由Rayleigh商算法5.4可知

y(k+1)=(ΛρkI)1x(k)=[x(k)(1)λ1ρk,x(k)(2)λ2ρk,,x(k)(n)λnρk]T=[1+dk(1)λ1ρk,dk(2)λ2ρk,,dk(n)λnρk]T=[1εk2/2η,dk(2)λ2λ1+η,,dk(n)λnλ1+η]T=1εk2/2η[1,dk(2)η(1εk2/2)(λ2λ1+η),,dk(n)η(1εk2/2)(λnλ1+η)]1εk2/2η(e1+d^k+1).\begin{array}{l} y ^ {(k + 1)} = (\Lambda - \rho_ {k} I) ^ {- 1} x ^ {(k)} \\ = \left[ \frac {x ^ {(k)} (1)}{\lambda_ {1} - \rho_ {k}}, \frac {x ^ {(k)} (2)}{\lambda_ {2} - \rho_ {k}}, \dots , \frac {x ^ {(k)} (n)}{\lambda_ {n} - \rho_ {k}} \right] ^ {\mathsf {T}} \\ = \left[ \frac {1 + d _ {k} (1)}{\lambda_ {1} - \rho_ {k}}, \frac {d _ {k} (2)}{\lambda_ {2} - \rho_ {k}}, \dots , \frac {d _ {k} (n)}{\lambda_ {n} - \rho_ {k}} \right] ^ {\mathsf {T}} \\ = \left[ \frac {1 - \varepsilon_ {k} ^ {2} / 2}{\eta}, \frac {d _ {k} (2)}{\lambda_ {2} - \lambda_ {1} + \eta}, \dots , \frac {d _ {k} (n)}{\lambda_ {n} - \lambda_ {1} + \eta} \right] ^ {\mathsf {T}} \\ = \frac {1 - \varepsilon_ {k} ^ {2} / 2}{\eta} \left[ 1, \frac {d _ {k} (2) \eta}{(1 - \varepsilon_ {k} ^ {2} / 2) (\lambda_ {2} - \lambda_ {1} + \eta)}, \ldots , \frac {d _ {k} (n) \eta}{(1 - \varepsilon_ {k} ^ {2} / 2) (\lambda_ {n} - \lambda_ {1} + \eta)} \right] ^ {\intercal} \\ \triangleq \frac {1 - \varepsilon_ {k} ^ {2} / 2}{\eta} \cdot (e _ {1} + \hat {d} _ {k + 1}). \\ \end{array}

其中

d^k+1=[0,dk(2)η(1εk2/2)(λ2λ1+η),,dk(n)η(1εk2/2)(λnλ1+η)]T.\hat {d} _ {k + 1} = \left[ 0, \frac {d _ {k} (2) \eta}{(1 - \varepsilon_ {k} ^ {2} / 2) (\lambda_ {2} - \lambda_ {1} + \eta)}, \dots , \frac {d _ {k} (n) \eta}{(1 - \varepsilon_ {k} ^ {2} / 2) (\lambda_ {n} - \lambda_ {1} + \eta)} \right] ^ {\mathsf {T}}.

因为 λ1\lambda_{1} 是单重特征值, 所以

gap(λ1,Λ)mini1λiλ1>0,\operatorname {g a p} (\lambda_ {1}, \Lambda) \triangleq \min _ {i \neq 1} | \lambda_ {i} - \lambda_ {1} | > 0,

故对于 i=2,3,,ni = 2,3,\ldots ,n ,当 εk\varepsilon_{k} 足够小时有

λiλ1+ηλiλ1ηgap(λ1,Λ)ηgap(λ1,Λ)2Λ2εk2>0.\left| \lambda_ {i} - \lambda_ {1} + \eta \right| \geq \left| \lambda_ {i} - \lambda_ {1} \right| - \left| \eta \right| \geq \operatorname {g a p} \left(\lambda_ {1}, \Lambda\right) - \left| \eta \right| \geq \operatorname {g a p} \left(\lambda_ {1}, \Lambda\right) - 2 \| \Lambda \| _ {2} \varepsilon_ {k} ^ {2} > 0.

于是我们有

d^k+12dk2η(1εk2/2)(gap(λ1,Λ)η)2Λ2εk3(1εk2/2)(gap(λ1,Λ)η),\left\| \hat {d} _ {k + 1} \right\| _ {2} \leq \frac {\| d _ {k} \| _ {2} | \eta |}{(1 - \varepsilon_ {k} ^ {2} / 2) (\operatorname {g a p} (\lambda_ {1} , \Lambda) - | \eta |)} \leq \frac {2 \| \Lambda \| _ {2} \varepsilon_ {k} ^ {3}}{(1 - \varepsilon_ {k} ^ {2} / 2) (\operatorname {g a p} (\lambda_ {1} , \Lambda) - | \eta |)},

d^k+12=O(εk3)\left\| \hat{d}_{k + 1}\right\| _2 = \mathcal{O}(\varepsilon_k^3)

1d^k+12e1+d^k+121+d^k+12,1 - \left\| \hat {d} _ {k + 1} \right\| _ {2} \leq \left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2} \leq 1 + \left\| \hat {d} _ {k + 1} \right\| _ {2},

1e1+d^k+12d^k+12.\left| 1 - \left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2} \right| \leq \left\| \hat {d} _ {k + 1} \right\| _ {2}.

由于

x(k+1)=y(k+1)y(k+1)2=e1+d^k+1e1+d^k+12,x ^ {(k + 1)} = \frac {y ^ {(k + 1)}}{\| y ^ {(k + 1)} \| _ {2}} = \frac {e _ {1} + \hat {d} _ {k + 1}}{\left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}},

所以

dk+12=x(k+1)e12=(1e1+d^k+12)e1+d^k+12e1+d^k+121e1+d^k+12+d^k+12e1+d^k+122d^k+12e1+d^k+12.\begin{array}{l} \| d _ {k + 1} \| _ {2} = \| x ^ {(k + 1)} - e _ {1} \| _ {2} = \frac {\left\| \left(1 - \left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}\right) e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}}{\left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}} \\ \leq \frac {\left| 1 - \left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2} \right| + \left\| \hat {d} _ {k + 1} \right\| _ {2}}{\left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}} \leq \frac {2 \left\| \hat {d} _ {k + 1} \right\| _ {2}}{\left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}}. \\ \end{array}

d^k+12=O(εk3)\left\| \hat{d}_{k + 1}\right\| _2 = \mathcal{O}(\varepsilon_k^3) ,故 εk+1=dk+12=O(εk3)\varepsilon_{k + 1} = \| d_{k + 1}\| _2 = \mathcal{O}(\varepsilon_k^3)

RQI算法具有局部三次收敛性, 但无法确定收敛到哪个特征向量 (特征值), 因此可以作为其他算法的加速手段, 即先使用其他算法 (比如幂迭代) 计算出所需特征值的近似值, 然后再使用RQI算法加速.

A 实际计算时, 判断 (ρk,x(k))(\rho_{k}, x^{(k)}) 是否收敛可以观察残量 rk=(AρkI)x(k)r_{k} = (A - \rho_{k}I)x^{(k)} 是否趋于零.

下面是关于RQI算法的全局收敛性,可参见文献[100].

定理5.6 在RQI算法中, 设 rk=(AρkI)x(k)r_k = (A - \rho_k I)x^{(k)} , 则有

rk+1rk,\| r _ {k + 1} \| \leq \| r _ {k} \|,

其中等号成立当且仅当 ρk+1=ρk\rho_{k + 1} = \rho_kx(k)x^{(k)}(AρkI)2(A - \rho_k I)^2 的特征向量.