5.2 Rayleigh商迭代法 在反迭代算法中, x ( k ) x^{(k)} x ( k ) 的Rayleigh商是特征值 λ k \lambda_{k} λ k 的近似, 因此我们可以把它作为位移, 于是就得到下面的Rayleigh商迭代法.
算法 5.4. Rayleigh 商迭代算法 (RQI, Rayleigh Quotient Iterations) 1: Given an initial guess x ( 0 ) x^{(0)} x ( 0 ) with ∥ x ( 0 ) ∥ 2 = 1 \| x^{(0)}\|_2 = 1 ∥ x ( 0 ) ∥ 2 = 1 2: compute the Rayleigh quotient ρ 0 = ( x ( 0 ) , A x ( 0 ) ) \rho_0 = (x^{(0)}, Ax^{(0)}) ρ 0 = ( x ( 0 ) , A x ( 0 ) ) 3: set k = 1 k = 1 k = 1 4: while not converge do 5: σ = ρ k − 1 \sigma = \rho_{k-1} σ = ρ k − 1 6: y ( k ) = ( A − σ I ) − 1 x ( k − 1 ) y^{(k)} = (A - \sigma I)^{-1} x^{(k-1)} y ( k ) = ( A − σ I ) − 1 x ( k − 1 ) 7: x ( k ) = y ( k ) / ∥ y ( k ) ∥ 2 x^{(k)} = y^{(k)} / \| y^{(k)}\|_2 x ( k ) = y ( k ) /∥ y ( k ) ∥ 2 8: ρ k = ( x ( k ) , A x ( k ) ) \rho_k = (x^{(k)}, Ax^{(k)}) ρ k = ( x ( k ) , A x ( k ) ) 9: k = k + 1 k = k + 1 k = k + 1 10: end while
关于Rayleigh商迭代的收敛性, 我们有下面的结论
定理5.5如果特征值是单重的,则当误差足够小时,Rayleigh商迭代法中每步迭代所得的正确数字的位数增至三倍,即Rayleigh商迭代是局部三次收敛的. (板书)
证明. 设 A = Q Λ Q T A = Q\Lambda Q^{\mathsf{T}} A = Q Λ Q T ,令 x ^ ( k ) = Q T x ( k ) \hat{x}^{(k)} = Q^{\mathsf{T}}x^{(k)} x ^ ( k ) = Q T x ( k ) ,则在Rayleigh商迭代算法中
ρ k = ( x ( k ) ) T A x ( k ) = ( x ^ ( k ) ) T Q T A Q x ^ ( k ) = ( x ^ ( k ) ) T Λ x ^ ( k ) . \rho_ {k} = (x ^ {(k)}) ^ {\mathsf {T}} A x ^ {(k)} = (\hat {x} ^ {(k)}) ^ {\mathsf {T}} Q ^ {\mathsf {T}} A Q \hat {x} ^ {(k)} = (\hat {x} ^ {(k)}) ^ {\mathsf {T}} \Lambda \hat {x} ^ {(k)}. ρ k = ( x ( k ) ) T A x ( k ) = ( x ^ ( k ) ) T Q T A Q x ^ ( k ) = ( x ^ ( k ) ) T Λ x ^ ( k ) . 令 y ^ ( k ) = Q T y ( k ) \hat{y}^{(k)} = Q^{\mathsf{T}}y^{(k)} y ^ ( k ) = Q T y ( k ) ,则
y ^ ( k ) = Q T ( A − ρ k − 1 I ) − 1 x ( k ) = ( Q T A Q − ρ k − 1 I ) − 1 x ^ ( k − 1 ) = ( Λ − ρ k − 1 I ) − 1 x ^ ( k − 1 ) , \hat {y} ^ {(k)} = Q ^ {\mathsf {T}} (A - \rho_ {k - 1} I) ^ {- 1} x ^ {(k)} = (Q ^ {\mathsf {T}} A Q - \rho_ {k - 1} I) ^ {- 1} \hat {x} ^ {(k - 1)} = (\Lambda - \rho_ {k - 1} I) ^ {- 1} \hat {x} ^ {(k - 1)}, y ^ ( k ) = Q T ( A − ρ k − 1 I ) − 1 x ( k ) = ( Q T A Q − ρ k − 1 I ) − 1 x ^ ( k − 1 ) = ( Λ − ρ k − 1 I ) − 1 x ^ ( k − 1 ) , 即,“以初始向量 x ( 0 ) x^{(0)} x ( 0 ) 对 A A A 做Rayleigh商迭代”等价于“以初始向量 x ^ ( 0 ) \hat{x}^{(0)} x ^ ( 0 ) 对 Λ \Lambda Λ 做Rayleigh商迭代”,即它们有相同的收敛性.因此,不失一般性,我们可以假定 A = Λ A = \Lambda A = Λ 为对角阵,此时 A A A 的特征向量为 e i , i = 1 , 2 , … , n . e_i,i = 1,2,\ldots ,n. e i , i = 1 , 2 , … , n .
我们假定 x ( k ) x^{(k)} x ( k ) 收敛到 e 1 e_1 e 1 . 令 d k = x ( k ) − e 1 d_k = x^{(k)} - e_1 d k = x ( k ) − e 1 , 则 ∥ d k ∥ 2 → 0 \| d_k \|_2 \to 0 ∥ d k ∥ 2 → 0 . 为了证明算法具有局部三次收敛, 我们需要证明: 当 ε k = ∥ d k ∥ 2 \varepsilon_k = \| d_k \|_2 ε k = ∥ d k ∥ 2 充分小时, 有 ε k + 1 = ∥ d k + 1 ∥ 2 = ∥ x ( k + 1 ) − e 1 ∥ 2 = O ( ε k 3 ) \varepsilon_{k+1} = \| d_{k+1} \|_2 = \| x^{(k+1)} - e_1 \|_2 = \mathcal{O}(\varepsilon_k^3) ε k + 1 = ∥ d k + 1 ∥ 2 = ∥ x ( k + 1 ) − e 1 ∥ 2 = O ( ε k 3 ) .
我们注意到
1 = ( x ( k ) ) T x ( k ) = ( e 1 + d k ) T ( e 1 + d k ) = 1 + 2 d k ( 1 ) + d k T d k = 1 + 2 d k ( 1 ) + ε k 2 , 1 = (x ^ {(k)}) ^ {\mathsf {T}} x ^ {(k)} = (e _ {1} + d _ {k}) ^ {\mathsf {T}} (e _ {1} + d _ {k}) = 1 + 2 d _ {k} (1) + d _ {k} ^ {\mathsf {T}} d _ {k} = 1 + 2 d _ {k} (1) + \varepsilon_ {k} ^ {2}, 1 = ( x ( k ) ) T x ( k ) = ( e 1 + d k ) T ( e 1 + d k ) = 1 + 2 d k ( 1 ) + d k T d k = 1 + 2 d k ( 1 ) + ε k 2 , 其中 d k ( 1 ) d_k(1) d k ( 1 ) 表示 d k d_k d k 的第一个元素. 故 d k ( 1 ) = − ε k 2 / 2 d_k(1) = -\varepsilon_k^2 / 2 d k ( 1 ) = − ε k 2 /2 . 所以
ρ k = ( x ( k ) ) T Λ x ( k ) = ( e 1 + d k ) T Λ ( e 1 + d k ) = e k T Λ e 1 + 2 e 1 T Λ d k + d k T Λ d k ≜ λ 1 − η , \rho_ {k} = (x ^ {(k)}) ^ {\mathsf {T}} \Lambda x ^ {(k)} = (e _ {1} + d _ {k}) ^ {\mathsf {T}} \Lambda (e _ {1} + d _ {k}) = e _ {k} ^ {\mathsf {T}} \Lambda e _ {1} + 2 e _ {1} ^ {\mathsf {T}} \Lambda d _ {k} + d _ {k} ^ {\mathsf {T}} \Lambda d _ {k} \triangleq \lambda_ {1} - \eta , ρ k = ( x ( k ) ) T Λ x ( k ) = ( e 1 + d k ) T Λ ( e 1 + d k ) = e k T Λ e 1 + 2 e 1 T Λ d k + d k T Λ d k ≜ λ 1 − η , 其中 η = − ( 2 e 1 ⊤ Λ d k + d k ⊤ Λ d k ) = − 2 λ 1 d k ( 1 ) − d k ⊤ Λ d k = λ 1 ε k 2 − d k ⊤ Λ d k . \eta = -(2e_1^\top \Lambda d_k + d_k^\top \Lambda d_k) = -2\lambda_1d_k(1) - d_k^\top \Lambda d_k = \lambda_1\varepsilon_k^2 -d_k^\top \Lambda d_k. η = − ( 2 e 1 ⊤ Λ d k + d k ⊤ Λ d k ) = − 2 λ 1 d k ( 1 ) − d k ⊤ Λ d k = λ 1 ε k 2 − d k ⊤ Λ d k . 于是
∣ η ∣ ≤ ∣ λ 1 ∣ ε k 2 + ∥ Λ ∥ 2 ⋅ ∥ d k ∥ 2 2 ≤ 2 ∥ Λ ∥ 2 ε k 2 . | \eta | \leq | \lambda_ {1} | \varepsilon_ {k} ^ {2} + \| \Lambda \| _ {2} \cdot \| d _ {k} \| _ {2} ^ {2} \leq 2 \| \Lambda \| _ {2} \varepsilon_ {k} ^ {2}. ∣ η ∣ ≤ ∣ λ 1 ∣ ε k 2 + ∥Λ ∥ 2 ⋅ ∥ d k ∥ 2 2 ≤ 2∥Λ ∥ 2 ε k 2 . 由Rayleigh商算法5.4可知
y ( k + 1 ) = ( Λ − ρ k I ) − 1 x ( k ) = [ x ( k ) ( 1 ) λ 1 − ρ k , x ( k ) ( 2 ) λ 2 − ρ k , … , x ( k ) ( n ) λ n − ρ k ] T = [ 1 + d k ( 1 ) λ 1 − ρ k , d k ( 2 ) λ 2 − ρ k , … , d k ( n ) λ n − ρ k ] T = [ 1 − ε k 2 / 2 η , d k ( 2 ) λ 2 − λ 1 + η , … , d k ( n ) λ n − λ 1 + η ] T = 1 − ε k 2 / 2 η [ 1 , d k ( 2 ) η ( 1 − ε k 2 / 2 ) ( λ 2 − λ 1 + η ) , … , d k ( n ) η ( 1 − ε k 2 / 2 ) ( λ n − λ 1 + η ) ] ⊺ ≜ 1 − ε k 2 / 2 η ⋅ ( e 1 + d ^ k + 1 ) . \begin{array}{l} y ^ {(k + 1)} = (\Lambda - \rho_ {k} I) ^ {- 1} x ^ {(k)} \\ = \left[ \frac {x ^ {(k)} (1)}{\lambda_ {1} - \rho_ {k}}, \frac {x ^ {(k)} (2)}{\lambda_ {2} - \rho_ {k}}, \dots , \frac {x ^ {(k)} (n)}{\lambda_ {n} - \rho_ {k}} \right] ^ {\mathsf {T}} \\ = \left[ \frac {1 + d _ {k} (1)}{\lambda_ {1} - \rho_ {k}}, \frac {d _ {k} (2)}{\lambda_ {2} - \rho_ {k}}, \dots , \frac {d _ {k} (n)}{\lambda_ {n} - \rho_ {k}} \right] ^ {\mathsf {T}} \\ = \left[ \frac {1 - \varepsilon_ {k} ^ {2} / 2}{\eta}, \frac {d _ {k} (2)}{\lambda_ {2} - \lambda_ {1} + \eta}, \dots , \frac {d _ {k} (n)}{\lambda_ {n} - \lambda_ {1} + \eta} \right] ^ {\mathsf {T}} \\ = \frac {1 - \varepsilon_ {k} ^ {2} / 2}{\eta} \left[ 1, \frac {d _ {k} (2) \eta}{(1 - \varepsilon_ {k} ^ {2} / 2) (\lambda_ {2} - \lambda_ {1} + \eta)}, \ldots , \frac {d _ {k} (n) \eta}{(1 - \varepsilon_ {k} ^ {2} / 2) (\lambda_ {n} - \lambda_ {1} + \eta)} \right] ^ {\intercal} \\ \triangleq \frac {1 - \varepsilon_ {k} ^ {2} / 2}{\eta} \cdot (e _ {1} + \hat {d} _ {k + 1}). \\ \end{array} y ( k + 1 ) = ( Λ − ρ k I ) − 1 x ( k ) = [ λ 1 − ρ k x ( k ) ( 1 ) , λ 2 − ρ k x ( k ) ( 2 ) , … , λ n − ρ k x ( k ) ( n ) ] T = [ λ 1 − ρ k 1 + d k ( 1 ) , λ 2 − ρ k d k ( 2 ) , … , λ n − ρ k d k ( n ) ] T = [ η 1 − ε k 2 /2 , λ 2 − λ 1 + η d k ( 2 ) , … , λ n − λ 1 + η d k ( n ) ] T = η 1 − ε k 2 /2 [ 1 , ( 1 − ε k 2 /2 ) ( λ 2 − λ 1 + η ) d k ( 2 ) η , … , ( 1 − ε k 2 /2 ) ( λ n − λ 1 + η ) d k ( n ) η ] ⊺ ≜ η 1 − ε k 2 /2 ⋅ ( e 1 + d ^ k + 1 ) . 其中
d ^ k + 1 = [ 0 , d k ( 2 ) η ( 1 − ε k 2 / 2 ) ( λ 2 − λ 1 + η ) , … , d k ( n ) η ( 1 − ε k 2 / 2 ) ( λ n − λ 1 + η ) ] T . \hat {d} _ {k + 1} = \left[ 0, \frac {d _ {k} (2) \eta}{(1 - \varepsilon_ {k} ^ {2} / 2) (\lambda_ {2} - \lambda_ {1} + \eta)}, \dots , \frac {d _ {k} (n) \eta}{(1 - \varepsilon_ {k} ^ {2} / 2) (\lambda_ {n} - \lambda_ {1} + \eta)} \right] ^ {\mathsf {T}}. d ^ k + 1 = [ 0 , ( 1 − ε k 2 /2 ) ( λ 2 − λ 1 + η ) d k ( 2 ) η , … , ( 1 − ε k 2 /2 ) ( λ n − λ 1 + η ) d k ( n ) η ] T . 因为 λ 1 \lambda_{1} λ 1 是单重特征值, 所以
gap ( λ 1 , Λ ) ≜ min i ≠ 1 ∣ λ i − λ 1 ∣ > 0 , \operatorname {g a p} (\lambda_ {1}, \Lambda) \triangleq \min _ {i \neq 1} | \lambda_ {i} - \lambda_ {1} | > 0, gap ( λ 1 , Λ ) ≜ i = 1 min ∣ λ i − λ 1 ∣ > 0 , 故对于 i = 2 , 3 , … , n i = 2,3,\ldots ,n i = 2 , 3 , … , n ,当 ε k \varepsilon_{k} ε k 足够小时有
∣ λ i − λ 1 + η ∣ ≥ ∣ λ i − λ 1 ∣ − ∣ η ∣ ≥ gap ( λ 1 , Λ ) − ∣ η ∣ ≥ gap ( λ 1 , Λ ) − 2 ∥ Λ ∥ 2 ε k 2 > 0. \left| \lambda_ {i} - \lambda_ {1} + \eta \right| \geq \left| \lambda_ {i} - \lambda_ {1} \right| - \left| \eta \right| \geq \operatorname {g a p} \left(\lambda_ {1}, \Lambda\right) - \left| \eta \right| \geq \operatorname {g a p} \left(\lambda_ {1}, \Lambda\right) - 2 \| \Lambda \| _ {2} \varepsilon_ {k} ^ {2} > 0. ∣ λ i − λ 1 + η ∣ ≥ ∣ λ i − λ 1 ∣ − ∣ η ∣ ≥ gap ( λ 1 , Λ ) − ∣ η ∣ ≥ gap ( λ 1 , Λ ) − 2∥Λ ∥ 2 ε k 2 > 0. 于是我们有
∥ d ^ k + 1 ∥ 2 ≤ ∥ d k ∥ 2 ∣ η ∣ ( 1 − ε k 2 / 2 ) ( gap ( λ 1 , Λ ) − ∣ η ∣ ) ≤ 2 ∥ Λ ∥ 2 ε k 3 ( 1 − ε k 2 / 2 ) ( gap ( λ 1 , Λ ) − ∣ η ∣ ) , \left\| \hat {d} _ {k + 1} \right\| _ {2} \leq \frac {\| d _ {k} \| _ {2} | \eta |}{(1 - \varepsilon_ {k} ^ {2} / 2) (\operatorname {g a p} (\lambda_ {1} , \Lambda) - | \eta |)} \leq \frac {2 \| \Lambda \| _ {2} \varepsilon_ {k} ^ {3}}{(1 - \varepsilon_ {k} ^ {2} / 2) (\operatorname {g a p} (\lambda_ {1} , \Lambda) - | \eta |)}, d ^ k + 1 2 ≤ ( 1 − ε k 2 /2 ) ( gap ( λ 1 , Λ ) − ∣ η ∣ ) ∥ d k ∥ 2 ∣ η ∣ ≤ ( 1 − ε k 2 /2 ) ( gap ( λ 1 , Λ ) − ∣ η ∣ ) 2∥Λ ∥ 2 ε k 3 , 即 ∥ d ^ k + 1 ∥ 2 = O ( ε k 3 ) \left\| \hat{d}_{k + 1}\right\| _2 = \mathcal{O}(\varepsilon_k^3) d ^ k + 1 2 = O ( ε k 3 ) 又
1 − ∥ d ^ k + 1 ∥ 2 ≤ ∥ e 1 + d ^ k + 1 ∥ 2 ≤ 1 + ∥ d ^ k + 1 ∥ 2 , 1 - \left\| \hat {d} _ {k + 1} \right\| _ {2} \leq \left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2} \leq 1 + \left\| \hat {d} _ {k + 1} \right\| _ {2}, 1 − d ^ k + 1 2 ≤ e 1 + d ^ k + 1 2 ≤ 1 + d ^ k + 1 2 , 即
∣ 1 − ∥ e 1 + d ^ k + 1 ∥ 2 ∣ ≤ ∥ d ^ k + 1 ∥ 2 . \left| 1 - \left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2} \right| \leq \left\| \hat {d} _ {k + 1} \right\| _ {2}. 1 − e 1 + d ^ k + 1 2 ≤ d ^ k + 1 2 . 由于
x ( k + 1 ) = y ( k + 1 ) ∥ y ( k + 1 ) ∥ 2 = e 1 + d ^ k + 1 ∥ e 1 + d ^ k + 1 ∥ 2 , x ^ {(k + 1)} = \frac {y ^ {(k + 1)}}{\| y ^ {(k + 1)} \| _ {2}} = \frac {e _ {1} + \hat {d} _ {k + 1}}{\left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}}, x ( k + 1 ) = ∥ y ( k + 1 ) ∥ 2 y ( k + 1 ) = e 1 + d ^ k + 1 2 e 1 + d ^ k + 1 , 所以
∥ d k + 1 ∥ 2 = ∥ x ( k + 1 ) − e 1 ∥ 2 = ∥ ( 1 − ∥ e 1 + d ^ k + 1 ∥ 2 ) e 1 + d ^ k + 1 ∥ 2 ∥ e 1 + d ^ k + 1 ∥ 2 ≤ ∣ 1 − ∥ e 1 + d ^ k + 1 ∥ 2 ∣ + ∥ d ^ k + 1 ∥ 2 ∥ e 1 + d ^ k + 1 ∥ 2 ≤ 2 ∥ d ^ k + 1 ∥ 2 ∥ e 1 + d ^ k + 1 ∥ 2 . \begin{array}{l} \| d _ {k + 1} \| _ {2} = \| x ^ {(k + 1)} - e _ {1} \| _ {2} = \frac {\left\| \left(1 - \left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}\right) e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}}{\left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}} \\ \leq \frac {\left| 1 - \left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2} \right| + \left\| \hat {d} _ {k + 1} \right\| _ {2}}{\left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}} \leq \frac {2 \left\| \hat {d} _ {k + 1} \right\| _ {2}}{\left\| e _ {1} + \hat {d} _ {k + 1} \right\| _ {2}}. \\ \end{array} ∥ d k + 1 ∥ 2 = ∥ x ( k + 1 ) − e 1 ∥ 2 = ∥ e 1 + d ^ k + 1 ∥ 2 ∥ ( 1 − ∥ e 1 + d ^ k + 1 ∥ 2 ) e 1 + d ^ k + 1 ∥ 2 ≤ ∥ e 1 + d ^ k + 1 ∥ 2 ∣ 1 − ∥ e 1 + d ^ k + 1 ∥ 2 ∣ + ∥ d ^ k + 1 ∥ 2 ≤ ∥ e 1 + d ^ k + 1 ∥ 2 2 ∥ d ^ k + 1 ∥ 2 . 又 ∥ d ^ k + 1 ∥ 2 = O ( ε k 3 ) \left\| \hat{d}_{k + 1}\right\| _2 = \mathcal{O}(\varepsilon_k^3) d ^ k + 1 2 = O ( ε k 3 ) ,故 ε k + 1 = ∥ d k + 1 ∥ 2 = O ( ε k 3 ) \varepsilon_{k + 1} = \| d_{k + 1}\| _2 = \mathcal{O}(\varepsilon_k^3) ε k + 1 = ∥ d k + 1 ∥ 2 = O ( ε k 3 )
RQI算法具有局部三次收敛性, 但无法确定收敛到哪个特征向量 (特征值), 因此可以作为其他算法的加速手段, 即先使用其他算法 (比如幂迭代) 计算出所需特征值的近似值, 然后再使用RQI算法加速.
A 实际计算时, 判断 ( ρ k , x ( k ) ) (\rho_{k}, x^{(k)}) ( ρ k , x ( k ) ) 是否收敛可以观察残量 r k = ( A − ρ k I ) x ( k ) r_{k} = (A - \rho_{k}I)x^{(k)} r k = ( A − ρ k I ) x ( k ) 是否趋于零.
下面是关于RQI算法的全局收敛性,可参见文献[100].
定理5.6 在RQI算法中, 设 r k = ( A − ρ k I ) x ( k ) r_k = (A - \rho_k I)x^{(k)} r k = ( A − ρ k I ) x ( k ) , 则有
∥ r k + 1 ∥ ≤ ∥ r k ∥ , \| r _ {k + 1} \| \leq \| r _ {k} \|, ∥ r k + 1 ∥ ≤ ∥ r k ∥ , 其中等号成立当且仅当 ρ k + 1 = ρ k \rho_{k + 1} = \rho_k ρ k + 1 = ρ k 且 x ( k ) x^{(k)} x ( k ) 是 ( A − ρ k I ) 2 (A - \rho_k I)^2 ( A − ρ k I ) 2 的特征向量.