7.3 The Conjugate Gradient Method

The conjugate gradient (CG) method is currently the method of choice for solving symmetric positive definite linear systems. The CG algorithm can be derived from an optimization viewpoint or from an algebraic one; here we introduce it from the algebraic viewpoint.
7.3.1 Basic Form of the Algorithm

We first establish a property of the CG iterates and then use it to derive the CG algorithm.
Theorem 7.5. Let $A$ be symmetric positive definite. Then
$$
x^{(m)} = \arg\min_{x \in x^{(0)} + \mathcal{K}_m} \|x - x_*\|_A \tag{7.9}
$$
holds if and only if
$$
x^{(m)} \in x^{(0)} + \mathcal{K}_m \quad \text{and} \quad b - Ax^{(m)} \perp \mathcal{K}_m. \tag{7.10}
$$
Proof. We first prove sufficiency. Suppose $x^{(m)}$ satisfies (7.10). Let $\tilde{x} = x^{(m)} - x^{(0)}$; then $\tilde{x} \in \mathcal{K}_m$ and
$$
r_0 - A\tilde{x} \perp \mathcal{K}_m.
$$
Since $r_0 - A\tilde{x} = A(A^{-1}r_0 - \tilde{x})$, this says that $A^{-1}r_0 - \tilde{x}$ is $A$-orthogonal to $\mathcal{K}_m$. By the properties of orthogonal projection, $\tilde{x}$ therefore solves the best approximation problem
$$
\min_{x \in \mathcal{K}_m} \|x - A^{-1} r_0\|_A,
$$
that is,
$$
\begin{aligned}
\tilde{x} &= \arg\min_{x \in \mathcal{K}_m} \|x - A^{-1} r_0\|_A \\
&= \arg\min_{x \in \mathcal{K}_m} \|x - A^{-1}(b - Ax^{(0)})\|_A \\
&= \arg\min_{x \in \mathcal{K}_m} \|(x^{(0)} + x) - x_*\|_A.
\end{aligned}
$$
Hence $x^{(m)} = x^{(0)} + \tilde{x}$ solves the best approximation problem
$$
\min_{x \in x^{(0)} + \mathcal{K}_m} \|x - x_*\|_A,
$$
i.e., (7.9) holds.

Necessity follows directly from the properties of orthogonal projection; see Exercise 7.3.
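As a small numerical illustration of Theorem 7.5 (our own check, not part of the text, assuming NumPy is available), the snippet below builds a Krylov subspace for a random symmetric positive definite matrix, computes the iterate satisfying the Galerkin condition (7.10), and verifies that any other point of $x^{(0)} + \mathcal{K}_m$ has an $A$-norm error at least as large.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite test matrix
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)
x0 = np.zeros(n)

r0 = b - A @ x0
K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(m)])
V, _ = np.linalg.qr(K)                 # orthonormal basis of K_m

# Galerkin condition (7.10): V^T (b - A x) = 0 with x = x0 + V z.
z = np.linalg.solve(V.T @ A @ V, V.T @ r0)
x_m = x0 + V @ z

def err_A(x):
    e = x - x_star
    return np.sqrt(e @ A @ e)

# Any other point of x0 + K_m has an A-norm error that is at least as large.
worse = [err_A(x_m + V @ rng.standard_normal(m)) >= err_A(x_m) for _ in range(100)]
print(all(worse), err_A(x_m))
```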
When $A$ is symmetric positive definite, the Arnoldi process reduces to the Lanczos process, and
$$
A V_m = V_{m+1} T_{m+1,m} = V_m T_m + \beta_m v_{m+1} e_m^{\mathsf{T}}, \qquad
V_m^{\mathsf{T}} A V_m = T_m,
$$
where $T_m = \mathrm{tridiag}(\beta_i, \alpha_{i+1}, \beta_{i+1})$; see (7.3). By Theorem 7.5, we need to find in $x^{(0)} + \mathcal{K}_m$ the optimal approximate solution $x^{(m)}$ satisfying
$$
b - Ax^{(m)} \perp \mathcal{K}_m. \tag{7.11}
$$
Based on this property we can now derive the CG algorithm.
Write $x^{(m)} = x^{(0)} + V_m z^{(m)}$ with $z^{(m)} \in \mathbb{R}^m$. By (7.11),
$$
0 = V_m^{\mathsf{T}}(b - Ax^{(m)}) = V_m^{\mathsf{T}}(r_0 - AV_m z^{(m)}) = V_m^{\mathsf{T}}(\beta v_1) - V_m^{\mathsf{T}} A V_m z^{(m)} = \beta e_1 - T_m z^{(m)}.
$$
Therefore
$$
z^{(m)} = T_m^{-1}(\beta e_1),
$$
and hence
$$
x^{(m)} = x^{(0)} + V_m z^{(m)} = x^{(0)} + V_m T_m^{-1}(\beta e_1).
$$
If $x^{(m)}$ meets the accuracy requirement, the computation stops; otherwise we set $m \gets m + 1$ and continue with the next iteration. This is the basic form of the CG method.
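The following sketch (our own illustration, assuming NumPy; the name `cg_prototype` is hypothetical) implements this basic form literally: it runs the Lanczos process and, at each step, solves $T_m z^{(m)} = \beta e_1$ and forms $x^{(m)} = x^{(0)} + V_m z^{(m)}$. It only mirrors the derivation; the practical recurrences are developed in Section 7.3.2.

```python
import numpy as np

def cg_prototype(A, b, x0, m_max=50, tol=1e-10):
    """Basic CG: Lanczos process plus the solve x^(m) = x^(0) + V_m T_m^{-1}(beta e_1)."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta < tol:
        return x0
    V = [r0 / beta]                        # Lanczos vectors v_1, v_2, ...
    alphas, betas = [], []                 # diagonal / off-diagonal entries of T_m
    x = x0
    for m in range(1, m_max + 1):
        w = A @ V[-1]
        if m > 1:
            w = w - betas[-1] * V[-2]
        alpha = V[-1] @ w
        w = w - alpha * V[-1]
        alphas.append(alpha)
        # Assemble the small tridiagonal T_m and solve T_m z = beta * e_1.
        T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        rhs = np.zeros(m)
        rhs[0] = beta
        z = np.linalg.solve(T, rhs)
        x = x0 + np.column_stack(V) @ z
        if np.linalg.norm(b - A @ x) <= tol * beta:
            return x
        beta_next = np.linalg.norm(w)
        if beta_next < 1e-14:              # Lanczos breakdown: exact solution reached
            return x
        betas.append(beta_next)
        V.append(w / beta_next)
    return x
```

Note that this prototype stores all Lanczos vectors and re-solves a tridiagonal system at every step; the point of the next subsection is to replace this by short recurrences.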
7.3.2 A Practical Iteration Scheme

We now derive a practical implementation of the CG method. Since $A$ is symmetric positive definite, we can exploit three-term recurrences to simplify the computation.
First, since $T_m$ is symmetric positive definite, it admits an $\mathrm{LDL}^{\mathsf{T}}$ factorization $T_m = L_m D_m L_m^{\mathsf{T}}$. Hence
$$
x^{(m)} = x^{(0)} + V_m z^{(m)} = x^{(0)} + V_m T_m^{-1}(\beta e_1^{(m)}) = x^{(0)} + (V_m L_m^{-\mathsf{T}})(\beta D_m^{-1} L_m^{-1} e_1^{(m)}),
$$
where $e_1^{(m)}$ denotes the first column of the identity matrix of order $m$. If $x^{(m)}$ meets the accuracy requirement, the computation stops. Otherwise we need to compute $x^{(m+1)}$, namely
$$
x^{(m+1)} = x^{(0)} + V_{m+1} T_{m+1}^{-1}(\beta e_1^{(m+1)}) = x^{(0)} + (V_{m+1} L_{m+1}^{-\mathsf{T}})(\beta D_{m+1}^{-1} L_{m+1}^{-1} e_1^{(m+1)}),
$$
where $T_{m+1} = L_{m+1} D_{m+1} L_{m+1}^{\mathsf{T}}$ is the $\mathrm{LDL}^{\mathsf{T}}$ factorization of $T_{m+1}$. We now consider how to compute $x^{(m+1)}$ by recurrence. Define
$$
\tilde{P}_m \triangleq V_m L_m^{-\mathsf{T}} = [\tilde{p}_1, \tilde{p}_2, \dots, \tilde{p}_m] \in \mathbb{R}^{n \times m},
\qquad
y_m \triangleq \beta D_m^{-1} L_m^{-1} e_1^{(m)} = [\eta_1, \dots, \eta_m]^{\mathsf{T}} \in \mathbb{R}^m.
$$
We first prove the following result.
Lemma 7.6. Let $\tilde{P}_m$ and $y_m$ be defined as above. Then the following recurrences hold:
$$
\tilde{P}_{m+1} \triangleq V_{m+1} L_{m+1}^{-\mathsf{T}} = [\tilde{P}_m, \tilde{p}_{m+1}],
\qquad
y_{m+1} \triangleq \beta D_{m+1}^{-1} L_{m+1}^{-1} e_1^{(m+1)} = [y_m^{\mathsf{T}}, \eta_{m+1}]^{\mathsf{T}},
\qquad m = 1, 2, \ldots.
$$
Proof. Write the $\mathrm{LDL}^{\mathsf{T}}$ factorization of $T_m$ as
$$
T_m = L_m D_m L_m^{\mathsf{T}} =
\begin{bmatrix} 1 & & & \\ l_1 & 1 & & \\ & \ddots & \ddots & \\ & & l_{m-1} & 1 \end{bmatrix}
\begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_m \end{bmatrix}
\begin{bmatrix} 1 & & & \\ l_1 & 1 & & \\ & \ddots & \ddots & \\ & & l_{m-1} & 1 \end{bmatrix}^{\mathsf{T}}.
$$
Comparing entries (the method of undetermined coefficients) gives $d_1 = \alpha_1$ and
$$
l_i = \beta_i / d_i, \qquad d_{i+1} = \alpha_{i+1} - l_i \beta_i, \qquad i = 1, 2, \dots, m-1.
$$
Let $\gamma_m = [0, \dots, 0, \beta_m]^{\mathsf{T}} \in \mathbb{R}^m$. Then the $\mathrm{LDL}^{\mathsf{T}}$ factorization of $T_{m+1}$ is
$$
T_{m+1} =
\begin{bmatrix} T_m & \gamma_m \\ \gamma_m^{\mathsf{T}} & \alpha_{m+1} \end{bmatrix}
= L_{m+1} D_{m+1} L_{m+1}^{\mathsf{T}}
= \begin{bmatrix} 1 & & & & \\ l_1 & 1 & & & \\ & \ddots & \ddots & & \\ & & l_{m-1} & 1 & \\ & & & l_m & 1 \end{bmatrix}
\begin{bmatrix} d_1 & & & & \\ & d_2 & & & \\ & & \ddots & & \\ & & & d_m & \\ & & & & d_{m+1} \end{bmatrix}
\begin{bmatrix} 1 & & & & \\ l_1 & 1 & & & \\ & \ddots & \ddots & & \\ & & l_{m-1} & 1 & \\ & & & l_m & 1 \end{bmatrix}^{\mathsf{T}},
$$
where $l_m = \beta_m / d_m$ and $d_{m+1} = \alpha_{m+1} - l_m \beta_m$. Let $\tilde{\gamma} = [0, \dots, 0, l_m]^{\mathsf{T}} \in \mathbb{R}^m$; then
$$
L_{m+1} = \begin{bmatrix} L_m & 0 \\ \tilde{\gamma}^{\mathsf{T}} & 1 \end{bmatrix}
\quad \text{and} \quad
L_{m+1}^{-1} = \begin{bmatrix} L_m^{-1} & 0 \\ -\tilde{\gamma}^{\mathsf{T}} L_m^{-1} & 1 \end{bmatrix}.
$$
Therefore
$$
\tilde{P}_{m+1} = V_{m+1} L_{m+1}^{-\mathsf{T}} = [V_m, v_{m+1}]
\begin{bmatrix} L_m^{-\mathsf{T}} & -L_m^{-\mathsf{T}} \tilde{\gamma} \\ 0 & 1 \end{bmatrix}
= \bigl[ V_m L_m^{-\mathsf{T}}, \; -V_m L_m^{-\mathsf{T}} \tilde{\gamma} + v_{m+1} \bigr].
$$
Since $V_m L_m^{-\mathsf{T}} \tilde{\gamma} = \tilde{P}_m [0, \dots, 0, l_m]^{\mathsf{T}} = l_m \tilde{p}_m$, we obtain $\tilde{P}_{m+1} = [\tilde{P}_m, \tilde{p}_{m+1}]$ with
$$
\tilde{p}_{m+1} = v_{m+1} - l_m \tilde{p}_m. \tag{7.12}
$$
Moreover,
$$
y_{m+1} = \beta D_{m+1}^{-1} L_{m+1}^{-1} e_1^{(m+1)}
= \beta \begin{bmatrix} D_m^{-1} & 0 \\ 0 & d_{m+1}^{-1} \end{bmatrix}
\begin{bmatrix} L_m^{-1} & 0 \\ -\tilde{\gamma}^{\mathsf{T}} L_m^{-1} & 1 \end{bmatrix}
\begin{bmatrix} e_1^{(m)} \\ 0 \end{bmatrix}
= \begin{bmatrix} \beta D_m^{-1} L_m^{-1} e_1^{(m)} \\ -\beta d_{m+1}^{-1} \tilde{\gamma}^{\mathsf{T}} L_m^{-1} e_1^{(m)} \end{bmatrix}
= \begin{bmatrix} y_m \\ \eta_{m+1} \end{bmatrix},
$$
where $\eta_{m+1} \triangleq -\beta d_{m+1}^{-1} \tilde{\gamma}^{\mathsf{T}} L_m^{-1} e_1^{(m)}$. This proves the claim. □
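The scalar recurrences $d_1 = \alpha_1$, $l_i = \beta_i / d_i$, $d_{i+1} = \alpha_{i+1} - l_i \beta_i$ used in this proof are easy to check numerically. The sketch below (our own, assuming NumPy) factors a symmetric tridiagonal matrix this way and verifies $T_m = L_m D_m L_m^{\mathsf{T}}$; note that passing from $T_m$ to $T_{m+1}$ only appends one new entry to each of $l$ and $d$, which is what makes the recursive update cheap.

```python
import numpy as np

def ldlt_tridiag(alpha, beta):
    """LDL^T of T = tridiag(beta, alpha, beta): returns the multipliers l and pivots d."""
    m = len(alpha)
    d = np.empty(m)
    l = np.empty(m - 1)
    d[0] = alpha[0]
    for i in range(m - 1):
        l[i] = beta[i] / d[i]
        d[i + 1] = alpha[i + 1] - l[i] * beta[i]
    return l, d

# Small check on a symmetric positive definite tridiagonal matrix.
rng = np.random.default_rng(1)
m = 6
alpha = 2.0 + rng.random(m)            # diagonal entries alpha_1, ..., alpha_m
beta = 0.5 * rng.random(m - 1)         # off-diagonal entries beta_1, ..., beta_{m-1}
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

l, d = ldlt_tridiag(alpha, beta)
L = np.eye(m) + np.diag(l, -1)
print(np.allclose(L @ np.diag(d) @ L.T, T))   # True
```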
With this property we obtain the recurrence from $x^{(m)}$ to $x^{(m+1)}$:
$$
x^{(m+1)} = x^{(0)} + \tilde{P}_{m+1} y_{m+1} = x^{(0)} + [\tilde{P}_m, \tilde{p}_{m+1}] \begin{bmatrix} y_m \\ \eta_{m+1} \end{bmatrix} = x^{(m)} + \eta_{m+1} \tilde{p}_{m+1}.
$$
To decide whether the approximate solution is accurate enough, we also need the residual. A direct computation shows that the residual satisfies the recurrence
$$
r_{m+1} = b - Ax^{(m+1)} = b - A\bigl(x^{(m)} + \eta_{m+1}\tilde{p}_{m+1}\bigr) = r_m - \eta_{m+1} A\tilde{p}_{m+1}.
$$
On the other hand,
$$
\begin{aligned}
r_m = b - Ax^{(m)} &= b - A\bigl(x^{(0)} + V_m z^{(m)}\bigr) = r_0 - AV_m z^{(m)} \\
&= \beta v_1 - V_m T_m z^{(m)} - \beta_m v_{m+1} e_m^{\mathsf{T}} z^{(m)} \\
&= -\beta_m \bigl(e_m^{\mathsf{T}} z^{(m)}\bigr) v_{m+1},
\end{aligned}
$$
where the last step uses $T_m z^{(m)} = \beta e_1$. Thus $r_m$ is parallel to $v_{m+1}$. Write
$$
r_m = \tau_m v_{m+1}, \qquad m = 0, 1, 2, \dots, \tag{7.13}
$$
where
$$
\tau_0 = \beta = \|r_0\|_2, \qquad \tau_m = -\beta_m \bigl(e_m^{\mathsf{T}} z^{(m)}\bigr), \qquad m = 1, 2, \dots.
$$
Introduce the vectors
$$
p_m \triangleq \tau_{m-1} \tilde{p}_m, \qquad m = 1, 2, \dots.
$$
By (7.13) and (7.12), $\{p_m\}$ satisfies the recurrence
$$
p_{m+1} = \tau_m \tilde{p}_{m+1} = \tau_m \bigl(v_{m+1} - l_m \tilde{p}_m\bigr) = r_m + \mu_m p_m, \tag{7.14}
$$
where $\mu_m = -\dfrac{l_m \tau_m}{\tau_{m-1}}$, $m = 1, 2, \dots$.

We thus obtain the recurrences ($m = 1, 2, \ldots$)
$$
x^{(m+1)} = x^{(m)} + \eta_{m+1} \tilde{p}_{m+1} = x^{(m)} + \xi_{m+1} p_{m+1}, \tag{7.15}
$$
$$
r_{m+1} = r_m - \eta_{m+1} A\tilde{p}_{m+1} = r_m - \xi_{m+1} A p_{m+1}, \tag{7.16}
$$
where $\xi_{m+1} = \dfrac{\eta_{m+1}}{\tau_m}$, $m = 1, 2, \dots$.
We next consider how to compute the coefficients $\xi_{m+1}$ and $\mu_m$. We first record the following properties.

Lemma 7.7. For the conjugate gradient method the following hold: (1) $r_1, r_2, \ldots, r_m$ are mutually orthogonal; (2) $p_1, p_2, \ldots, p_m$ are mutually $A$-conjugate (or $A$-orthogonal), i.e., $p_i^{\mathsf{T}} A p_j = 0$ whenever $i \neq j$.
Proof. (1) Since $r_k$ is parallel to $v_{k+1}$ and the Lanczos vectors are orthonormal, the residuals are mutually orthogonal.

(2) It suffices to show that $P_m^{\mathsf{T}} A P_m$ is a diagonal matrix, or equivalently that $\tilde{P}_m^{\mathsf{T}} A \tilde{P}_m$ is diagonal. A direct computation gives
$$
\tilde{P}_m^{\mathsf{T}} A \tilde{P}_m = (V_m L_m^{-\mathsf{T}})^{\mathsf{T}} A V_m L_m^{-\mathsf{T}}
= L_m^{-1} V_m^{\mathsf{T}} A V_m L_m^{-\mathsf{T}}
= L_m^{-1} T_m L_m^{-\mathsf{T}}
= L_m^{-1} (L_m D_m L_m^{\mathsf{T}}) L_m^{-\mathsf{T}}
= D_m. \qquad \Box
$$
We now derive the formulas for $\xi_{m+1}$ and $\mu_m$. Multiplying both sides of (7.14) on the left by $p_{m+1}^{\mathsf{T}} A$ and using the $A$-conjugacy of the search directions gives
$$
p_{m+1}^{\mathsf{T}} A p_{m+1} = p_{m+1}^{\mathsf{T}} A r_m + \mu_m p_{m+1}^{\mathsf{T}} A p_m = r_m^{\mathsf{T}} A p_{m+1}.
$$
Multiplying (7.16) on the left by $r_m^{\mathsf{T}}$ and using the orthogonality of the residuals gives
$$
0 = r_m^{\mathsf{T}} r_{m+1} = r_m^{\mathsf{T}} r_m - \xi_{m+1} r_m^{\mathsf{T}} A p_{m+1}.
$$
Hence
$$
\xi_{m+1} = \frac{r_m^{\mathsf{T}} r_m}{r_m^{\mathsf{T}} A p_{m+1}} = \frac{r_m^{\mathsf{T}} r_m}{p_{m+1}^{\mathsf{T}} A p_{m+1}}. \tag{7.17}
$$
Multiplying both sides of (7.14) on the left by $p_m^{\mathsf{T}} A$ gives
$$
0 = p_m^{\mathsf{T}} A p_{m+1} = p_m^{\mathsf{T}} A r_m + \mu_m p_m^{\mathsf{T}} A p_m.
$$
Hence
$$
\mu_m = -\frac{p_m^{\mathsf{T}} A r_m}{p_m^{\mathsf{T}} A p_m} = -\frac{r_m^{\mathsf{T}} A p_m}{p_m^{\mathsf{T}} A p_m}. \tag{7.18}
$$
To reduce the amount of work further, this expression can be simplified. Multiplying (7.16) on the left by $r_{m+1}^{\mathsf{T}}$ gives
$$
r_{m+1}^{\mathsf{T}} r_{m+1} = r_{m+1}^{\mathsf{T}} r_m - \xi_{m+1} r_{m+1}^{\mathsf{T}} A p_{m+1} = -\frac{r_m^{\mathsf{T}} r_m}{p_{m+1}^{\mathsf{T}} A p_{m+1}} \cdot r_{m+1}^{\mathsf{T}} A p_{m+1},
$$
so that
$$
r_{m+1}^{\mathsf{T}} A p_{m+1} = -\frac{r_{m+1}^{\mathsf{T}} r_{m+1}}{r_m^{\mathsf{T}} r_m} \cdot p_{m+1}^{\mathsf{T}} A p_{m+1}.
$$
Therefore
$$
r_m^{\mathsf{T}} A p_m = -\frac{r_m^{\mathsf{T}} r_m}{r_{m-1}^{\mathsf{T}} r_{m-1}} \cdot p_m^{\mathsf{T}} A p_m,
$$
and substituting this into (7.18) yields
$$
\mu_m = -\frac{r_m^{\mathsf{T}} A p_m}{p_m^{\mathsf{T}} A p_m} = \frac{r_m^{\mathsf{T}} r_m}{r_{m-1}^{\mathsf{T}} r_{m-1}}. \tag{7.19}
$$
In summary, the recurrences (7.14), (7.15), and (7.16), together with the coefficient formulas (7.19) and (7.17), make up the CG iteration:
$$
p_{m+1} = r_m + \mu_m p_m, \qquad \text{where } \mu_m = \frac{r_m^{\mathsf{T}} r_m}{r_{m-1}^{\mathsf{T}} r_{m-1}},
$$
$$
x^{(m+1)} = x^{(m)} + \xi_{m+1} p_{m+1},
$$
$$
r_{m+1} = r_m - \xi_{m+1} A p_{m+1}, \qquad \text{where } \xi_{m+1} = \frac{r_m^{\mathsf{T}} r_m}{p_{m+1}^{\mathsf{T}} A p_{m+1}}.
$$
Note that these recurrences start at $m = 1$; the formulas for $m = 0$ must be derived separately.
First, by the definition of $\tilde{p}_1$,
$$
\tilde{p}_1 = \tilde{P}_1 = V_1 L_1^{-\mathsf{T}} = v_1,
$$
and therefore
$$
p_1 = \tau_0 \tilde{p}_1 = \beta v_1 = r_0.
$$
Next, the Lanczos process gives $T_1 = \alpha_1 = v_1^{\mathsf{T}} A v_1$. Noting that $\beta^2 = r_0^{\mathsf{T}} r_0$ and $p_1 = \beta v_1$, we obtain
$$
x^{(1)} = x^{(0)} + V_1 T_1^{-1}(\beta e_1) = x^{(0)} + \frac{\beta}{v_1^{\mathsf{T}} A v_1} v_1 = x^{(0)} + \frac{r_0^{\mathsf{T}} r_0}{p_1^{\mathsf{T}} A p_1} p_1.
$$
Setting $\xi_1 = \dfrac{r_0^{\mathsf{T}} r_0}{p_1^{\mathsf{T}} A p_1}$ (note that the earlier formula (7.17) for $\xi_{m+1}$ is defined only for $m \geq 1$), the recurrence for $x^{(m+1)}$ also holds when $m = 0$.
Finally, consider the residual. Clearly,
$$
r_1 = b - Ax^{(1)} = b - Ax^{(0)} - \frac{r_0^{\mathsf{T}} r_0}{p_1^{\mathsf{T}} A p_1} A p_1 = r_0 - \xi_1 A p_1,
$$
i.e., the recurrence for $r_{m+1}$ also holds when $m = 0$.
We can therefore state the following CG algorithm.
Algorithm 7.7. Conjugate gradient method (CG)

1: Choose an initial guess $x^{(0)}$, a stopping tolerance $\varepsilon > 0$, and a maximum number of iterations IterMax
2: $r_0 = b - Ax^{(0)}$
3: $\beta = \|r_0\|_2$
4: if $\beta < \varepsilon$ then
5: &nbsp;&nbsp;&nbsp;&nbsp;stop and output the approximate solution $x^{(0)}$
6: end if
7: for $m = 1$ to IterMax do
8: &nbsp;&nbsp;&nbsp;&nbsp;$\rho = r_{m-1}^{\mathsf{T}} r_{m-1}$
9: &nbsp;&nbsp;&nbsp;&nbsp;if $m > 1$ then
10: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\mu_{m-1} = \rho / \rho_0$
11: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$p_m = r_{m-1} + \mu_{m-1} p_{m-1}$
12: &nbsp;&nbsp;&nbsp;&nbsp;else
13: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$p_m = r_0$
14: &nbsp;&nbsp;&nbsp;&nbsp;end if
15: &nbsp;&nbsp;&nbsp;&nbsp;$q_m = A p_m$
16: &nbsp;&nbsp;&nbsp;&nbsp;$\xi_m = \rho / (p_m^{\mathsf{T}} q_m)$
17: &nbsp;&nbsp;&nbsp;&nbsp;$x^{(m)} = x^{(m-1)} + \xi_m p_m$
18: &nbsp;&nbsp;&nbsp;&nbsp;$r_m = r_{m-1} - \xi_m q_m$
19: &nbsp;&nbsp;&nbsp;&nbsp;relres $= \|r_m\|_2 / \beta$
20: &nbsp;&nbsp;&nbsp;&nbsp;if relres $< \varepsilon$ then
21: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;stop and output the approximate solution $x^{(m)}$
22: &nbsp;&nbsp;&nbsp;&nbsp;end if
23: &nbsp;&nbsp;&nbsp;&nbsp;$\rho_0 = \rho$
24: end for
25: if relres $< \varepsilon$ then
26: &nbsp;&nbsp;&nbsp;&nbsp;output the approximate solution $x^{(m)}$ and related information
27: else
28: &nbsp;&nbsp;&nbsp;&nbsp;output a failure message
29: end if
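A direct NumPy translation of Algorithm 7.7 might look as follows (our own sketch; the function name `conjugate_gradient` and its interface are not prescribed by the text). In practice one would typically call an existing routine such as `scipy.sparse.linalg.cg` instead.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, eps=1e-8, iter_max=1000):
    """Conjugate gradient method (Algorithm 7.7) for symmetric positive definite A.

    Returns the approximate solution and the relative residual norms per iteration.
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    beta = np.linalg.norm(r)
    if beta < eps:
        return x, []
    history = []
    rho_old = None
    p = None
    for m in range(1, iter_max + 1):
        rho = r @ r
        if m > 1:
            mu = rho / rho_old          # (7.19)
            p = r + mu * p              # (7.14)
        else:
            p = r.copy()                # p_1 = r_0
        q = A @ p                       # the single matrix-vector product per step
        xi = rho / (p @ q)              # (7.17)
        x = x + xi * p                  # (7.15)
        r = r - xi * q                  # (7.16)
        relres = np.linalg.norm(r) / beta
        history.append(relres)
        if relres < eps:
            break
        rho_old = rho
    return x, history
```

For a symmetric positive definite matrix `A` and right-hand side `b`, `x, hist = conjugate_gradient(A, b)` returns the approximate solution together with the relative residual history used by the stopping test.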
The dominant work in each CG iteration is one matrix-vector product and two vector inner products, plus a few vector updates (axpy operations).
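As a quick sanity check of Lemma 7.7, and of the operation count just stated, the self-contained sketch below (our own illustration, assuming NumPy) runs the same recurrences while storing all residuals and search directions, then verifies that $R^{\mathsf{T}} R$ and $P^{\mathsf{T}} A P$ are numerically diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite test matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
residuals, directions = [r.copy()], []
p = rho_old = None
for m in range(1, n + 1):
    rho = r @ r                        # inner product 1
    p = r.copy() if m == 1 else r + (rho / rho_old) * p
    q = A @ p                          # the one matrix-vector product per step
    xi = rho / (p @ q)                 # inner product 2
    x = x + xi * p
    r = r - xi * q
    residuals.append(r.copy())
    directions.append(p.copy())
    rho_old = rho
    if np.linalg.norm(r) < 1e-12 * np.linalg.norm(b):
        break

R = np.column_stack(residuals[:-1])    # r_0, r_1, ... (drop the final, near-zero residual)
P = np.column_stack(directions)

def offdiag(G):
    return G - np.diag(np.diag(G))

print(np.max(np.abs(offdiag(R.T @ R))))      # tiny: residuals mutually orthogonal
print(np.max(np.abs(offdiag(P.T @ A @ P))))  # tiny: directions A-conjugate
```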