2.6 Matrix-vector form of the Bellman equation

The Bellman equation in (2.7) is in an elementwise form. Since it is valid for every state, we can combine all these equations and write them concisely in a matrix-vector form, which will be frequently used to analyze the Bellman equation.

To derive the matrix-vector form, we first rewrite the Bellman equation in (2.7) as

$$v_{\pi}(s) = r_{\pi}(s) + \gamma \sum_{s' \in \mathcal{S}} p_{\pi}(s'|s) v_{\pi}(s'), \tag{2.8}$$

where

$$r_{\pi}(s) \doteq \sum_{a \in \mathcal{A}} \pi(a|s) \sum_{r \in \mathcal{R}} p(r|s,a) r,$$

$$p_{\pi}(s'|s) \doteq \sum_{a \in \mathcal{A}} \pi(a|s) p(s'|s,a).$$

Here, $r_{\pi}(s)$ denotes the mean of the immediate rewards obtained in state $s$ when actions are selected according to policy $\pi$, and $p_{\pi}(s'|s)$ is the probability of transitioning from $s$ to $s'$ under policy $\pi$.
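
To make the two definitions concrete, the following minimal sketch computes $r_{\pi}(s)$ and $p_{\pi}(s'|s)$ for a small example and evaluates the right-hand side of (2.8) for a given value estimate. The two-state, two-action model, all of its numbers, and the names `pi`, `p_r`, `p_s`, `policy_reward`, `policy_transition`, and `bellman_rhs` are hypothetical and chosen only for illustration; they do not come from the text.

```python
# Hypothetical model: 2 states, 2 actions, reward set R = {0.0, 1.0}.
# All probabilities below are made-up illustration values.
states = [0, 1]
actions = [0, 1]
rewards = [0.0, 1.0]

# pi[s][a] = pi(a|s): probability of taking action a in state s.
pi = [[0.5, 0.5],
      [0.2, 0.8]]

# p_r[s][a][k] = p(rewards[k] | s, a): reward distribution.
p_r = [[[1.0, 0.0], [0.5, 0.5]],
       [[0.0, 1.0], [0.9, 0.1]]]

# p_s[s][a][s2] = p(s2 | s, a): state transition distribution.
p_s = [[[1.0, 0.0], [0.0, 1.0]],
       [[0.5, 0.5], [0.0, 1.0]]]


def policy_reward(s):
    """r_pi(s) = sum_a pi(a|s) * sum_r p(r|s,a) * r."""
    return sum(
        pi[s][a] * sum(p_r[s][a][k] * rewards[k] for k in range(len(rewards)))
        for a in actions
    )


def policy_transition(s, s2):
    """p_pi(s2|s) = sum_a pi(a|s) * p(s2|s,a)."""
    return sum(pi[s][a] * p_s[s][a][s2] for a in actions)


def bellman_rhs(v, gamma):
    """Right-hand side of (2.8) for every state, given a value estimate v."""
    return [
        policy_reward(s)
        + gamma * sum(policy_transition(s, s2) * v[s2] for s2 in states)
        for s in states
    ]


r_pi = [policy_reward(s) for s in states]
P_pi = [[policy_transition(s, s2) for s2 in states] for s in states]

print("r_pi:", r_pi)
print("P_pi:", P_pi)
print("RHS of (2.8) at v = r_pi:", bellman_rhs(r_pi, gamma=0.9))
```

Each row of `P_pi` sums to one, because for a fixed $s$ the values $p_{\pi}(s'|s)$ form a probability distribution over next states. Collecting `r_pi` into a vector and `P_pi` into a matrix is the step that turns the per-state equations (2.8) into a single matrix-vector equation.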