
2.6 Matrix-vector form of the Bellman equation

The Bellman equation in (2.7) is expressed in elementwise form: it holds for every state. We can therefore collect all of these equations and write them concisely in matrix-vector form, which is frequently used to analyze the Bellman equation.

To derive the matrix-vector form, we first rewrite the Bellman equation in (2.7) as

$$v_\pi(s) = r_\pi(s) + \gamma \sum_{s' \in \mathcal{S}} p_\pi(s'|s)\, v_\pi(s'), \tag{2.8}$$

where

$$r_\pi(s) \doteq \sum_{a \in \mathcal{A}} \pi(a|s) \sum_{r \in \mathcal{R}} p(r|s,a)\, r, \qquad p_\pi(s'|s) \doteq \sum_{a \in \mathcal{A}} \pi(a|s)\, p(s'|s,a).$$

Here, $r_\pi(s)$ denotes the mean of the immediate rewards, and $p_\pi(s'|s)$ is the probability of transitioning from $s$ to $s'$ under policy $\pi$.
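Stacking (2.8) over all states $s \in \mathcal{S} = \{s_1, \ldots, s_n\}$ gives the matrix-vector form

$$v_\pi = r_\pi + \gamma P_\pi v_\pi,$$

where $v_\pi = [v_\pi(s_1), \ldots, v_\pi(s_n)]^T$, $r_\pi = [r_\pi(s_1), \ldots, r_\pi(s_n)]^T$, and $P_\pi \in \mathbb{R}^{n \times n}$ with $[P_\pi]_{ij} = p_\pi(s_j|s_i)$ is the state transition matrix under policy $\pi$.

The sketch below makes these quantities concrete for a small tabular MDP. It is a minimal illustration rather than part of the text's development: the names `n`, `m`, `pi`, `P`, `R`, and `gamma` are assumptions, the arrays are filled with random (but properly normalized) values, and `R[s, a]` is assumed to already store the expected immediate reward $\sum_r p(r|s,a)\, r$.

```python
import numpy as np

# Hypothetical tabular MDP: n states, m actions (names are illustrative).
n, m = 4, 2
rng = np.random.default_rng(0)

pi = rng.random((n, m))
pi /= pi.sum(axis=1, keepdims=True)   # pi[s, a] = pi(a|s); rows sum to 1

P = rng.random((n, m, n))
P /= P.sum(axis=2, keepdims=True)     # P[s, a, s'] = p(s'|s, a)

R = rng.random((n, m))                # R[s, a] = sum_r p(r|s, a) * r (assumed given)
gamma = 0.9

# r_pi(s) = sum_a pi(a|s) * R[s, a]
r_pi = (pi * R).sum(axis=1)

# [P_pi]_{s,s'} = p_pi(s'|s) = sum_a pi(a|s) * p(s'|s, a)
P_pi = np.einsum('sa,sat->st', pi, P)

# Stacking (2.8) over all states: v_pi = r_pi + gamma * P_pi @ v_pi,
# so v_pi solves the linear system (I - gamma * P_pi) v_pi = r_pi.
v_pi = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
print(v_pi)
```

Solving the linear system directly works here because $I - \gamma P_\pi$ is invertible whenever $\gamma < 1$: $P_\pi$ is a stochastic matrix with spectral radius $1$, so $\gamma P_\pi$ has spectral radius $\gamma < 1$. For large state spaces, one would use an iterative method instead of forming and solving the system directly.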