2.6 Matrix-vector form of the Bellman equation
The Bellman equation in (2.7) is in an elementwise form. Since it is valid for every state, we can combine all these equations and write them concisely in a matrix-vector form, which will be frequently used to analyze the Bellman equation.
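To derive the matrix-vector form, we first rewrite the Bellman equation in (2.7) as
$$v_\pi(s) = r_\pi(s) + \gamma \sum_{s'} p_\pi(s'|s)\, v_\pi(s'),$$
where
$$r_\pi(s) \doteq \sum_{a} \pi(a|s) \sum_{r} p(r|s,a)\, r, \qquad p_\pi(s'|s) \doteq \sum_{a} \pi(a|s)\, p(s'|s,a).$$
Here, $r_\pi(s)$ denotes the mean of the immediate rewards, and $p_\pi(s'|s)$ is the probability of transitioning from $s$ to $s'$ under policy $\pi$.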
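As a rough numerical illustration, the sketch below computes the mean immediate reward and the policy-induced transition probability for each state by averaging over the actions chosen by the policy. The 3-state, 2-action model, all of its numbers, and the names `pi`, `expected_reward`, `p_next`, and `policy_averages` are hypothetical and chosen only for this example. It then solves the stacked equations as a linear system and checks that the result satisfies the rewritten elementwise equation in every state.

```python
import numpy as np

# Hypothetical toy model: 3 states, 2 actions (all numbers are illustrative).
n_states, n_actions = 3, 2
gamma = 0.9  # discount rate

# pi[s, a] = pi(a|s): probability of taking action a in state s.
pi = np.array([[0.5, 0.5],
               [1.0, 0.0],
               [0.2, 0.8]])

# expected_reward[s, a] = sum_r p(r|s,a) * r (already averaged over r).
expected_reward = np.array([[0.0, 1.0],
                            [-1.0, 0.0],
                            [1.0, 2.0]])

# p_next[s, a, t] = p(t|s,a): probability of moving from s to t under a.
p_next = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.5, 0.5, 0.0]],
])


def policy_averages(pi, expected_reward, p_next):
    """Average the reward and transition models over the policy's actions."""
    # r_pi[s] = sum_a pi(a|s) * sum_r p(r|s,a) r
    r_pi = np.sum(pi * expected_reward, axis=1)
    # P_pi[s, t] = sum_a pi(a|s) * p(t|s,a)
    P_pi = np.einsum("sa,sat->st", pi, p_next)
    return r_pi, P_pi


r_pi, P_pi = policy_averages(pi, expected_reward, p_next)

# Stack the per-state equations and solve (I - gamma * P_pi) v = r_pi.
v_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Check the elementwise equation state by state.
for s in range(n_states):
    rhs = r_pi[s] + gamma * sum(P_pi[s, t] * v_pi[t] for t in range(n_states))
    assert np.isclose(v_pi[s], rhs)

print("r_pi =", r_pi)
print("P_pi =\n", P_pi)
print("v_pi =", v_pi)
```

Each row of `P_pi` sums to one because it is a convex combination of the rows `p_next[s, a, :]`, which is a quick sanity check worth running on any model of this kind.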