3.2 Optimal state values and optimal policies
While the ultimate goal of reinforcement learning is to obtain optimal policies, it is necessary to first define what an optimal policy is. The definition is based on state values. In particular, consider two given policies $\pi_1$ and $\pi_2$. If the state value of $\pi_1$ is greater than or equal to that of $\pi_2$ for every state:
$$v_{\pi_1}(s) \ge v_{\pi_2}(s), \quad \text{for all } s \in \mathcal{S},$$
then $\pi_1$ is said to be better than $\pi_2$. Furthermore, if a policy is better than all the other possible policies, then this policy is optimal. This is formally stated below.
Definition 3.1 (Optimal policy and optimal state value). A policy $\pi^*$ is optimal if $v_{\pi^*}(s) \ge v_{\pi}(s)$ for all $s \in \mathcal{S}$ and for any other policy $\pi$. The state values of $\pi^*$ are the optimal state values.
The above definition indicates that an optimal policy has the greatest state value for every state compared to all the other policies. This definition also leads to many questions:
Existence: Does the optimal policy exist?
Uniqueness: Is the optimal policy unique?
Stochasticity: Is the optimal policy stochastic or deterministic?
Algorithm: How can the optimal policy and the optimal state values be obtained?
These fundamental questions must be clearly answered to thoroughly understand optimal policies. For example, if optimal policies did not exist, there would be no point in designing algorithms to find them.
We will answer all these questions in the remainder of this chapter.
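To make the comparison in Definition 3.1 concrete, the following minimal sketch evaluates every deterministic policy of a tiny example and checks which one has state values at least as large as all the others in every state. The two-state, two-action model (`NEXT_STATE`, `REWARD`, `GAMMA`) and the helper names `evaluate_policy` and `is_at_least_as_good` are assumptions made up for illustration; they do not come from the text. The sketch also enumerates deterministic policies only, whereas the definition ranges over all policies, so it should be read as an illustration of the definition rather than as a method for finding optimal policies.

```python
from itertools import product

# Hypothetical deterministic model (illustrative assumption, not from the text):
# NEXT_STATE[s][a] is the state reached by taking action a in state s,
# REWARD[s][a] is the immediate reward, GAMMA is the discount rate.
NEXT_STATE = [[0, 1], [1, 0]]
REWARD = [[0.0, 1.0], [1.0, 0.0]]
GAMMA = 0.9


def evaluate_policy(policy, tol=1e-10):
    """Iteratively compute the state values of a deterministic policy.

    policy[s] is the action taken in state s. The update
    v(s) <- r(s, a) + GAMMA * v(s') is repeated until the values stop changing.
    """
    v = [0.0] * len(policy)
    while True:
        new_v = [
            REWARD[s][policy[s]] + GAMMA * v[NEXT_STATE[s][policy[s]]]
            for s in range(len(policy))
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def is_at_least_as_good(v1, v2, eps=1e-8):
    """Return True if v1(s) >= v2(s) for every state s (up to tolerance eps)."""
    return all(a >= b - eps for a, b in zip(v1, v2))


# Enumerate all deterministic policies: one action per state.
policies = list(product(range(2), repeat=2))
values = {p: evaluate_policy(p) for p in policies}

# A policy is optimal among these if its state values dominate every other policy's.
optimal = [
    p for p in policies
    if all(is_at_least_as_good(values[p], values[q]) for q in policies)
]

for p in policies:
    print(p, [round(x, 3) for x in values[p]])
print("optimal among deterministic policies:", optimal)
```

In this toy model, exactly one deterministic policy dominates the rest in every state. In general, two policies need not be comparable in this sense (one may be better in some states and worse in others), which is why the existence question above is not trivial.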