5.3 MC Exploring Starts

We next extend the MC Basic algorithm to obtain another MC-based reinforcement learning algorithm that is slightly more complicated but more sample-efficient.

5.3.1 Utilizing samples more efficiently

An important aspect of MC-based reinforcement learning is how to use samples more efficiently. Specifically, suppose that we have an episode of samples obtained by following a policy $\pi$:

$$
s_1 \xrightarrow{a_2} s_2 \xrightarrow{a_4} s_1 \xrightarrow{a_2} s_2 \xrightarrow{a_3} s_5 \xrightarrow{a_1} \dots \tag{5.3}
$$