5.3 MC Exploring Starts

We next extend the MC Basic algorithm to obtain another MC-based reinforcement learning algorithm that is slightly more complicated but more sample-efficient.

5.3.1 Utilizing samples more efficiently

An important aspect of MC-based reinforcement learning is how to use samples more efficiently. Specifically, suppose that we have an episode of samples obtained by following a policy $\pi$:

$$
s_1 \xrightarrow{a_2} s_2 \xrightarrow{a_4} s_1 \xrightarrow{a_2} s_2 \xrightarrow{a_3} s_5 \xrightarrow{a_1} \dots \tag{5.3}
$$
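
As a small illustration, the visible prefix of episode (5.3) can be stored as a list of state-action pairs and inspected for repeated pairs. This is only a sketch under stated assumptions: the names `episode`, `visit_counts`, and `start_indices` are hypothetical, the string labels stand in for the states and actions, and only the five steps shown in (5.3) are used, since the rest of the episode is elided. It suggests why a single episode may carry more information than one return estimate: the pair $(s_1, a_2)$ occurs twice, so the episode contains two sub-episodes that start from that pair.

```python
from collections import Counter

# The five steps shown in (5.3), written as (state, action) pairs.
# Assumption: the episode continues after these steps; only this prefix is used.
episode = [("s1", "a2"), ("s2", "a4"), ("s1", "a2"), ("s2", "a3"), ("s5", "a1")]


def visit_counts(episode):
    """Count how many times each state-action pair occurs in the episode."""
    return Counter(episode)


def start_indices(episode):
    """Map each state-action pair to the time steps at which it occurs.

    Each index t marks a sub-episode episode[t:] that starts from that pair.
    """
    starts = {}
    for t, pair in enumerate(episode):
        starts.setdefault(pair, []).append(t)
    return starts


counts = visit_counts(episode)
starts = start_indices(episode)

for pair, indices in starts.items():
    print(pair, "occurs", counts[pair], "time(s), at steps", indices)
```

Whether every occurrence of a pair or only some of them is used to estimate its action value is a design choice that this sketch does not settle.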