
1.3 State transition

When taking an action, the agent may move from one state to another. Such a process is called state transition. For example, if the agent is in state $s_1$ and selects action $a_2$ (that is, moving rightward), then the agent moves to state $s_2$. Such a process can be expressed as

s_1 \xrightarrow{a_2} s_2.

We next examine two important examples.

\diamond What is the next state when the agent attempts to go beyond the boundary, for example, taking action $a_1$ in state $s_1$? The answer is that the agent is bounced back, because it is impossible for the agent to exit the state space. Hence, we have $s_1 \xrightarrow{a_1} s_1$.
\diamond What is the next state when the agent attempts to enter a forbidden cell, for example, taking action $a_2$ in state $s_5$? Two different scenarios may be encountered. In the first scenario, although $s_6$ is forbidden, it is still accessible. In this case, the next state is $s_6$; hence, the state transition process is $s_5 \xrightarrow{a_2} s_6$. In the second scenario, $s_6$ is not accessible because, for example, it is surrounded by walls. In this case, the agent is bounced back to $s_5$ if it attempts to move rightward; hence, the state transition process is $s_5 \xrightarrow{a_2} s_5$.

Which scenario should we consider? The answer depends on the physical environment. In this book, we consider the first scenario, where the forbidden cells are accessible, although stepping into them may be punished. This scenario is more general and interesting. Moreover, since we are considering a simulation task, we can define the state transition process however we prefer. In real-world applications, the state transition process is determined by real-world dynamics.
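The two cases above can be sketched in code. The following is a minimal sketch, assuming a 3x3 grid with states $s_1,\dots,s_9$ numbered row by row and actions $a_1$ (up), $a_2$ (right), $a_3$ (down), $a_4$ (left); these numbering conventions are assumptions for illustration, not prescribed by the text.

```python
# Deterministic state transition on an assumed 3x3 grid, states s1..s9
# numbered row by row. Action effects are (row delta, column delta).
N = 3  # grid side length (assumed)
ACTIONS = {"a1": (-1, 0), "a2": (0, 1), "a3": (1, 0), "a4": (0, -1)}

def next_state(s, a):
    """Return the next state index (1-based) after taking action a in state s.

    The agent is bounced back (stays in s) when the move would cross the
    boundary. Forbidden cells are treated as accessible, matching the first
    scenario adopted in the text.
    """
    row, col = divmod(s - 1, N)
    dr, dc = ACTIONS[a]
    nr, nc = row + dr, col + dc
    if not (0 <= nr < N and 0 <= nc < N):
        return s  # attempted to go beyond the boundary: bounced back
    return nr * N + nc + 1

print(next_state(1, "a2"))  # 2: s1 --a2--> s2
print(next_state(1, "a1"))  # 1: s1 --a1--> s1 (bounced back at the boundary)
print(next_state(5, "a2"))  # 6: s5 --a2--> s6 (the forbidden cell is accessible)
```

Under the second scenario, one would instead check a set of inaccessible cells and bounce the agent back from those as well; the boundary check already illustrates the mechanism.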

The state transition process is defined for each state and its associated actions. This process can be described by a table, as shown in Table 1.1. In this table, each row corresponds to a state, and each column corresponds to an action. Each cell indicates the next state to transition to after the agent takes the corresponding action in the corresponding state.

Table 1.1: A tabular representation of the state transition process. Each cell indicates the next state to transition to after the agent takes an action at a state.
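In code, such a table is naturally a lookup structure indexed by state and action. The sketch below uses a nested dictionary and fills in only the transitions mentioned in the text; the remaining entries of Table 1.1 would be completed in the same way.

```python
# A sketch of a transition table as a nested dictionary:
# transition[state][action] gives the next state.
transition = {
    "s1": {"a1": "s1",   # bounced back at the upper boundary
           "a2": "s2"},  # moves rightward
    "s5": {"a2": "s6"},  # enters the (accessible) forbidden cell
}

def lookup(s, a):
    """Look up the next state in the table."""
    return transition[s][a]

print(lookup("s1", "a2"))  # s2
```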

Mathematically, the state transition process can be described by conditional probabilities. For example, for $s_1$ and $a_2$, the conditional probability distribution is

p(s_1 \mid s_1, a_2) = 0,
p(s_2 \mid s_1, a_2) = 1,
p(s_3 \mid s_1, a_2) = 0,
p(s_4 \mid s_1, a_2) = 0,
p(s_5 \mid s_1, a_2) = 0,

which indicates that, when taking $a_2$ at $s_1$, the probability of the agent moving to $s_2$ is one, and the probabilities of the agent moving to the other states are zero. As a result, taking action $a_2$ at $s_1$ will certainly cause the agent to transition to $s_2$. The preliminaries of conditional probability are given in Appendix A. Readers are strongly advised to become familiar with probability theory, since it is necessary for studying reinforcement learning.
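The distribution above can be written out directly, which also makes it easy to check the two properties a deterministic transition must satisfy: the probabilities sum to one, and all the mass sits on a single next state. This is a sketch, with the states named as in the text.

```python
# The conditional distribution p(s' | s1, a2) from the text.
# For a deterministic transition, the distribution is degenerate:
# all the probability mass is on the single next state s2.
p = {"s1": 0.0, "s2": 1.0, "s3": 0.0, "s4": 0.0, "s5": 0.0}

# A valid probability distribution sums to one.
assert abs(sum(p.values()) - 1.0) < 1e-12

# The certain next state is the one carrying probability one.
certain_next = max(p, key=p.get)
print(certain_next)  # s2
```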

Although it is intuitive, the tabular representation can only describe deterministic state transitions. In general, state transitions can be stochastic and must be described by conditional probability distributions. For instance, if random wind gusts blow across the grid, then when taking action $a_2$ at $s_1$, the agent may be blown to $s_5$ instead of $s_2$. In this case, we have $p(s_5 \mid s_1, a_2) > 0$. Nevertheless, for simplicity, we only consider deterministic state transitions in the grid world examples in this book.
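A stochastic transition is simulated by sampling the next state from the conditional distribution rather than looking it up in a table. The sketch below uses made-up wind probabilities, purely as an assumption for illustration.

```python
import random

# Hypothetical stochastic distribution p(s' | s1, a2) under random wind:
# the agent usually moves right to s2 but is sometimes blown to s5.
# These numbers are made up for the example.
p = {"s2": 0.8, "s5": 0.2}

def sample_next_state(dist, rng=random):
    """Sample a next state from the conditional distribution p(s' | s, a)."""
    states = list(dist)
    weights = [dist[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

# Empirical frequencies over many samples approach the probabilities.
random.seed(0)
counts = {"s2": 0, "s5": 0}
for _ in range(10_000):
    counts[sample_next_state(p)] += 1
print(counts)  # roughly 8,000 samples of s2 and 2,000 of s5
```

A deterministic transition is just the special case where one state has probability one, so the same sampling code covers both cases.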