
1.3 State transition

When taking an action, the agent may move from one state to another. Such a process is called state transition. For example, if the agent is in state $s_1$ and selects action $a_2$ (that is, moving rightward), then the agent moves to state $s_2$. Such a process can be expressed as

s_1 \xrightarrow{a_2} s_2.

We next examine two important examples.

\diamond What is the next state when the agent attempts to go beyond the boundary, for example, taking action $a_1$ in state $s_1$? The answer is that the agent is bounced back, because it is impossible for the agent to exit the state space. Hence, we have $s_1 \xrightarrow{a_1} s_1$.
\diamond What is the next state when the agent attempts to enter a forbidden cell, for example, taking action $a_2$ in state $s_5$? Two different scenarios may be encountered. In the first scenario, although $s_6$ is forbidden, it is still accessible. In this case, the next state is $s_6$; hence, the state transition process is $s_5 \xrightarrow{a_2} s_6$. In the second scenario, $s_6$ is not accessible because, for example, it is surrounded by walls. In this case, the agent is bounced back to $s_5$ if it attempts to move rightward; hence, the state transition process is $s_5 \xrightarrow{a_2} s_5$.

Which scenario should we consider? The answer depends on the physical environment. In this book, we consider the first scenario, where the forbidden cells are accessible, although stepping into them may be punished. This scenario is more general and interesting. Moreover, since we are considering a simulation task, we can define the state transition process however we prefer. In real-world applications, the state transition process is determined by real-world dynamics.
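The two cases above can be sketched in code. The following is a minimal sketch, assuming a 3x3 grid with states $s_1,\dots,s_9$ numbered row by row and actions $a_1$ (up), $a_2$ (right), $a_3$ (down), $a_4$ (left); these numbering conventions are assumptions for illustration, not prescribed by the text.

```python
# Deterministic state transition on an assumed 3x3 grid, states s1..s9
# numbered row by row. Action effects are (row delta, column delta).
N = 3  # grid side length (assumed)
ACTIONS = {"a1": (-1, 0), "a2": (0, 1), "a3": (1, 0), "a4": (0, -1)}

def next_state(s, a):
    """Return the next state index (1-based) after taking action a in state s.

    The agent is bounced back (stays in s) when the move would cross the
    boundary. Forbidden cells are treated as accessible, matching the first
    scenario adopted in the text.
    """
    row, col = divmod(s - 1, N)
    dr, dc = ACTIONS[a]
    nr, nc = row + dr, col + dc
    if not (0 <= nr < N and 0 <= nc < N):
        return s  # attempted to go beyond the boundary: bounced back
    return nr * N + nc + 1

print(next_state(1, "a2"))  # 2: s1 --a2--> s2
print(next_state(1, "a1"))  # 1: s1 --a1--> s1 (bounced back at the boundary)
print(next_state(5, "a2"))  # 6: s5 --a2--> s6 (the forbidden cell is accessible)
```

Under the second scenario, one would instead check a set of inaccessible cells and bounce the agent back from those as well; the boundary check already illustrates the mechanism.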

The state transition process is defined for each state and its associated actions. This process can be described by a table, as shown in Table 1.1. In this table, each row corresponds to a state, and each column corresponds to an action. Each cell indicates the next state to transition to after the agent takes the corresponding action in the corresponding state.

Table 1.1: A tabular representation of the state transition process. Each cell indicates the next state to transition to after the agent takes an action at a state.
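In code, such a table is naturally a lookup structure indexed by state and action. The sketch below uses a nested dictionary and fills in only the transitions mentioned in the text; the remaining entries of Table 1.1 would be completed in the same way.

```python
# A sketch of a transition table as a nested dictionary:
# transition[state][action] gives the next state.
transition = {
    "s1": {"a1": "s1",   # bounced back at the upper boundary
           "a2": "s2"},  # moves rightward
    "s5": {"a2": "s6"},  # enters the (accessible) forbidden cell
}

def lookup(s, a):
    """Look up the next state in the table."""
    return transition[s][a]

print(lookup("s1", "a2"))  # s2
```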

Mathematically, the state transition process can be described by conditional probabilities. For example, for $s_1$ and $a_2$, the conditional probability distribution is

p(s_1 \mid s_1, a_2) = 0,
p(s_2 \mid s_1, a_2) = 1,
p(s_3 \mid s_1, a_2) = 0,
p(s_4 \mid s_1, a_2) = 0,
p(s_5 \mid s_1, a_2) = 0,

which indicates that, when taking $a_2$ at $s_1$, the probability of the agent moving to $s_2$ is one, and the probabilities of the agent moving to the other states are zero. As a result, taking action $a_2$ at $s_1$ will certainly cause the agent to transition to $s_2$. The preliminaries of conditional probability are given in Appendix A. Readers are strongly advised to become familiar with probability theory, since it is necessary for studying reinforcement learning.
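The distribution above can be written out directly, which also makes it easy to check the two properties a deterministic transition must satisfy: the probabilities sum to one, and all the mass sits on a single next state. This is a sketch, with the states named as in the text.

```python
# The conditional distribution p(s' | s1, a2) from the text.
# For a deterministic transition, the distribution is degenerate:
# all the probability mass is on the single next state s2.
p = {"s1": 0.0, "s2": 1.0, "s3": 0.0, "s4": 0.0, "s5": 0.0}

# A valid probability distribution sums to one.
assert abs(sum(p.values()) - 1.0) < 1e-12

# The certain next state is the one carrying probability one.
certain_next = max(p, key=p.get)
print(certain_next)  # s2
```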

Although it is intuitive, the tabular representation can only describe deterministic state transitions. In general, state transitions can be stochastic and must be described by conditional probability distributions. For instance, if random wind gusts blow across the grid, then when taking action $a_2$ at $s_1$, the agent may be blown to $s_5$ instead of $s_2$. In this case, we have $p(s_5 \mid s_1, a_2) > 0$. Nevertheless, for simplicity, we only consider deterministic state transitions in the grid world examples in this book.
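A stochastic transition is simulated by sampling the next state from the conditional distribution rather than looking it up in a table. The sketch below uses made-up wind probabilities, purely as an assumption for illustration.

```python
import random

# Hypothetical stochastic distribution p(s' | s1, a2) under random wind:
# the agent usually moves right to s2 but is sometimes blown to s5.
# These numbers are made up for the example.
p = {"s2": 0.8, "s5": 0.2}

def sample_next_state(dist, rng=random):
    """Sample a next state from the conditional distribution p(s' | s, a)."""
    states = list(dist)
    weights = [dist[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

# Empirical frequencies over many samples approach the probabilities.
random.seed(0)
counts = {"s2": 0, "s5": 0}
for _ in range(10_000):
    counts[sample_next_state(p)] += 1
print(counts)  # roughly 8,000 samples of s2 and 2,000 of s5
```

A deterministic transition is just the special case where one state has probability one, so the same sampling code covers both cases.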