8.1 Value representation: From table to function

We next use an example to demonstrate the difference between the tabular and function approximation methods.

Suppose that there are $n$ states $\{s_i\}_{i=1}^n$, whose state values are $\{v_{\pi}(s_i)\}_{i=1}^n$. Here, $\pi$ is a given policy. Let $\{\hat{v}(s_i)\}_{i=1}^n$ denote the estimates of the true state values. If we use the tabular method, the estimated values can be maintained in the following table. This table can be stored in memory as an array or a vector. To retrieve or update any value, we can directly read or rewrite the corresponding entry in the table.
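As a minimal sketch of the tabular method, the estimates can be held in a plain Python list with one entry per state. The number of states, the initial values, and the helper names `retrieve` and `update` are illustrative assumptions, not part of the text.

```python
# Tabular representation: one stored estimate per state.
# n = 5 and the zero initialization are assumptions made for illustration.
n = 5
value_table = [0.0] * n  # value_table[i] holds the estimate for state s_{i+1}


def retrieve(table, i):
    """Read the estimate of the state with 0-based index i."""
    return table[i]


def update(table, i, new_value):
    """Overwrite the estimate of the state with 0-based index i."""
    table[i] = new_value


# Updating one state touches exactly one entry; all others are unchanged.
update(value_table, 2, 1.5)
```

Both operations are direct reads or writes of a single entry, and the storage grows linearly with the number of states.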

We next show that the values in the above table can be approximated by a function. In particular, $\{(s_i, \hat{v}(s_i))\}_{i=1}^n$ are shown as $n$ points in Figure 8.2. These points can be fitted or approximated by a curve. The simplest curve is a straight line, which can be described as

$$\hat{v}(s, w) = as + b = \underbrace{[s, 1]}_{\phi^{T}(s)} \underbrace{\left[ \begin{array}{l} a \\ b \end{array} \right]}_{w} = \phi^{T}(s) w. \tag{8.1}$$

Here, $\hat{v}(s,w)$ is a function for approximating $v_{\pi}(s)$. It is determined jointly by the state $s$ and the parameter vector $w \in \mathbb{R}^2$. $\hat{v}(s,w)$ is sometimes written as $\hat{v}_w(s)$. Here, $\phi(s) \in \mathbb{R}^2$ is called the feature vector of $s$.
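The straight-line approximation in (8.1) can be sketched in a few lines of Python. This sketch assumes that each state is identified by a scalar (here the indices 1 to 5), uses made-up table values, and fits $w = [a, b]^T$ by ordinary least squares. The text does not say how the line is obtained, so the fitting method and the names `phi`, `v_hat`, and `fit_line` are assumptions chosen for illustration.

```python
def phi(s):
    """Feature vector phi(s) = [s, 1]^T from (8.1)."""
    return [float(s), 1.0]


def v_hat(s, w):
    """Linear approximation v_hat(s, w) = phi(s)^T w."""
    return sum(f * w_k for f, w_k in zip(phi(s), w))


def fit_line(states, values):
    """Least-squares fit of w = [a, b] to the points (s_i, values[i]).

    Least squares is an assumption of this sketch, not the text's method.
    """
    n = len(states)
    mean_s = sum(states) / n
    mean_v = sum(values) / n
    var_s = sum((s - mean_s) ** 2 for s in states)
    cov_sv = sum((s - mean_s) * (v - mean_v) for s, v in zip(states, values))
    a = cov_sv / var_s
    b = mean_v - a * mean_s
    return [a, b]


# Hypothetical data: five scalar states and their tabular estimates.
states = [1, 2, 3, 4, 5]
table_values = [0.9, 2.1, 2.9, 4.2, 4.9]

# Two parameters now stand in for the five stored table entries.
w = fit_line(states, table_values)
approximations = [v_hat(s, w) for s in states]
```

In this sketch only the two entries of `w` are stored, and a value is computed from `phi(s)` and `w` when needed rather than read from a table. The computed values generally differ slightly from the table entries, since a line cannot pass through arbitrary points exactly.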

The first notable difference between the tabular and function approximation methods concerns how they retrieve and update a value.

$\diamond$ How to retrieve a value: When the values are represented by a table, if we want to retrieve a value, we can directly read the corresponding entry in the table. However,