8.1 Value representation: From table to function

We next use an example to demonstrate the difference between the tabular and function approximation methods.

Suppose that there are $n$ states $\{s_i\}_{i=1}^n$, whose state values under a given policy $\pi$ are $\{v_{\pi}(s_i)\}_{i=1}^n$. Let $\{\hat{v}(s_i)\}_{i=1}^n$ denote the estimates of these true state values. If we use the tabular method, the estimated values can be maintained in the following table.

$$
\begin{array}{c|cccc}
\text{State} & s_1 & s_2 & \cdots & s_n \\
\hline
\text{Estimated value} & \hat{v}(s_1) & \hat{v}(s_2) & \cdots & \hat{v}(s_n)
\end{array}
$$

This table can be stored in memory as an array or a vector. To retrieve or update any value, we directly read or overwrite the corresponding entry in the table.
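As an illustration of the tabular method, the following minimal sketch stores one estimate per state in a plain Python list. It assumes, purely for illustration, that the states $s_1, \dots, s_n$ are mapped to list indices $0, \dots, n-1$; the names `retrieve` and `update` and the choice `n = 5` are hypothetical and not taken from the text.

```python
# Minimal sketch of the tabular method (illustrative assumptions only):
# state s_{i+1} is mapped to list index i, and each entry holds v_hat(s_{i+1}).
n = 5
v_hat = [0.0] * n  # one table entry per state, initialised to zero


def retrieve(table, i):
    """Hypothetical helper: read the estimate for state index i directly."""
    return table[i]


def update(table, i, new_value):
    """Hypothetical helper: overwrite the entry for state index i.

    Only this single entry changes; all other estimates are untouched.
    """
    table[i] = new_value


update(v_hat, 2, 1.5)      # set the estimate of s_3
print(retrieve(v_hat, 2))  # reads back the value just written
```

Both operations touch exactly one entry, which is what makes the tabular method simple; the cost is that memory grows linearly with the number of states.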

We next show that the values in the above table can be approximated by a function. In particular, the pairs $\{(s_i,\hat{v}(s_i))\}_{i=1}^n$ are shown as $n$ points in Figure 8.2. These points can be fitted, or approximated, by a curve. The simplest such curve is a straight line, which can be described as

$$
\hat{v}(s,w) = as + b = \underbrace{[s, 1]}_{\phi^T(s)} \underbrace{\begin{bmatrix} a \\ b \end{bmatrix}}_{w} = \phi^T(s)\, w. \tag{8.1}
$$

Here, $\hat{v}(s,w)$ is a function that approximates $v_{\pi}(s)$. Its value is determined jointly by the state $s$ and the parameter vector $w \in \mathbb{R}^2$, and it is sometimes written as $\hat{v}_w(s)$. The vector $\phi(s) \in \mathbb{R}^2$ is called the feature vector of $s$.
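To make Eq. (8.1) concrete, the sketch below evaluates $\hat{v}(s,w) = \phi^T(s)w$ with the feature vector $\phi(s) = [s, 1]^T$ and fits $w = [a, b]^T$ to a set of points. Several assumptions are made for illustration only: the states are encoded as the scalars $1, \dots, 5$, the estimated values are made-up numbers, and $w$ is obtained by ordinary least squares (via NumPy's `np.linalg.lstsq`). The text has not specified at this point how $w$ is found, so least squares is a swapped-in choice rather than the method described here.

```python
import numpy as np


def phi(s):
    """Feature vector phi(s) = [s, 1]^T, as in Eq. (8.1)."""
    return np.array([s, 1.0])


def v_hat(s, w):
    """Linear approximation v_hat(s, w) = phi(s)^T w = a*s + b."""
    return phi(s) @ w


# Hypothetical data: states encoded as scalars, with made-up value estimates.
states = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
values = np.array([2.1, 2.9, 4.2, 4.8, 6.0])

# Stack the feature vectors as rows, so Phi has shape (n, 2).
Phi = np.stack([phi(s) for s in states])

# Ordinary least squares: choose w = [a, b]^T minimising ||Phi w - values||^2.
w, *_ = np.linalg.lstsq(Phi, values, rcond=None)

print(w)              # fitted parameters [a, b]
print(v_hat(3.0, w))  # approximate value of the state encoded as 3
```

For these made-up numbers the fit gives approximately $a = 0.97$ and $b = 1.09$. Whatever the number of states, only the two entries of $w$ are stored, and retrieving a value means evaluating $\phi^T(s)w$ rather than reading a table entry.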

The first notable difference between the tabular and function approximation methods concerns how they retrieve and update a value.

$\diamond$ How to retrieve a value: When the values are represented by a table, retrieving a value simply means reading the corresponding entry in the table. However,
