
6.1 Motivating example: Mean estimation

We next demonstrate how to convert a non-incremental algorithm to an incremental one by examining the mean estimation problem.

Consider a random variable $X$ that takes values from a finite set $\mathcal{X}$. Our goal is to estimate $\mathbb{E}[X]$. Suppose that we have a sequence of i.i.d. samples $\{x_i\}_{i=1}^{n}$. The expected value of $X$ can be approximated by

\mathbb{E}[X] \approx \bar{x} \doteq \frac{1}{n} \sum_{i=1}^{n} x_i. \tag{6.1}

The approximation in (6.1) is the basic idea of Monte Carlo estimation, as introduced in Chapter 5. We know that $\bar{x} \to \mathbb{E}[X]$ as $n \to \infty$ according to the law of large numbers.

We next show that two methods can be used to calculate $\bar{x}$ in (6.1). The first, non-incremental, method collects all the samples first and then calculates the average. The drawback of this method is that, if the number of samples is large, we may have to wait a long time until all of the samples have been collected. The second method avoids this drawback because it calculates the average in an incremental manner. Specifically, suppose that

w_{k+1} \doteq \frac{1}{k} \sum_{i=1}^{k} x_i, \quad k = 1, 2, \ldots

and hence

w_{k} = \frac{1}{k-1} \sum_{i=1}^{k-1} x_i, \quad k = 2, 3, \ldots.

Then, $w_{k+1}$ can be expressed in terms of $w_k$ as

w_{k+1} = \frac{1}{k} \sum_{i=1}^{k} x_i = \frac{1}{k} \left( \sum_{i=1}^{k-1} x_i + x_k \right) = \frac{1}{k} \left( (k-1) w_k + x_k \right) = w_k - \frac{1}{k} (w_k - x_k).

Therefore, we obtain the following incremental algorithm:

w_{k+1} = w_k - \frac{1}{k} (w_k - x_k). \tag{6.2}

This algorithm can be used to calculate the mean $\bar{x}$ in an incremental manner. It can be verified that

w_1 = x_1,
w_2 = w_1 - \frac{1}{1} (w_1 - x_1) = x_1,
w_3 = w_2 - \frac{1}{2} (w_2 - x_2) = x_1 - \frac{1}{2} (x_1 - x_2) = \frac{1}{2} (x_1 + x_2),
w_4 = w_3 - \frac{1}{3} (w_3 - x_3) = \frac{1}{3} (x_1 + x_2 + x_3),
\vdots
w_{k+1} = \frac{1}{k} \sum_{i=1}^{k} x_i. \tag{6.3}

The advantage of (6.2) is that the average can be calculated immediately every time a sample is received. This average can then be used to approximate $\bar{x}$ and hence $\mathbb{E}[X]$. Notably, the approximation may not be accurate at the beginning due to insufficient samples. However, it is better than nothing, and the estimation accuracy gradually improves as more samples are obtained, according to the law of large numbers. In addition, one can also define $w_{k+1} \doteq \frac{1}{1+k} \sum_{i=1}^{k+1} x_i$ and $w_k = \frac{1}{k} \sum_{i=1}^{k} x_i$. Doing so would not make any significant difference; in this case, the corresponding iterative algorithm is $w_{k+1} = w_k - \frac{1}{1+k} (w_k - x_{k+1})$.
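To make the two methods concrete, the following is a minimal Python sketch (the function name incremental_mean and the die-rolling example are illustrative, not from the text). It implements the update (6.2) so that an estimate is available after every sample, and then checks that the final estimate agrees with the batch average computed by the non-incremental method:

```python
import random

def incremental_mean(samples):
    """Running mean via w_{k+1} = w_k - (1/k) * (w_k - x_k), as in (6.2)."""
    w = 0.0
    estimates = []
    for k, x in enumerate(samples, start=1):
        w = w - (1.0 / k) * (w - x)  # update uses only the latest sample x_k
        estimates.append(w)          # w now equals the mean of the first k samples
    return estimates

# i.i.d. samples from a finite set, e.g. the faces of a fair die
samples = [random.choice([1, 2, 3, 4, 5, 6]) for _ in range(10_000)]

running = incremental_mean(samples)
batch = sum(samples) / len(samples)  # non-incremental: waits for all samples

print(running[-1], batch)  # the two agree up to floating-point rounding
```

As the printed values confirm, the early entries of the running estimate fluctuate, but the final entry matches the batch average exactly (up to rounding), consistent with (6.3).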

Furthermore, consider an algorithm with a more general expression:

w_{k+1} = w_k - \alpha_k (w_k - x_k). \tag{6.4}

This algorithm is important and frequently used in this chapter. It is the same as (6.2) except that the coefficient $1/k$ is replaced by $\alpha_k > 0$. Since the expression of $\alpha_k$ is not given, we cannot obtain an explicit expression for $w_k$ as in (6.3). However, we will show in the next section that, if $\{\alpha_k\}$ satisfies some mild conditions, $w_k \to \mathbb{E}[X]$ as $k \to \infty$. In Chapter 7, we will see that temporal-difference algorithms have similar (but more complex) expressions.
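A sketch of (6.4) in the same Python style follows. The function general_mean_estimate and the step-size choices are illustrative assumptions: passing $\alpha_k = 1/k$ recovers (6.2) exactly, while the slower-decaying $\alpha_k = k^{-0.7}$ is just one example of a sequence that still drives $w_k$ toward $\mathbb{E}[X]$; the precise conditions on $\{\alpha_k\}$ are the subject of the next section.

```python
import random

def general_mean_estimate(samples, step_size):
    """Iterate w_{k+1} = w_k - alpha_k * (w_k - x_k), as in (6.4).

    step_size(k) returns alpha_k > 0; step_size(k) = 1/k recovers (6.2).
    """
    w = 0.0
    for k, x in enumerate(samples, start=1):
        w = w - step_size(k) * (w - x)
    return w

samples = [random.choice([1, 2, 3, 4, 5, 6]) for _ in range(100_000)]

# alpha_k = 1/k recovers the exact sample mean, as in (6.2).
print(general_mean_estimate(samples, lambda k: 1.0 / k))

# A slower-decaying step size (illustrative choice) also approaches
# E[X] = 3.5, though no closed-form expression like (6.3) is available.
print(general_mean_estimate(samples, lambda k: k ** -0.7))
```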