
6.1 Motivating example: Mean estimation

We next demonstrate how to convert a non-incremental algorithm to an incremental one by examining the mean estimation problem.

Consider a random variable $X$ that takes values from a finite set $\mathcal{X}$. Our goal is to estimate $\mathbb{E}[X]$. Suppose that we have a sequence of i.i.d. samples $\{x_i\}_{i=1}^{n}$. The expected value of $X$ can be approximated by

\mathbb{E}[X] \approx \bar{x} \doteq \frac{1}{n} \sum_{i=1}^{n} x_i. \tag{6.1}

The approximation in (6.1) is the basic idea of Monte Carlo estimation, as introduced in Chapter 5. We know that $\bar{x} \to \mathbb{E}[X]$ as $n \to \infty$ according to the law of large numbers.

We next show that two methods can be used to calculate $\bar{x}$ in (6.1). The first, non-incremental, method collects all the samples first and then calculates the average. The drawback of this method is that, if the number of samples is large, we may have to wait a long time until all of the samples have been collected. The second method avoids this drawback because it calculates the average in an incremental manner. Specifically, suppose that

w_{k+1} \doteq \frac{1}{k} \sum_{i=1}^{k} x_i, \quad k = 1, 2, \ldots

and hence

w_{k} = \frac{1}{k-1} \sum_{i=1}^{k-1} x_i, \quad k = 2, 3, \ldots.

Then, $w_{k+1}$ can be expressed in terms of $w_k$ as

w_{k+1} = \frac{1}{k} \sum_{i=1}^{k} x_i = \frac{1}{k} \left( \sum_{i=1}^{k-1} x_i + x_k \right) = \frac{1}{k} \left( (k-1) w_k + x_k \right) = w_k - \frac{1}{k} (w_k - x_k).

Therefore, we obtain the following incremental algorithm:

w_{k+1} = w_k - \frac{1}{k} (w_k - x_k). \tag{6.2}

This algorithm can be used to calculate the mean $\bar{x}$ in an incremental manner. It can be verified that

w_1 = x_1,
w_2 = w_1 - \frac{1}{1} (w_1 - x_1) = x_1,
w_3 = w_2 - \frac{1}{2} (w_2 - x_2) = x_1 - \frac{1}{2} (x_1 - x_2) = \frac{1}{2} (x_1 + x_2),
w_4 = w_3 - \frac{1}{3} (w_3 - x_3) = \frac{1}{3} (x_1 + x_2 + x_3),
\vdots
w_{k+1} = \frac{1}{k} \sum_{i=1}^{k} x_i. \tag{6.3}

The advantage of (6.2) is that the average can be calculated immediately every time a sample is received. This average can then be used to approximate $\bar{x}$ and hence $\mathbb{E}[X]$. Notably, the approximation may not be accurate at the beginning due to insufficient samples. However, it is better than nothing, and the estimation accuracy gradually improves as more samples are obtained, according to the law of large numbers. In addition, one can also define $w_{k+1} \doteq \frac{1}{1+k} \sum_{i=1}^{k+1} x_i$ and $w_k = \frac{1}{k} \sum_{i=1}^{k} x_i$. Doing so would not make any significant difference; in this case, the corresponding iterative algorithm is $w_{k+1} = w_k - \frac{1}{1+k} (w_k - x_{k+1})$.
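To make the two methods concrete, the following is a minimal Python sketch (the function name incremental_mean and the die-rolling example are illustrative, not from the text). It implements the update (6.2) so that an estimate is available after every sample, and then checks that the final estimate agrees with the batch average computed by the non-incremental method:

```python
import random

def incremental_mean(samples):
    """Running mean via w_{k+1} = w_k - (1/k) * (w_k - x_k), as in (6.2)."""
    w = 0.0
    estimates = []
    for k, x in enumerate(samples, start=1):
        w = w - (1.0 / k) * (w - x)  # update uses only the latest sample x_k
        estimates.append(w)          # w now equals the mean of the first k samples
    return estimates

# i.i.d. samples from a finite set, e.g. the faces of a fair die
samples = [random.choice([1, 2, 3, 4, 5, 6]) for _ in range(10_000)]

running = incremental_mean(samples)
batch = sum(samples) / len(samples)  # non-incremental: waits for all samples

print(running[-1], batch)  # the two agree up to floating-point rounding
```

As the printed values confirm, the early entries of the running estimate fluctuate, but the final entry matches the batch average exactly (up to rounding), consistent with (6.3).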

Furthermore, consider an algorithm with a more general expression:

w_{k+1} = w_k - \alpha_k (w_k - x_k). \tag{6.4}

This algorithm is important and frequently used in this chapter. It is the same as (6.2) except that the coefficient $1/k$ is replaced by $\alpha_k > 0$. Since the expression of $\alpha_k$ is not given, we cannot obtain an explicit expression for $w_k$ as in (6.3). However, we will show in the next section that, if $\{\alpha_k\}$ satisfies some mild conditions, $w_k \to \mathbb{E}[X]$ as $k \to \infty$. In Chapter 7, we will see that temporal-difference algorithms have similar (but more complex) expressions.
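A sketch of (6.4) in the same Python style follows. The function general_mean_estimate and the step-size choices are illustrative assumptions: passing $\alpha_k = 1/k$ recovers (6.2) exactly, while the slower-decaying $\alpha_k = k^{-0.7}$ is just one example of a sequence that still drives $w_k$ toward $\mathbb{E}[X]$; the precise conditions on $\{\alpha_k\}$ are the subject of the next section.

```python
import random

def general_mean_estimate(samples, step_size):
    """Iterate w_{k+1} = w_k - alpha_k * (w_k - x_k), as in (6.4).

    step_size(k) returns alpha_k > 0; step_size(k) = 1/k recovers (6.2).
    """
    w = 0.0
    for k, x in enumerate(samples, start=1):
        w = w - step_size(k) * (w - x)
    return w

samples = [random.choice([1, 2, 3, 4, 5, 6]) for _ in range(100_000)]

# alpha_k = 1/k recovers the exact sample mean, as in (6.2).
print(general_mean_estimate(samples, lambda k: 1.0 / k))

# A slower-decaying step size (illustrative choice) also approaches
# E[X] = 3.5, though no closed-form expression like (6.3) is available.
print(general_mean_estimate(samples, lambda k: k ** -0.7))
```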