One main reason we care about measure-theoretic probability theory is that it can rigorously describe the convergence properties of stochastic sequences.
Consider a stochastic sequence $\{X_k\} \doteq \{X_1, X_2, \dots, X_k, \dots\}$. Each element in this sequence is a random variable defined on a probability triple $(\Omega, \mathcal{F}, P)$. When we say that $\{X_k\}$ converges to a random variable $X$, we must be careful, since there are several distinct types of convergence, as described below.
⋄ Sure convergence:
Definition: $\{X_k\}$ converges surely (or everywhere, or pointwise) to $X$ if
$$\lim_{k\to\infty} X_k(\omega) = X(\omega), \quad \text{for all } \omega \in \Omega.$$
It means that $\lim_{k\to\infty} X_k(\omega) = X(\omega)$ is valid for all points in $\Omega$. This definition can be equivalently stated as
$$A = \Omega \quad \text{where} \quad A = \Big\{\omega \in \Omega : \lim_{k\to\infty} X_k(\omega) = X(\omega)\Big\}.$$
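As a simple illustration (the particular construction here is one standard choice, not from the original text), let $\Omega = [0,1]$ and $X_k(\omega) = \omega/k$. Then
$$\lim_{k\to\infty} X_k(\omega) = \lim_{k\to\infty} \frac{\omega}{k} = 0 \quad \text{for every } \omega \in [0,1],$$
so $\{X_k\}$ converges surely to $X \equiv 0$, and indeed $A = \Omega$.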
⋄ Almost sure convergence:
Definition: $\{X_k\}$ converges almost surely (or almost everywhere, or with probability 1, or w.p.1) to $X$ if
$$P(A) = 1 \quad \text{where} \quad A = \Big\{\omega \in \Omega : \lim_{k\to\infty} X_k(\omega) = X(\omega)\Big\}. \tag{B.3}$$
It means that $\lim_{k\to\infty} X_k(\omega) = X(\omega)$ is valid for almost all points in $\Omega$. The points for which this limit fails form a set of measure zero. For the sake of simplicity, (B.3) is often written as
$$P\Big(\lim_{k\to\infty} X_k = X\Big) = 1.$$
Almost sure convergence is denoted as $X_k \xrightarrow{\text{a.s.}} X$.
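A standard example (added here for illustration) shows that almost sure convergence is strictly weaker than sure convergence. Let $\Omega = [0,1]$ carry the uniform (Lebesgue) probability measure and set $X_k(\omega) = \omega^k$. Then
$$\lim_{k\to\infty} \omega^k = 0 \ \text{ for } \omega \in [0,1), \qquad \lim_{k\to\infty} \omega^k = 1 \ \text{ for } \omega = 1,$$
so $A = [0,1)$ and $P(A) = 1$: the sequence converges almost surely to $X \equiv 0$, but not surely, because convergence to $0$ fails at the single point $\omega = 1$, a set of measure zero.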
⋄ Convergence in probability:
Definition: $\{X_k\}$ converges in probability to $X$ if for any $\epsilon > 0$,
$$\lim_{k\to\infty} P(A_k) = 0 \quad \text{where} \quad A_k = \{\omega \in \Omega : |X_k(\omega) - X(\omega)| > \epsilon\}. \tag{B.4}$$
For simplicity, (B.4) can be written as
$$\lim_{k\to\infty} P(|X_k - X| > \epsilon) = 0.$$
The difference between convergence in probability and (almost) sure convergence is as follows. Both sure and almost sure convergence first evaluate the convergence of every point in $\Omega$ and then check the measure of the set of points that converge. By contrast, convergence in probability first identifies, for each $k$, the set of points satisfying $|X_k - X| > \epsilon$ and then evaluates whether the measure of this set converges to zero as $k \to \infty$.
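A classic example, often called the "typewriter" or sliding-indicator sequence (added here for illustration), shows that convergence in probability is strictly weaker than almost sure convergence. Let $\Omega = [0,1]$ with the uniform measure, and let $I_1 = [0,1]$; $I_2 = [0,\tfrac12]$, $I_3 = [\tfrac12,1]$; $I_4 = [0,\tfrac14]$, $I_5 = [\tfrac14,\tfrac12]$, $\dots$ sweep repeatedly across $[0,1]$ with shrinking lengths, and set $X_k = 1_{I_k}$. Then for any $\epsilon \in (0,1)$,
$$P(|X_k - 0| > \epsilon) = P(I_k) \to 0,$$
so $X_k \to 0$ in probability. Yet every $\omega$ lies in infinitely many $I_k$, so $X_k(\omega)$ takes the value $1$ infinitely often and the limit $\lim_{k\to\infty} X_k(\omega)$ exists for no $\omega$; hence $\{X_k\}$ does not converge almost surely.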
⋄ Convergence in mean:
Definition: $\{X_k\}$ converges in the $r$-th mean (or in the $L^r$ norm) to $X$ if
$$\lim_{k\to\infty} E[|X_k - X|^r] = 0.$$
The most frequently used cases are $r=1$ and $r=2$. It is worth mentioning that convergence in mean is not equivalent to $\lim_{k\to\infty} E[X_k - X] = 0$ or $\lim_{k\to\infty} E[X_k] = E[X]$: these weaker conditions only require the expectation of $X_k$ to converge, since positive and negative deviations can cancel inside $E[X_k - X]$, whereas $E[|X_k - X|^r] \to 0$ forces the deviations themselves to vanish.
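A quick example (added here for illustration) makes this distinction concrete: let $X \equiv 0$ and let each $X_k$ take the values $+1$ and $-1$ with probability $\tfrac12$ each. Then
$$E[X_k - X] = 0 \ \text{ for every } k, \qquad \text{yet} \qquad E[|X_k - X|] = 1 \ \text{ for every } k,$$
so $E[X_k] \to E[X]$ trivially while $\{X_k\}$ does not converge to $X$ in the first mean.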
⋄ Convergence in distribution:
Definition: The cumulative distribution function of $X_k$ is $P(X_k \le a)$ where $a \in \mathbb{R}$. Then, $\{X_k\}$ converges to $X$ in distribution if the cumulative distribution function of $X_k$ converges to that of $X$:
$$\lim_{k\to\infty} P(X_k \le a) = P(X \le a), \quad \text{for all } a \in \mathbb{R} \text{ at which } P(X \le a) \text{ is continuous}.$$
(The restriction to continuity points of the limiting distribution function is the standard technical qualification; at points where it jumps, pointwise convergence need not hold.)
A compact expression is
$$\lim_{k\to\infty} P(A_k) = P(A),$$
where
$$A_k \doteq \{\omega \in \Omega : X_k(\omega) \le a\}, \qquad A \doteq \{\omega \in \Omega : X(\omega) \le a\}.$$
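Convergence in distribution constrains only the distribution functions, not the random variables themselves. For example (added here for illustration), let $X \sim \mathcal{N}(0,1)$ and $X_k \doteq -X$ for all $k$. Since $-X$ and $X$ have the same distribution,
$$P(X_k \le a) = P(X \le a) \quad \text{for all } a \in \mathbb{R} \text{ and all } k,$$
so $X_k \to X$ in distribution trivially; but $|X_k - X| = 2|X|$, so $\{X_k\}$ does not converge to $X$ in probability.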
The relationships between the above types of convergence are given below:
almost sure convergence $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution;
convergence in mean $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution.
Almost sure convergence and convergence in mean do not imply each other. More information can be found in [102].
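To see numerically why convergence in probability (or even almost sure convergence) does not imply convergence in mean, the following is a minimal Python sketch. The construction $X_k = k \cdot 1\{U \le 1/k\}$ with $U \sim \mathrm{Uniform}[0,1]$, and the use of NumPy, are choices made here for illustration and are not part of the original text. For this sequence, $X_k \to 0$ almost surely and $P(|X_k| > \epsilon) = 1/k \to 0$, yet $E[|X_k|] = k \cdot \tfrac{1}{k} = 1$ for every $k$.

```python
import numpy as np

# Monte Carlo sketch of the classic counterexample X_k = k * 1{U <= 1/k},
# U ~ Uniform[0,1]: X_k -> 0 almost surely (hence in probability), yet
# E[|X_k|] = k * (1/k) = 1 for every k, so there is no convergence in mean.
rng = np.random.default_rng(seed=0)
u = rng.uniform(size=1_000_000)  # one draw of U (one omega) per sample path

eps = 0.5
for k in (10, 100, 1_000, 10_000):
    x_k = k * (u <= 1.0 / k)               # X_k(omega) on each sample path
    p_exceed = np.mean(np.abs(x_k) > eps)  # estimates P(|X_k - 0| > eps) = 1/k
    mean_abs = np.mean(np.abs(x_k))        # estimates E[|X_k - 0|] = 1
    print(f"k={k:6d}  P(|X_k|>eps) ~ {p_exceed:.5f}  E|X_k| ~ {mean_abs:.3f}")
```

The estimated exceedance probability shrinks like $1/k$ while the estimated mean stays near $1$, matching the implication chart above: neither almost sure convergence nor convergence in probability implies convergence in mean.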