One main reason we care about measure-theoretic probability theory is that it can rigorously describe the convergence properties of stochastic sequences.
Consider a stochastic sequence $\{X_k\} \doteq \{X_1, X_2, \dots, X_k, \dots\}$. Each element in this sequence is a random variable defined on a probability triple $(\Omega, \mathcal{F}, P)$. When we say that $\{X_k\}$ converges to a random variable $X$, we must be careful, since there are several distinct types of convergence, as described below.
⋄ Sure convergence:
Definition: $\{X_k\}$ converges surely (or everywhere, or pointwise) to $X$ if
$$\lim_{k\to\infty} X_k(\omega) = X(\omega), \quad \text{for all } \omega \in \Omega.$$
It means that $\lim_{k\to\infty} X_k(\omega) = X(\omega)$ is valid for all points in $\Omega$. This definition can be equivalently stated as
$$A = \Omega \quad \text{where} \quad A = \Big\{\omega \in \Omega : \lim_{k\to\infty} X_k(\omega) = X(\omega)\Big\}.$$
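As a simple illustration (the particular construction here is one standard choice, not from the original text), let $\Omega = [0,1]$ and $X_k(\omega) = \omega/k$. Then
$$\lim_{k\to\infty} X_k(\omega) = \lim_{k\to\infty} \frac{\omega}{k} = 0 \quad \text{for every } \omega \in [0,1],$$
so $\{X_k\}$ converges surely to $X \equiv 0$, and indeed $A = \Omega$.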
⋄ Almost sure convergence:
Definition: $\{X_k\}$ converges almost surely (or almost everywhere, or with probability 1, or w.p.1) to $X$ if
$$P(A) = 1 \quad \text{where} \quad A = \Big\{\omega \in \Omega : \lim_{k\to\infty} X_k(\omega) = X(\omega)\Big\}. \tag{B.3}$$
It means that $\lim_{k\to\infty} X_k(\omega) = X(\omega)$ is valid for almost all points in $\Omega$. The points for which this limit fails form a set of measure zero. For the sake of simplicity, (B.3) is often written as
$$P\Big(\lim_{k\to\infty} X_k = X\Big) = 1.$$
Almost sure convergence is denoted as $X_k \xrightarrow{\text{a.s.}} X$.
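A standard example (added here for illustration) shows that almost sure convergence is strictly weaker than sure convergence. Let $\Omega = [0,1]$ carry the uniform (Lebesgue) probability measure and set $X_k(\omega) = \omega^k$. Then
$$\lim_{k\to\infty} \omega^k = 0 \ \text{ for } \omega \in [0,1), \qquad \lim_{k\to\infty} \omega^k = 1 \ \text{ for } \omega = 1,$$
so $A = [0,1)$ and $P(A) = 1$: the sequence converges almost surely to $X \equiv 0$, but not surely, because convergence to $0$ fails at the single point $\omega = 1$, a set of measure zero.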
⋄ Convergence in probability:
Definition: $\{X_k\}$ converges in probability to $X$ if for any $\epsilon > 0$,
$$\lim_{k\to\infty} P(A_k) = 0 \quad \text{where} \quad A_k = \{\omega \in \Omega : |X_k(\omega) - X(\omega)| > \epsilon\}. \tag{B.4}$$
For simplicity, (B.4) can be written as
$$\lim_{k\to\infty} P(|X_k - X| > \epsilon) = 0.$$
The difference between convergence in probability and (almost) sure convergence is as follows. Both sure and almost sure convergence first evaluate the convergence of every point in $\Omega$ and then check the measure of the set of points that converge. By contrast, convergence in probability first identifies, for each $k$, the set of points satisfying $|X_k - X| > \epsilon$ and then evaluates whether the measure of this set converges to zero as $k \to \infty$.
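A classic example, often called the "typewriter" or sliding-indicator sequence (added here for illustration), shows that convergence in probability is strictly weaker than almost sure convergence. Let $\Omega = [0,1]$ with the uniform measure, and let $I_1 = [0,1]$; $I_2 = [0,\tfrac12]$, $I_3 = [\tfrac12,1]$; $I_4 = [0,\tfrac14]$, $I_5 = [\tfrac14,\tfrac12]$, $\dots$ sweep repeatedly across $[0,1]$ with shrinking lengths, and set $X_k = 1_{I_k}$. Then for any $\epsilon \in (0,1)$,
$$P(|X_k - 0| > \epsilon) = P(I_k) \to 0,$$
so $X_k \to 0$ in probability. Yet every $\omega$ lies in infinitely many $I_k$, so $X_k(\omega)$ takes the value $1$ infinitely often and the limit $\lim_{k\to\infty} X_k(\omega)$ exists for no $\omega$; hence $\{X_k\}$ does not converge almost surely.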
⋄ Convergence in mean:
Definition: $\{X_k\}$ converges in the $r$-th mean (or in the $L^r$ norm) to $X$ if
$$\lim_{k\to\infty} E[|X_k - X|^r] = 0.$$
The most frequently used cases are $r=1$ and $r=2$. It is worth mentioning that convergence in mean is not equivalent to $\lim_{k\to\infty} E[X_k - X] = 0$ or $\lim_{k\to\infty} E[X_k] = E[X]$: these weaker conditions only require the expectation of $X_k$ to converge, since positive and negative deviations can cancel inside $E[X_k - X]$, whereas $E[|X_k - X|^r] \to 0$ forces the deviations themselves to vanish.
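A quick example (added here for illustration) makes this distinction concrete: let $X \equiv 0$ and let each $X_k$ take the values $+1$ and $-1$ with probability $\tfrac12$ each. Then
$$E[X_k - X] = 0 \ \text{ for every } k, \qquad \text{yet} \qquad E[|X_k - X|] = 1 \ \text{ for every } k,$$
so $E[X_k] \to E[X]$ trivially while $\{X_k\}$ does not converge to $X$ in the first mean.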
⋄ Convergence in distribution:
Definition: The cumulative distribution function of $X_k$ is $P(X_k \le a)$ where $a \in \mathbb{R}$. Then, $\{X_k\}$ converges to $X$ in distribution if the cumulative distribution function of $X_k$ converges to that of $X$:
$$\lim_{k\to\infty} P(X_k \le a) = P(X \le a), \quad \text{for all } a \in \mathbb{R} \text{ at which } P(X \le a) \text{ is continuous}.$$
(The restriction to continuity points of the limiting distribution function is the standard technical qualification; at points where it jumps, pointwise convergence need not hold.)
A compact expression is
$$\lim_{k\to\infty} P(A_k) = P(A),$$
where
$$A_k \doteq \{\omega \in \Omega : X_k(\omega) \le a\}, \qquad A \doteq \{\omega \in \Omega : X(\omega) \le a\}.$$
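Convergence in distribution constrains only the distribution functions, not the random variables themselves. For example (added here for illustration), let $X \sim \mathcal{N}(0,1)$ and $X_k \doteq -X$ for all $k$. Since $-X$ and $X$ have the same distribution,
$$P(X_k \le a) = P(X \le a) \quad \text{for all } a \in \mathbb{R} \text{ and all } k,$$
so $X_k \to X$ in distribution trivially; but $|X_k - X| = 2|X|$, so $\{X_k\}$ does not converge to $X$ in probability.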
The relationships between the above types of convergence are given below:
almost sure convergence $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution;
convergence in mean $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution.
Almost sure convergence and convergence in mean do not imply each other. More information can be found in [102].
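To see numerically why convergence in probability (or even almost sure convergence) does not imply convergence in mean, the following is a minimal Python sketch. The construction $X_k = k \cdot 1\{U \le 1/k\}$ with $U \sim \mathrm{Uniform}[0,1]$, and the use of NumPy, are choices made here for illustration and are not part of the original text. For this sequence, $X_k \to 0$ almost surely and $P(|X_k| > \epsilon) = 1/k \to 0$, yet $E[|X_k|] = k \cdot \tfrac{1}{k} = 1$ for every $k$.

```python
import numpy as np

# Monte Carlo sketch of the classic counterexample X_k = k * 1{U <= 1/k},
# U ~ Uniform[0,1]: X_k -> 0 almost surely (hence in probability), yet
# E[|X_k|] = k * (1/k) = 1 for every k, so there is no convergence in mean.
rng = np.random.default_rng(seed=0)
u = rng.uniform(size=1_000_000)  # one draw of U (one omega) per sample path

eps = 0.5
for k in (10, 100, 1_000, 10_000):
    x_k = k * (u <= 1.0 / k)               # X_k(omega) on each sample path
    p_exceed = np.mean(np.abs(x_k) > eps)  # estimates P(|X_k - 0| > eps) = 1/k
    mean_abs = np.mean(np.abs(x_k))        # estimates E[|X_k - 0|] = 1
    print(f"k={k:6d}  P(|X_k|>eps) ~ {p_exceed:.5f}  E|X_k| ~ {mean_abs:.3f}")
```

The estimated exceedance probability shrinks like $1/k$ while the estimated mean stays near $1$, matching the implication chart above: neither almost sure convergence nor convergence in probability implies convergence in mean.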