
Intuition on Wald's equation without using the optional stopping theorem.

$\begingroup$

Wald's equation, even in its simplest form as stated below, simplifies many expectation calculations.

Wald's Equation: Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of real-valued, independent and identically distributed random variables and let $N$ be a nonnegative integer-valued random variable that is independent of the sequence $(X_n)_{n\in\mathbb{N}}$. Suppose that $N$ and the $X_n$ have finite expectations. Then $$ \operatorname{E}[X_1+\dots+X_N]=\operatorname{E}[N] \cdot\operatorname{E}[X_n]\quad \forall n\in\mathbb{N}\,. $$
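(To fix ideas, the identity is easy to check numerically. The sketch below is my own illustration, not part of the statement: the choices $N$ uniform on $\{0,\dots,10\}$ and $X_i\sim\mathrm{Exp}(1)$ are arbitrary, giving $\operatorname{E}[N]\cdot\operatorname{E}[X_1]=5\cdot 1=5$.)

```python
import random

random.seed(0)

TRIALS = 200_000
total = 0.0

for _ in range(TRIALS):
    # N uniform on {0, ..., 10}, drawn independently of the X_i, so E[N] = 5
    n = random.randint(0, 10)
    # X_i ~ Exp(1), so E[X_i] = 1; add up N of them
    total += sum(random.expovariate(1.0) for _ in range(n))

mean_sum = total / TRIALS
print(mean_sum)  # Monte Carlo estimate of E[X_1 + ... + X_N], close to 5
```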

I am looking for an intuitive explanation of Wald's equation that does not use the optional stopping theorem. I am not interested in explanations of why the naive identity $ \operatorname{E}[X_1+\dots+X_N]=N\cdot \operatorname{E}[X_1] $ is wrong, nor in discussions that concern only the hypotheses.

We could have, for example, a function $\varphi$ of two or more variables such that $\operatorname{E}[X_1+\dots+X_N]=\varphi\big(\operatorname{E}[N]\, ,\,\operatorname{E}[X_n]\big), \quad\forall n\in\mathbb{N}\,$. The question then becomes: why does $\varphi(x,y)$ equal $x\cdot y$?

More generally, we could have two linear functionals $F : L^1(\Omega,\mathcal{A},P)\to \mathbb{R}$ and $G : L^1(\Omega,\mathcal{A},P)\to \mathbb{R}$ such that $\operatorname{E}[X_1+\dots+X_N]=\varphi\big(F[N]\, ,\,G[X_n]\big), \quad\forall n\in\mathbb{N}\,.$ So the question would be: why must $F=G=\operatorname{E}$ and $\varphi(x,y)=x\cdot y$?

The interest is in the intuition behind the equation. An answer based on a good example will be very welcome.

Thanks in advance.

$\endgroup$

2 Answers

$\begingroup$

One simple intuitive explanation is that $$ \mathbb{E}[X_1+\cdots+X_N \mid N=n] = n\,\mathbb{E}[X_1], $$ so it follows that $$ \mathbb{E}[X_1+\cdots+X_N] = \mathbb{E}\big[\mathbb{E}[X_1+\cdots+X_N\mid N]\big] = \mathbb{E}[N\, \mathbb{E}[X_1]] = \mathbb{E}[N]\, \mathbb{E}[X_1]. $$ This works because $N$ is independent of the $X_i$: conditioned on $N=n$, you are just adding up $n$ copies of the same r.v. The identity is not quite as trivial when $N$ is a stopping time.
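A concrete instance of this computation (an illustrative example of my own, since the question asks for one): roll a fair die once to determine $N$, then roll it $N$ more times and sum the results. Conditioning on $N=n$ as above, $$ \mathbb{E}[X_1+\cdots+X_N] \;=\; \sum_{n=1}^{6}\mathbb{E}[X_1+\cdots+X_n]\,P(N=n) \;=\; \sum_{n=1}^{6} n\cdot\tfrac{7}{2}\cdot\tfrac{1}{6} \;=\; \tfrac{7}{2}\cdot\tfrac{7}{2} \;=\; \tfrac{49}{4}, $$ which is exactly $\mathbb{E}[N]\cdot\mathbb{E}[X_1] = 3.5 \times 3.5 = 12.25$: the inner sum over $n$ reproduces $\mathbb{E}[N]$, which is why $\varphi(x,y)=x\cdot y$ and no other function can appear.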

$\endgroup$

$\begingroup$

Here is a martingale approach to give some intuition, not to prove the theorem:

Consider a random walk $$S_n = S_0 + X_1 + \dots + X_n$$ where the $X_i$ are i.i.d. with mean $E(X_i) = \mu$. Writing $\mathcal{F}_n$ for the history of the walk up to time $n$, we can easily check that $$M_n = S_n - n\mu$$ is a martingale, namely: $$E[M_{n+1} - M_{n} \mid \mathcal{F}_n] = E[ S_{n+1} - (n+1)\mu - S_n + n\mu \mid \mathcal{F}_n]$$ $$ = E[X_{n+1} - \mu \mid \mathcal{F}_n] = E[X_{n+1}] - \mu = \mu - \mu = 0. $$ For a fixed, deterministic time $n$ we have $$ S_n - S_0 = X_1 + \dots + X_n. $$ Taking expectations on both sides (linearity of expectation, no independence needed here) gives $$E[S_n] - E[S_0] = E[X_1] + \dots + E[X_n] $$ or $$ E[S_n - S_0] = \mu + \dots + \mu = n \mu. $$ However, Wald's theorem lets the time itself be random: $T$ is a stopping time with $P(T < \infty) = 1$ and $ET < \infty$. We therefore work with the truncated time $\min(T,n)$, because we have to stop the walk at time $T$. As $n \to \infty$, $$ \min(T,n) \to T, $$ and so, letting $n \to \infty$, $$ E[S_{\min(T,n)} - S_0] = \mu\, E[\min(T,n)]$$ becomes $$ E[S_T - S_0] = \mu\, ET.$$

There is more theory to justify in the above, e.g. passing the limit inside the expectation: since the $\min(T,n)$ increase to $T$, one combines monotone/dominated convergence with a bound of the form $E\big|S_T - S_{\min(T,n)}\big| \le \sum_{m > n} E\big[|X_m|\,\mathbf{1}\{T \ge m\}\big] \to 0$, etc., but all that belongs to a rigorous proof. I hope that helps :))
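To see the stopped version in action, here is a small simulation (my own sketch; the choices $X_i\sim\mathrm{Bernoulli}(p)$ with $p=0.3$ and $T$ = time of the third success are arbitrary). The point is that $T$ genuinely depends on the $X_i$, yet Wald's identity still holds: $S_T = 3$ by construction, while $\mu\, ET = p\cdot(3/p) = 3$.

```python
import random

random.seed(1)

P = 0.3          # success probability of each Bernoulli step (arbitrary choice)
TARGET = 3       # T = first time we have seen TARGET successes (a stopping time)

def run_once():
    """Walk until the TARGET-th success; return (S_T, T)."""
    successes, t = 0, 0
    while successes < TARGET:
        x = 1 if random.random() < P else 0
        successes += x
        t += 1
    return successes, t

TRIALS = 100_000
sum_S, sum_T = 0, 0
for _ in range(TRIALS):
    s, t = run_once()
    sum_S += s
    sum_T += t

mean_sum = sum_S / TRIALS   # E[S_T]: exactly TARGET = 3 on every run
mean_T = sum_T / TRIALS     # E[T]: close to TARGET / P = 10
print(mean_sum, P * mean_T)  # both sides of Wald: E[S_T] vs. mu * E[T]
```

Note that $T$ is not independent of the walk here (it is determined by it), which is exactly the regime where the truncation argument above is needed.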

$\endgroup$
