How to find Var($\bar{X}-\bar{Y}$) , the variance of the difference between the sample means?
So I tried to do this my own way, but I'm not sure if it's correct. I used the equation for variance to get this answer, but I'm not sure it matches the expected one. Also, to be honest, I'm not sure why I'm dividing by $m$ and $n$; I just sort of guessed, but I don't really get why. I thought I'd have to multiply by $m$ and $n$, not divide. Is that because you pull the constant out when you calculate the variance?
1 Answer
I hope a few extra sentences can clear up some of your confusion.
In the notation of the printed problem, the variances of the sample means are $V(\bar X) = \sigma_1^2/n$ and $V(\bar Y) = \sigma_2^2/m$.
Without knowing the context of the chapter you took this from, I'd say it is reasonable to assume these are two independent samples. That implies that $\bar X$ and $\bar Y$ are independent random variables.
In general, if $X$ and $Y$ are independent random variables, then $Var(aX + bY) = a^2Var(X) + b^2Var(Y),$ so that $Var(X - Y) = Var(X)+Var(Y),$ taking $a = 1$ and $b = -1.$
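As a quick sanity check on the identity $Var(aX + bY) = a^2Var(X) + b^2Var(Y)$ for independent $X$ and $Y$, here is a small simulation sketch; the normal distributions and parameter values below are just illustrative choices, not part of the original problem:

```python
import random
import statistics

random.seed(42)

# Hypothetical independent variables: X ~ Normal(0, 2), Y ~ Normal(0, 3).
a, b = 1.0, -1.0          # the a = 1, b = -1 case gives Var(X - Y)
n_sims = 200_000

x = [random.gauss(0, 2) for _ in range(n_sims)]
y = [random.gauss(0, 3) for _ in range(n_sims)]

# Form aX + bY for each simulated pair and compare variances.
combo = [a * xi + b * yi for xi, yi in zip(x, y)]

empirical = statistics.variance(combo)
theoretical = a**2 * 2**2 + b**2 * 3**2   # a^2 Var(X) + b^2 Var(Y) = 4 + 9 = 13

print(empirical, theoretical)  # empirical should be close to 13
```

Note that the variances *add* even though we are subtracting the variables, because $b^2 = (-1)^2 = 1.$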
Then
$$Var(\bar X - \bar Y) = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}.$$
Note: The covariance plays a role when the two sample means are not independent. That is the reason for the last sentence in part b.
I'm guessing that this exercise is to get you ready to find a confidence interval or do a test on the difference between two population means. The estimate of $\mu_1 - \mu_2$ is $\bar X - \bar Y.$ Part of the rationale for that statement is that $E(\bar X - \bar Y) = \mu_1 - \mu_2.$ The variance of this estimate is what we just derived in the displayed equation.
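You can see both facts, $E(\bar X - \bar Y) = \mu_1 - \mu_2$ and the displayed variance formula, at work in a short simulation; the means, standard deviations, and sample sizes here are made-up values for illustration:

```python
import random
import statistics

random.seed(1)

# Hypothetical populations: X ~ Normal(5, 2) sampled n = 10 times,
# Y ~ Normal(3, 3) sampled m = 15 times.
mu1, sigma1, n = 5.0, 2.0, 10
mu2, sigma2, m = 3.0, 3.0, 15

def diff_of_means():
    """Draw one X-sample and one Y-sample; return xbar - ybar."""
    xbar = statistics.fmean(random.gauss(mu1, sigma1) for _ in range(n))
    ybar = statistics.fmean(random.gauss(mu2, sigma2) for _ in range(m))
    return xbar - ybar

diffs = [diff_of_means() for _ in range(100_000)]

emp_mean = statistics.fmean(diffs)           # should be near mu1 - mu2 = 2
emp_var = statistics.variance(diffs)         # should be near the formula below
theoretical = sigma1**2 / n + sigma2**2 / m  # 4/10 + 9/15 = 1.0

print(emp_mean, emp_var, theoretical)
```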
A little more background, since you wondered about the $n$'s in the denominators.
$$E(\bar X) = E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n}E(X_1 + X_2 + \cdots + X_n) = \frac{1}{n}[E(X_1)+E(X_2)+\cdots + E(X_n)] = \frac{1}{n}[\mu_1 + \mu_1 + \cdots + \mu_1] \\ = \frac{1}{n}(n\mu_1) = \mu_1.$$
Also, because $Var(aX) = a^2 Var(X),$ and taking $a = 1/n,$ we have
$$Var(\bar X) = Var\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \left(\frac{1}{n}\right)^2 Var(X_1 + X_2 + \cdots + X_n) = \frac{1}{n^2}[Var(X_1)+Var(X_2)+\cdots + Var(X_n)] = \frac{1}{n^2}[\sigma_1^2 + \sigma_1^2 + \cdots + \sigma_1^2] \\ = \frac{1}{n^2}(n\sigma_1^2) = \sigma_1^2/n.$$
This says that $\bar X$ is a less-variable (more stable) random variable than any one of the $X_i$'s. And the variance gets smaller as the sample size gets larger. That idea is really important in inferential statistics. More information (carefully collected) is better than less.
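The shrinking variance is easy to see in a simulation as well. This sketch estimates $Var(\bar X)$ at a few sample sizes and compares each to $\sigma^2/n$; the population standard deviation and sample sizes are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(7)

sigma = 4.0  # hypothetical population standard deviation

# For each sample size n, simulate many sample means and estimate their variance.
results = {}
for n in (5, 20, 80):
    means = [statistics.fmean(random.gauss(0, sigma) for _ in range(n))
             for _ in range(20_000)]
    results[n] = statistics.variance(means)
    print(n, results[n], sigma**2 / n)  # empirical vs. theoretical sigma^2/n
```

Each printed pair should agree closely, and the variance of $\bar X$ drops by the same factor the sample size grows.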