Mean and Variance

Mean and variance are probably the most common properties of random variables. A big chunk of theoretical results take advantage of the immediate as well as lesser-known properties of these two quantities. Nevertheless, relying completely on these two concepts, or their generalization to statistical moments, is unsafe, as there are families of random variables whose mean or variance does not exist. Well-known examples are Student's t-distribution with few degrees of freedom and the Cauchy distribution, which has neither a mean nor a variance.

Mean and variance of a random variable $X$ with density/mass function $p(x)$ and support $\mathcal{X}$:

$$\mathbb{E}\left[X\right]=\int_{\mathcal{X}}x\,p(x)\,dx$$

$$\mathbb{V}ar\left[X\right]=\int_{\mathcal{X}}\left(x-\mathbb{E}\left[X\right]\right)^2\,p(x)\,dx$$
For discrete random variables, where we have a mass function instead of a density function, we replace the integrals by sums. Another fairly helpful relation between mean and variance is the following:

$$\mathbb{V}ar\left[X\right]=\mathbb{E}\left[X^2\right]-\mathbb{E}\left[X\right]^2$$
Properties of mean and variance

Mean and variance are particularly interesting because we can easily derive them for linear transformations of random variables as well. For univariate random variables $X$ and $Y$, we have:

  • $\mathbb{E}\left[a\cdot X\right]=a\cdot\mathbb{E}\left[X\right]$
  • $\mathbb{E}\left[a+ X\right]=a+\mathbb{E}\left[X\right]$
  • $\mathbb{E}\left[X+Y\right]=\mathbb{E}\left[X\right]+\mathbb{E}\left[Y\right]$
  • $\mathbb{E}\left[X^2\right]=\mathbb{V}ar\left[X\right]+\mathbb{E}\left[X\right]^2$
  • $\mathbb{V}ar\left[a\cdot X\right]=a^2\cdot\mathbb{V}ar\left[X\right]$
  • $\mathbb{V}ar\left[a+ X\right]=\mathbb{V}ar\left[X\right]$
  • $\mathbb{V}ar\left[X+Y\right]=\mathbb{V}ar\left[X\right]+\mathbb{V}ar\left[Y\right]+2Cov(X,Y)$
  • $\mathbb{V}ar\left[X^2\right]=\mathbb{E}\left[X^4\right]-\mathbb{E}\left[X^2\right]^2=\mathbb{E}\left[X^4\right]-\mathbb{V}ar\left[X\right]^2-2\mathbb{V}ar\left[X\right]\mathbb{E}\left[X\right]^2-\mathbb{E}\left[X\right]^4$

for scalar $a$.
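These identities are easy to sanity-check by simulation. The following sketch uses NumPy; the distribution, seed, and constant $a$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)  # arbitrary example distribution
a = 5.0

# E[aX] = a*E[X] and E[a + X] = a + E[X]
assert np.isclose(np.mean(a * x), a * np.mean(x))
assert np.isclose(np.mean(a + x), a + np.mean(x))

# Var[aX] = a^2 * Var[X] and Var[a + X] = Var[X]
assert np.isclose(np.var(a * x), a**2 * np.var(x))
assert np.isclose(np.var(a + x), np.var(x))

# E[X^2] = Var[X] + E[X]^2 (holds exactly for the biased sample variance, ddof=0)
assert np.isclose(np.mean(x**2), np.var(x) + np.mean(x)**2)
```

Note that the linearity identities hold exactly for sample statistics as well, so the checks pass up to floating-point error rather than only approximately.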

Some results for multivariate distributions:

  • $\mathbb{E}\left[AX\right]=A\mathbb{E}\left[X\right]$

  • $\mathbb{V}ar\left[AX\right]=A\,\mathbb{V}ar\left[X\right]A^T$

for a matrix $A$, where $\mathbb{V}ar\left[X\right]$ now denotes the covariance matrix of the random vector $X$.
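The covariance identity $\mathbb{V}ar\left[AX\right]=A\,\mathbb{V}ar\left[X\right]A^T$ can also be verified by simulation. In this sketch, the covariance matrix `Sigma` and the matrix `A` are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# 2-D Gaussian with a known covariance matrix (arbitrary example)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=500_000)

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
Y = X @ A.T  # each row is A @ x_i

emp_cov = np.cov(Y, rowvar=False)  # empirical covariance of AX
theory = A @ Sigma @ A.T           # A Var[X] A^T
print(np.round(emp_cov, 2))
print(np.round(theory, 2))
```

With half a million samples, the empirical covariance matches the theoretical $A\Sigma A^T$ to roughly two decimal places.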

Empirical estimators given samples $x_1,...,x_n$ drawn i.i.d. from $X$:

Since we don't know the data generating process in nature, we need to estimate parameters of interest like mean and variance. Here, we only want to look at the well-known estimators:

$$\hat{\mu}=\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$$

$$\hat{\sigma}^2=\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2$$
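These estimators translate directly into code. The sketch below computes them by hand and compares against NumPy's built-ins (note `ddof=1` for the $n-1$ denominator); the distribution and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)  # arbitrary example data

n = len(x)
mean_hat = x.sum() / n                            # sample mean
var_hat = ((x - mean_hat) ** 2).sum() / (n - 1)   # unbiased sample variance

# NumPy's built-ins agree; ddof=1 selects the n-1 denominator
assert np.isclose(mean_hat, np.mean(x))
assert np.isclose(var_hat, np.var(x, ddof=1))
```

Beware that `np.var` defaults to `ddof=0`, i.e. the biased estimator with denominator $n$.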
Keep in mind that these formulas aim to estimate the true population mean and variance. This raises two important caveats:

  1. Another estimator might be 'better' in some sense of goodness. A famous example where the standard mean estimator underperforms is Stein's paradox.
  2. Mean and variance of the underlying distribution might not even exist. Even more dangerous: since our sample will always have finite mean and variance, we can never tell from the data alone whether the underlying distribution has finite counterparts. If the distribution has a mean but no variance, inference based on the mean estimator also breaks down, as the variance of the above estimator of the mean depends on the population variance.

Often, we can get away with ignoring these difficulties. In fact, a whole lot of standard machine learning theory depends on the existence of mean and variance, and the field still yields undeniably exciting results. Nevertheless, I recommend some caution here, as there is no general guarantee that a real-world problem doesn't behave in a highly nasty way. The distribution of stock-market returns might be a real-world example of non-existent variance.
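A quick simulation illustrates the danger: the running mean of Cauchy samples (which have no population mean) never stabilizes, while the Gaussian running mean settles down quickly. The seed and sample sizes below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

cauchy = rng.standard_cauchy(n)
normal = rng.normal(size=n)

# Compare running means at increasing sample sizes: the normal mean
# converges to 0, the Cauchy "mean" keeps jumping around.
for k in (100, 10_000, 100_000):
    print(f"n={k:>7}: normal mean={normal[:k].mean():+.3f}, "
          f"cauchy mean={cauchy[:k].mean():+.3f}")
```

Rerunning with different seeds makes the point even clearer: the Cauchy running mean can land almost anywhere, no matter how large the sample.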

Finally, this raises the question of how to mitigate the problem. One option would be to put a stronger emphasis on quantiles, whose standard sample-based estimators do not depend on statistical moments. We will get to quantiles later on.
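As a small preview, the sample median (the 0.5 quantile) remains a stable estimate of the Cauchy location parameter (0 here) even though the distribution has no mean. A sketch, with seed and sample size as arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
cauchy = rng.standard_cauchy(100_000)

# The sample median estimates the Cauchy location parameter
# even though the distribution has no mean.
median = np.quantile(cauchy, 0.5)
print(median)
```

Unlike the sample mean in the previous experiment, the median stays close to 0 across reruns and sample sizes.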