# Variance inequalities


Bounds on the variance of random variables and functions thereof. Currently, mostly taken from Boucheron, Lugosi, and Massart (2013).

## Efron-Stein inequality

Theorem Let $$X_1,...,X_n$$ be independent random variables and $$Z=f(X)$$ a square-integrable function of $$X=(X_1,...,X_n)$$. Then

$Var(Z)\leq\sum_{i=1}^n \mathbb{E}\left[\left(Z-\mathbb{E}^{(i)}[Z]\right)^2\right]=:v,$

where

$\mathbb{E}^{(i)}[Z]=\int_\mathcal{X}f(X_1,...,X_{i-1},x_i,X_{i+1},...,X_n)d\mu_i(x_i).$

Let $$X_1',...,X_n'$$ be independent copies of $$X_1,...,X_n$$ and define

$Z_i'=f(X_1,...,X_{i-1},X_i',X_{i+1},...,X_n).$

Then

$v=\frac{1}{2}\sum_{i=1}^n \mathbb{E}\left[(Z-Z_i')^2\right]=\sum_{i=1}^n \mathbb{E}\left[(Z-Z_i')_+^2\right]=\sum_{i=1}^n \mathbb{E}\left[(Z-Z_i')_-^2\right],$

where $$x_+=\max(x,0)$$ and $$x_-=\min(x,0)$$. Also,

$v=\inf_{Z_i}\sum_{i=1}^n\mathbb{E}\left[(Z-Z_i)^2\right]$

with the infimum taken over the class of all $$X^{(i)}$$-measurable and square-integrable variables $$Z_i$$, $$i=1,...,n$$, where $$X^{(i)}=(X_1,...,X_{i-1},X_{i+1},...,X_n)$$ denotes $$X$$ with the $$i$$-th coordinate dropped.

Proof Let $\Delta_i=\mathbb{E}_i\left[Z-\mathbb{E}^{(i)}[Z]\right],$

where $\mathbb{E}_i[Z]=\int_{\mathcal{X}^{n-i}}f(X_1,...,X_i,x_{i+1},...,x_n)d\mu_{i+1}(x_{i+1})\cdots d\mu_n(x_n).$

By convention,

$\mathbb{E}_{0}[Z]=\mathbb{E}[Z]\quad\text{and}\quad\mathbb{E}_{n}[Z]=Z,$

and, since integrating $$\mathbb{E}^{(i)}[Z]$$ over $$x_{i+1},...,x_n$$ amounts to integrating $$Z$$ over $$x_i,x_{i+1},...,x_n$$,

$\mathbb{E}_i\left[Z-\mathbb{E}^{(i)}[Z]\right]=\mathbb{E}_i[Z]-\mathbb{E}_{i-1}[Z].$

We have

\begin{aligned} \sum_{i=1}^n\Delta_i &= \sum_{i=1}^n \mathbb{E}_i \left[Z-\mathbb{E}^{(i)}[Z]\right] \\ &=\sum_{i=1}^n \left(\mathbb{E}_i[Z] - \mathbb{E}_{i-1}[Z]\right) \\ &= \mathbb{E}_n[Z] - \mathbb{E}_{n-1}[Z] + \mathbb{E}_{n-1}[Z] - \cdots + \mathbb{E}_{1}[Z] - \mathbb{E}_{0}[Z]\\ &=Z - \mathbb{E}[Z], \end{aligned}

which implies

$Var(Z)=\mathbb{E}\left[(Z - \mathbb{E}[Z])^2\right]=\mathbb{E}\left[\left(\sum_{i=1}^n\Delta_i\right)^2\right].$

Next, notice that

\begin{aligned} \mathbb{E}\left[\left(\sum_{i=1}^n\Delta_i\right)^2\right]&=\mathbb{E}\left[\sum_{i=1}^n\Delta_i^2\right]+\mathbb{E}\left[2\sum_{i<j}\Delta_i\Delta_j\right]\\ &=\sum_{i=1}^n\mathbb{E}\left[\Delta_i^2\right]+2\sum_{i<j}\mathbb{E}\left[\Delta_i\Delta_j\right], \end{aligned}

and, for $$j>i$$,

\begin{aligned}\mathbb{E}_i[\Delta_j\Delta_i]&=\Delta_i\mathbb{E}_i[\Delta_j]\\ &=\Delta_i\mathbb{E}_i[\mathbb{E}_j [Z] - \mathbb{E}_{j-1}[Z]]\\ &=\Delta_i(\mathbb{E}_i[Z] - \mathbb{E}_{i}[Z])\\ &=0,\end{aligned}

where $$\Delta_i$$ can be pulled out of $$\mathbb{E}_i$$ because it is a function of $$X_1,...,X_i$$ only, and $$\mathbb{E}_i[\mathbb{E}_j[Z]]=\mathbb{E}_i[Z]$$ for $$j\geq i$$.

Therefore,

$\mathbb{E}[\Delta_j\Delta_i]=\int \mathbb{E}_i[\Delta_j\Delta_i]\,d\mu_1(x_1)\cdots d\mu_i(x_i)=\int 0 \,d\mu_1(x_1)\cdots d\mu_i(x_i)=0,$

whenever $$j>i$$.

Then, $Var(Z) = \mathbb{E}\left[\left(\sum_{i=1}^n\Delta_i\right)^2\right] =\sum_{i=1}^n\mathbb{E}\left[\Delta_i^2\right].$

By Jensen’s inequality, we have

\begin{aligned} \Delta_i^2&=\left(\mathbb{E}_i\left[Z-\mathbb{E}^{(i)}[Z]\right]\right)^2\\ &\leq\mathbb{E}_i\left[(Z-\mathbb{E}^{(i)}[Z])^2\right], \end{aligned}

and thus, taking expectations,

$Var(Z) \leq \sum_{i=1}^n \mathbb{E}\left[(Z-\mathbb{E}^{(i)}[Z])^2\right]=v.$

Next, notice that, if $$Y$$ is an i.i.d. copy of $$X$$, we have

\begin{aligned} \frac{1}{2}\mathbb{E}[(X-Y)^2]&=\frac{1}{2}\mathbb{E}[X^2-2XY+Y^2]\\ &=\frac{1}{2}(\mathbb{E}[X^2]-2\mathbb{E}[X]\mathbb{E}[Y]+\mathbb{E}[Y^2])\\ &=\frac{1}{2}(2\mathbb{E}[X^2]-2\mathbb{E}[X]^2)\\ &=\mathbb{E}[X^2]-\mathbb{E}[X]^2\\ &=Var(X). \end{aligned}
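As a quick aside, this identity is easy to check by simulation. Below is a minimal Monte Carlo sketch (my own illustration, not from the book), using $$Exp(1)$$ draws, for which $$Var(X)=1$$:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500_000

X = rng.exponential(size=m)  # Exp(1), so Var(X) = 1
Y = rng.exponential(size=m)  # independent copy of X

half_sq_diff = 0.5 * np.mean((X - Y) ** 2)
sample_var = X.var()
print(sample_var, half_sq_diff)  # both estimate Var(X) = 1
```

Both quantities agree with $$Var(X)=1$$ up to Monte Carlo error.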

Also, in

$\frac{1}{2}\mathbb{E}^{(i)}[(Z-Z_i')^2],$

$$Z_i'$$ acts as an independent copy of $$Z$$ under $$\mathbb{E}^{(i)}$$ (verify by writing out the expectation as an integral and the $$Z$$'s as functions of $$X_1,...,X_n$$ and $$X_i'$$).

Therefore,

$Var^{(i)}(Z)=\frac{1}{2}\mathbb{E}^{(i)}[(Z-Z_i')^2].$

We have

$(Z-Z_i')^2=(Z-Z_i')_+^2+(Z-Z_i')_-^2,$

and

$\mathbb{E}[(Z-Z_i')^2]=\mathbb{E}[(Z-Z_i')_+^2]+\mathbb{E}[(Z-Z_i')_-^2].$

Since $$Z$$ and $$Z_i'$$ have the same distribution, i.e. $$Z\stackrel{d}{=}Z_i'$$, we can swap them to get

\begin{aligned} \mathbb{E}[(Z-Z_i')_-^2]&=\mathbb{E}[(Z_i'-Z)_-^2]\\ &=\mathbb{E}[(-1)^2\cdot\min(Z_i'-Z,0)^2]\\ &=\mathbb{E}[\max(Z-Z_i',0)^2]\\ &=\mathbb{E}[(Z-Z_i')_+^2] \end{aligned}

and

$\mathbb{E}[(Z-Z_i')^2]=2\mathbb{E}[(Z-Z_i')_+^2].$

This proves that

$Var^{(i)}(Z)=\frac{1}{2}\mathbb{E}^{(i)}[(Z-Z_i')^2]=\mathbb{E}^{(i)}[(Z-Z_i')_+^2]$

(the same swap argument applies under $$\mathbb{E}^{(i)}$$), and taking expectations and summing recovers the expressions for $$v$$ stated above.

For the expression involving the negative part, we can proceed analogously.

Finally, we use

$Var(X)=\inf_{a\in\mathbb{R}}\mathbb{E}[(X-a)^2]$

(with a minimum at $$a=\mathbb{E}[X]$$) to show that

$Var^{(i)}(Z)=\inf_{Z_i}\mathbb{E}^{(i)}[(Z-Z_i)^2]$

with the minimum attained at $$Z_i=\mathbb{E}^{(i)}[Z]$$. $$\square$$
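To see the theorem in action, here is a small Monte Carlo sketch (my own illustration; the choice $$Z=\max_i X_i$$ with uniform $$X_i$$ is arbitrary). For $$n=5$$ uniforms, $$Var(Z)=5/252\approx 0.0198$$, and the bound $$v$$ is estimated by resampling one coordinate at a time:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200_000            # n coordinates, m Monte Carlo samples

f = lambda x: x.max(axis=1)  # Z = f(X) = max(X_1, ..., X_n)
X = rng.uniform(size=(m, n))
Z = f(X)

# v = (1/2) sum_i E[(Z - Z_i')^2], where Z_i' replaces only the
# i-th coordinate by an independent copy X_i'.
Xprime = rng.uniform(size=(m, n))
v = 0.0
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = Xprime[:, i]
    v += 0.5 * np.mean((Z - f(Xi)) ** 2)

var_Z = Z.var()
print(var_Z, v)  # Efron-Stein: Var(Z) <= v
```

The estimated $$v$$ strictly exceeds $$Var(Z)$$, as the theorem guarantees.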

## Bounded differences property

A function $$f:\mathcal{X}^n\rightarrow\mathbb{R}$$ has the bounded differences property if, for some non-negative constants $$c_1,...,c_n$$,

$\sup _{\substack{x_1, \ldots, x_n \\ x_i^{\prime} \in \mathcal{X}}}\left|f\left(x_1, \ldots, x_n\right)-f\left(x_1, \ldots, x_{i-1}, x_i^{\prime}, x_{i+1}, \ldots, x_n\right)\right| \leq c_i, \quad 1 \leq i \leq n.$

Corollary If $$f$$ has the bounded differences property, then $\operatorname{Var}(Z) \leq \frac{1}{4} \sum_{i=1}^n c_i^2.$

Proof Use the Efron-Stein inequality from earlier, namely $\operatorname{Var}(Z) \leq \inf _{Z_i} \sum_{i=1}^n \mathbb{E}\left[\left(Z-Z_i\right)^2\right].$ Since we can choose an arbitrary $$Z_i$$ from the class of all $$X^{(i)}$$-measurable, square-integrable variables, we can use \begin{aligned} & Z_i=\frac{1}{2}\left(\sup _{x_i^{\prime} \in \mathcal{X}} f\left(X_1, \ldots, X_{i-1}, x_i^{\prime}, X_{i+1}, \ldots, X_n\right)\right. \\ &\left.+\inf _{x_i^{\prime} \in \mathcal{X}} f\left(X_1, \ldots, X_{i-1}, x_i^{\prime}, X_{i+1}, \ldots, X_n\right)\right), \end{aligned} the midpoint of the range attainable by varying the $$i$$-th coordinate. Since $$Z$$ lies between this supremum and infimum, which differ by at most $$c_i$$ by the bounded differences property, we get $\left(Z-Z_i\right)^2 \leq \frac{c_i^2}{4},$ and the corollary follows by summing over $$i$$ and applying the Efron-Stein inequality above. $$\square$$
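As a numerical sanity check (my own example, not from the book), take $$f(x)=\frac{1}{n}\sum_{i=1}^n \mathbf{1}\{x_i>0.3\}$$ on $$[0,1]^n$$: changing one coordinate moves $$f$$ by at most $$c_i=1/n$$, so the corollary gives $$Var(Z)\leq 1/(4n)$$, while the true variance is $$0.3\cdot 0.7/n$$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 200_000

# f(x) = fraction of coordinates exceeding 0.3; changing a single
# coordinate moves f by at most c_i = 1/n (bounded differences).
f = lambda x: (x > 0.3).mean(axis=1)
X = rng.uniform(size=(m, n))
Z = f(X)

var_Z = Z.var()
bound = 0.25 * n * (1.0 / n) ** 2  # (1/4) sum_i c_i^2 = 1/(4n)
print(var_Z, bound)                # true Var(Z) = 0.3 * 0.7 / n
```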

## Self-bounding functions

A non-negative function $$f:\mathcal{X}^n\rightarrow [0,\infty)$$ has the self-bounding property if there exist functions $$f_i:\mathcal{X}^{n-1}\rightarrow [0,\infty)$$ such that for all $$x_1,...,x_n\in\mathcal{X}$$ and all $$i=1,...,n$$,

$0 \leq f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right) \leq 1,$ and $\sum_{i=1}^n\left(f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right)\right) \leq f\left(x_1, \ldots, x_n\right)$ hold.

Since each difference lies between $$0$$ and $$1$$, squaring it can only make it smaller:

$\left(f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right)\right)^2\leq f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right)$

for each $$i$$, and summing over $$i$$ therefore gives

$\sum_{i=1}^n\left(f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right)\right)^2 \leq f\left(x_1, \ldots, x_n\right).$

Corollary For $$Z=f(X_1,...,X_n)$$ with $$f$$ self-bounding:

$\operatorname{Var}(Z) \leq \mathbb{E}[Z]$

Proof We use $$Z_i=f_i\left(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\right)$$ and apply Efron-Stein to get

\begin{aligned} Var(Z)&\leq \inf _{Z_i} \sum_{i=1}^n \mathbb{E}\left[\left(Z-Z_i\right)^2\right]\\ &\leq \sum_{i=1}^n \mathbb{E}\left[\left(f\left(X_1, \ldots, X_n\right)-f_i\left(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\right)\right)^2\right]\\ &\leq \mathbb{E}\left[f\left(X_1, \ldots, X_n\right)\right]\\ &= \mathbb{E}[Z]. \ \square \end{aligned}
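As an illustration (my own example), the number of distinct values among $$n$$ independent draws from a finite set is self-bounding: removing one coordinate lowers the count by at most $$1$$, and the total decrease over all coordinates is at most the count itself. A Monte Carlo check of $$Var(Z)\leq \mathbb{E}[Z]$$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 10, 5, 200_000

# Z = number of distinct values among n draws from {0, ..., k-1};
# sort each row and count the jumps to get the distinct count.
X = rng.integers(0, k, size=(m, n))
Xs = np.sort(X, axis=1)
Z = 1 + (np.diff(Xs, axis=1) != 0).sum(axis=1)

print(Z.var(), Z.mean())  # self-bounding: Var(Z) <= E[Z]
```

Here $$\mathbb{E}[Z]=k(1-(1-1/k)^n)\approx 4.46$$, while the variance is far smaller.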

## Poincaré inequalities

### Convex Poincaré inequality

Suppose that $$f:[0,1]^n\rightarrow \mathbb{R}$$ is separately convex: for any $$i=1,...,n$$ and any fixed $$x_1,...,x_{i-1},x_{i+1},...,x_n$$, $$f$$ is convex in its $$i$$-th variable.

Theorem Let $$X_1,...,X_n\in[0,1]$$ be independent random variables and $$f:[0,1]^n\rightarrow \mathbb{R}$$ be separately convex. Also, presume that the partial derivatives of $$f$$ exist.

Then,

$\operatorname{Var}(f(X)) \leq \mathbb{E}\left[\|\nabla f(X)\|^2\right].$

Proof Our goal is to bound $$\sum_{i=1}^n(Z-Z_i)^2$$, with $$Z_i=\inf_{x_i'} f(X_1,...,x_i',...,X_n)$$. Let $$X_i'$$ denote a value of $$x_i'$$ at which the infimum is attained (which exists, since $$f$$ is continuous in its $$i$$-th argument and $$[0,1]$$ is compact). Write $$\bar{X}^{(i)}=(X_1,...,X_{i-1},X_i',X_{i+1},...,X_n)$$ - then

\begin{aligned} \sum_{i=1}^n\left(Z-Z_i\right)^2 & =\sum_{i=1}^n\left(f(X)-f\left(\bar{X}^{(i)}\right)\right)^2 \\ & \leq \sum_{i=1}^n\left(\frac{\partial f}{\partial x_i}(X)\right)^2\left(X_i-X_i^{\prime}\right)^2 \\ & \leq \sum_{i=1}^n\left(\frac{\partial f}{\partial x_i}(X)\right)^2 \\ & =\|\nabla f(X)\|^2 . \end{aligned}

The first inequality follows from separate convexity: convexity in the $$i$$-th coordinate gives

$f\left(\bar{X}^{(i)}\right)\geq f(X)+\frac{\partial f}{\partial x_i}(X)\left(X_i'-X_i\right),$

i.e.

$0\leq f(X)-f\left(\bar{X}^{(i)}\right)\leq \frac{\partial f}{\partial x_i}(X)\left(X_i-X_i'\right),$

where the lower bound holds because $$\bar{X}^{(i)}$$ minimizes $$f$$ over the $$i$$-th coordinate; squaring the non-negative middle term yields the claimed bound.

The second inequality is due to the fact that all $$X_i$$, $$X_i'$$ are in $$[0,1]$$, so the squared differences $$(X_i-X_i')^2$$ are at most $$1$$.

Finally, taking expectations on both sides and applying the Efron-Stein inequality to the left-hand side yields the result. $$\square$$
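A quick numerical check (my own example, not from the book): $$f(x)=\sum_{i=1}^n (x_i-0.3)^2$$ is convex, hence separately convex, on $$[0,1]^n$$, with partial derivatives $$2(x_i-0.3)$$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200_000

# f(x) = sum_i (x_i - 0.3)^2, separately convex on [0, 1]^n.
X = rng.uniform(size=(m, n))
fX = ((X - 0.3) ** 2).sum(axis=1)
grad_norm_sq = ((2 * (X - 0.3)) ** 2).sum(axis=1)

var_f = fX.var()
bound = grad_norm_sq.mean()
print(var_f, bound)  # convex Poincare: Var(f(X)) <= E[||grad f(X)||^2]
```

For this $$f$$ the bound is loose (roughly $$0.09$$ versus $$2.5$$), but it illustrates the mechanics of the inequality.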

### Gaussian Poincaré Inequality

Theorem Let $$X=(X_1,...,X_n)$$ be a vector of i.i.d. standard Gaussians. Let $$f:\mathbb{R}^n\rightarrow \mathbb{R}$$ be continuously differentiable. Then

$\operatorname{Var}(f(X)) \leq \mathbb{E}\left[\|\nabla f(X)\|^2\right]$

Proof Assume that $$\mathbb{E}\left[\|\nabla f(X)\|^2\right]<\infty$$, otherwise the result is trivial.

Given that $\|x\|^2=\sum_{i=1}^n x_i^2$, it suffices to prove the inequality for $$n=1$$,

$\operatorname{Var}(f(X)) \leq \mathbb{E}\left[f^{\prime}(X)^2\right],$

as the general case then follows by applying the Efron-Stein inequality coordinate-wise. It also suffices to prove this for $$f$$ twice continuously differentiable with compact support, by a standard approximation argument.

Let $$\varepsilon_1,...,\varepsilon_n$$ be independent Rademacher variables (uniform on $$\{-1,+1\}$$) and

$S_n=n^{-1 / 2} \sum_{j=1}^n \varepsilon_j.$

Then, we have

\begin{aligned} \mathbb{E}^{(i)}\left[f\left(S_n\right)\right] & =\frac{1}{2} f\left(S_n-\frac{\varepsilon_i}{\sqrt{n}}+\frac{1}{\sqrt{n}}\right)+\frac{1}{2} f\left(S_n-\frac{\varepsilon_i}{\sqrt{n}}-\frac{1}{\sqrt{n}}\right) \\ & =\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)+\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right) \end{aligned}

and, since conditionally on the other coordinates $$f(S_n)$$ takes each of the two values above with probability $$\frac{1}{2}$$, \begin{aligned} Var^{(i)}\left(f\left(S_n\right)\right)&= \frac{1}{2}\left[f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right]^2 \\ &\quad+\frac{1}{2}\left[f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right]^2 \\ &=\frac{1}{2}\left[\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right]^2 \\ &\quad+\frac{1}{2}\left[\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)\right]^2 \\ &= \frac{1}{4}\left(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right)^2, \end{aligned}

where the last step uses that the two bracketed squares are equal.

Since

$\sum_{i=1}^n \mathbb{E}\left[\left(Z-\mathbb{E}^{(i)}[Z]\right)^2\right]=\sum_{i=1}^n \mathbb{E}\left[\operatorname{Var}^{(i)}(Z)\right],$

we can apply Efron-Stein to $$Var^{(i)}\left(f\left(S_n\right)\right)$$:

$\operatorname{Var}\left(f\left(S_n\right)\right) \leq \frac{1}{4} \sum_{i=1}^n \mathbb{E}\left[\left(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right)^2\right].$

By the central limit theorem, $$S_n$$ converges in distribution to a standard Gaussian; since $$f$$ is continuous and bounded (compact support), $$Var(f(S_n))$$ converges to $$Var(f(X))$$.

Now, apply a Taylor approximation:

$\left|f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right| \leq \frac{2}{\sqrt{n}}\left|f^{\prime}\left(S_n\right)\right|+\frac{2 K}{n}$

where $$K$$ denotes the supremum of the absolute value of $$f''$$. This yields

\begin{aligned} & \frac{n}{4}\left(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right)^2 \\ & \leq f^{\prime}\left(S_n\right)^2+\frac{2 K}{\sqrt{n}}\left|f^{\prime}\left(S_n\right)\right|+\frac{K^2}{n} . \end{aligned}

Finally, taking the limit of the upper bound, we obtain

$\limsup _{n \rightarrow \infty} \frac{1}{4} \sum_{i=1}^n \mathbb{E}\left[\left(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right)^2\right]=\mathbb{E}\left[f^{\prime}(X)^2\right],$

which, combined with $$Var(f(S_n))\rightarrow Var(f(X))$$, proves the claim. $$\square$$

For Lipschitz functions with Lipschitz constant $$1$$, the mean value theorem forces $$\|\nabla f\|\leq 1$$, so the theorem implies

$Var(f(X))\leq 1.$
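A Monte Carlo sanity check of the Gaussian Poincaré inequality (my own example): $$f(x)=\sum_{i=1}^n \sin(x_i)$$ is continuously differentiable with $$\partial f/\partial x_i=\cos(x_i)$$, and here $$Var(f(X))=\frac{n}{2}(1-e^{-2})$$ is even available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 500_000

# f(x) = sum_i sin(x_i); grad f has coordinates cos(x_i).
X = rng.standard_normal(size=(m, n))
fX = np.sin(X).sum(axis=1)
grad_norm_sq = (np.cos(X) ** 2).sum(axis=1)

var_f = fX.var()
bound = grad_norm_sq.mean()
print(var_f, bound)  # Gaussian Poincare: Var(f(X)) <= E[||grad f(X)||^2]
```

With $$n=3$$ the variance is about $$1.30$$ against a bound of about $$1.70$$.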

## References

Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart. 2013. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.