Variance inequalities

Concentration inequalities

Bounds on the variance of random variables and functions thereof. Currently, mostly taken from Boucheron, Lugosi, and Massart (2013).

Efron-Stein inequality

Theorem Let \(X_1,...,X_n\) be independent random variables, and \(Z=f(X)\) a square-integrable function of \(X=(X_1,...,X_n)\). Then

\[Var(Z)\leq\sum_{i=1}^n \mathbb{E}\left[\left(Z-\mathbb{E}^{(i)}[Z]\right)^2\right]=:v,\]

where

\[\mathbb{E}^{(i)}[Z]=\int_\mathcal{X}f(X_1,...,X_{i-1},x_i,X_{i+1},...,X_n)d\mu_i(x_i).\]

Let \(X_1',...,X_n'\) be independent copies of \(X_1,...,X_n\) and define

\[Z_i'=f(X_1,...,X_{i-1},X_i',X_{i+1},...,X_n).\]

Then

\[v=\frac{1}{2}\sum_{i=1}^n \mathbb{E}\left[(Z-Z_i')^2\right]=\sum_{i=1}^n \mathbb{E}\left[(Z-Z_i')_+^2\right]=\sum_{i=1}^n \mathbb{E}\left[(Z-Z_i')_-^2\right],\]

where \(x_+=\max(x,0)\) and \(x_-=\min(x,0)\). Also,

\[v=\inf_{Z_i}\sum_{i=1}^n\mathbb{E}\left[(Z-Z_i)^2\right]\]

with the infimum taken over the class of all \(X^{(i)}\)-measurable and square-integrable variables \(Z_i\), \(i=1,...,n\), where \(X^{(i)}=(X_1,...,X_{i-1},X_{i+1},...,X_n)\).
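The inequality is easy to check by simulation. The following Python sketch (an illustration, not part of the theorem; the choice \(f=\max\) is mine) estimates both sides for \(Z=\max(X_1,...,X_n)\) with i.i.d. uniform coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000

# Z = f(X) = max(X_1, ..., X_n) for independent U[0,1] variables.
X = rng.random((reps, n))
Z = X.max(axis=1)

# v = (1/2) * sum_i E[(Z - Z_i')^2], with coordinate i resampled independently.
Xp = rng.random((reps, n))
v = 0.0
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = Xp[:, i]
    v += 0.5 * np.mean((Z - Xi.max(axis=1)) ** 2)

var_Z = Z.var()
print(var_Z, v)  # Efron-Stein: Var(Z) <= v
```

For the maximum of uniforms the bound is off by roughly a factor of two, which is typical: Efron-Stein trades sharpness for generality.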

Proof Let \[\Delta_i=\mathbb{E}_i\left[Z-\mathbb{E}^{(i)}[Z]\right],\]

where \[\mathbb{E}_i[Z]=\int_{\mathcal{X}^{n-i}}f(X_1,...,X_i,x_{i+1},...,x_n)d\mu_{i+1}(x_{i+1})\cdots d\mu_n(x_n).\]

By convention,

\[\mathbb{E}_{0}[Z]=\mathbb{E}[Z],\]

and, clearly,

\[\mathbb{E}_i\left[Z-\mathbb{E}^{(i)}[Z]\right]=\mathbb{E}_i[Z]-\mathbb{E}_{i-1}Z.\]

We have

\[ \begin{aligned} \sum_{i=1}^n\Delta_i &= \sum_{i=1}^n \mathbb{E}_i \left[Z-\mathbb{E}^{(i)}[Z]\right] \\ &=\sum_{i=1}^n \left(\mathbb{E}_i[Z] - \mathbb{E}_{i-1}[Z]\right) \\ &= \mathbb{E}_n[Z] - \mathbb{E}_{n-1}[Z] + \mathbb{E}_{n-1}[Z] - \cdots + \mathbb{E}_{1}[Z] - \mathbb{E}_{0}[Z]\\ &=Z - \mathbb{E}[Z], \end{aligned} \]

which implies

\[Var(Z)=\mathbb{E}\left[(Z - \mathbb{E}[Z])^2\right]=\mathbb{E}\left[\left(\sum_{i=1}^n\Delta_i\right)^2\right].\]

Next, notice that

\[ \begin{aligned} \mathbb{E}\left[\left(\sum_{i=1}^n\Delta_i\right)^2\right]&=\mathbb{E}\left[\sum_{i=1}^n\Delta_i^2\right]+\mathbb{E}\left[2\sum_{i<j}\Delta_i\Delta_j\right]\\ &=\sum_{i=1}^n\mathbb{E}\left[\Delta_i^2\right]+2\sum_{i<j}\mathbb{E}\left[\Delta_i\Delta_j\right], \end{aligned} \]

and, for \(j>i\),

\[\begin{aligned}\mathbb{E}_i[\Delta_j\Delta_i]&=\Delta_i\mathbb{E}_i[\Delta_j]\\ &=\Delta_i\mathbb{E}_i\left[\mathbb{E}_j [Z] - \mathbb{E}_{j-1}[Z]\right]\\ &=\Delta_i\left(\mathbb{E}_i[Z] - \mathbb{E}_{i}[Z]\right)\\ &=0,\end{aligned}\]

where we used that \(\Delta_i\) is a function of \(X_1,...,X_i\) only, and that \(\mathbb{E}_i[\mathbb{E}_j[Z]]=\mathbb{E}_i[Z]\) for \(j\geq i\) by the tower property.

Therefore,

\[\mathbb{E}[\Delta_j\Delta_i]=\int \mathbb{E}_i[\Delta_j\Delta_i]\,d\mu_1(x_1)\cdots d\mu_i(x_i)=\int 0 \,d\mu_1(x_1)\cdots d\mu_i(x_i)=0,\]

whenever \(j>i\).

Then, \[Var(Z) = \mathbb{E}\left[\left(\sum_{i=1}^n\Delta_i\right)^2\right] =\sum_{i=1}^n\mathbb{E}\left[\Delta_i^2\right].\]

By Jensen’s inequality, applied to the conditional expectation \(\mathbb{E}_i\), we have

\[\Delta_i^2=\left(\mathbb{E}_i\left[Z-\mathbb{E}^{(i)}[Z]\right]\right)^2\leq\mathbb{E}_i\left[\left(Z-\mathbb{E}^{(i)}[Z]\right)^2\right],\]

and thus

\[Var(Z) \leq \sum_{i=1}^n \mathbb{E}\left[\left(Z-\mathbb{E}^{(i)}[Z]\right)^2\right]=v.\]

Next, notice that, if \(Y\) is an i.i.d. copy of \(X\), we have

\[\begin{aligned} \frac{1}{2}\mathbb{E}[(X-Y)^2]&=\frac{1}{2}\mathbb{E}[X^2-2XY+Y^2]\\ &=\frac{1}{2}\left(\mathbb{E}[X^2]-2\mathbb{E}[X]\mathbb{E}[Y]+\mathbb{E}[Y^2]\right)\\ &=\frac{1}{2}\left(2\mathbb{E}[X^2]-2\mathbb{E}[X]^2\right)\\ &=\mathbb{E}[X^2]-\mathbb{E}[X]^2\\ &=Var(X). \end{aligned}\]
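This identity can be checked numerically; a small Python sketch (illustrative, with an arbitrary choice of distribution):

```python
import numpy as np

rng = np.random.default_rng(1)
# X, Y i.i.d. Exponential with scale 2, so Var(X) = 4.
x = rng.exponential(scale=2.0, size=500_000)
y = rng.exponential(scale=2.0, size=500_000)

half_sq = 0.5 * np.mean((x - y) ** 2)  # (1/2) E[(X - Y)^2]
print(half_sq, x.var())  # both approximate Var(X) = 4
```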

Also, in

\[\frac{1}{2}\mathbb{E}^{(i)}[(Z-Z_i')^2],\]

\(Z_i'\) acts as an i.i.d. copy of \(Z\) under \(\mathbb{E}^{(i)}\) (verify by writing out the expectation as an integral and the \(Z\)’s as functions of \(X_1,...,X_n\) and \(X_i'\)).

Therefore,

\[Var^{(i)}(Z)=\frac{1}{2}\mathbb{E}^{(i)}[(Z-Z_i')^2].\]

We have

\[(Z-Z_i')^2=(Z-Z_i')_+^2+(Z-Z_i')_-^2,\]

and

\[\mathbb{E}[(Z-Z_i')^2]=\mathbb{E}[(Z-Z_i')_+^2]+\mathbb{E}[(Z-Z_i')_-^2].\]

Since \(Z\) and \(Z_i'\) have the same distribution, i.e. \(Z\stackrel{d}{=}Z_i'\), we can swap them to get

\[\begin{aligned} \mathbb{E}[(Z-Z_i')_-^2]&=\mathbb{E}[(Z_i'-Z)_-^2]\\ &=\mathbb{E}[(-1)^2\cdot\min(Z_i'-Z,0)^2]\\ &=\mathbb{E}[\max(Z-Z_i',0)^2]\\ &=\mathbb{E}[(Z-Z_i')_+^2] \end{aligned}\]

and

\[\mathbb{E}[(Z-Z_i')^2]=2\mathbb{E}[(Z-Z_i')_+^2].\]

This proves that

\[Var^{(i)}(Z)=\frac{1}{2}\mathbb{E}^{(i)}[(Z-Z_i')^2]=\mathbb{E}^{(i)}[(Z-Z_i')_+^2],\]

since the swapping argument applies equally under \(\mathbb{E}^{(i)}\). For the negative part, we proceed analogously. Taking expectations and summing over \(i\) then yields the second and third expressions for \(v\).

Finally, we use

\[Var(X)=\inf_{a\in\mathbb{R}}\mathbb{E}[(X-a)^2]\]

(with a minimum at \(a=\mathbb{E}[X]\)) to show that

\[Var^{(i)}(Z)=\inf_{Z_i}\mathbb{E}^{(i)}[(Z-Z_i)^2]\]

with the minimum attained at \(Z_i=\mathbb{E}^{(i)}[Z]\). \(\square\)

Bounded differences property

A function \(f:\mathcal{X}^n\rightarrow\mathbb{R}\) has the bounded differences property if, for some non-negative constants \(c_1,...,c_n\),

\[\sup _{\substack{x_1, \ldots, x_n \\ x_i^{\prime} \in \mathcal{X}}}\left|f\left(x_1, \ldots, x_n\right)-f\left(x_1, \ldots, x_{i-1}, x_i^{\prime}, x_{i+1}, \ldots, x_n\right)\right| \leq c_i, \quad 1 \leq i \leq n.\]

Corollary If \(f\) has the bounded differences property, then \[ \operatorname{Var}(Z) \leq \frac{1}{4} \sum_{i=1}^n c_i^2 \]

Proof Use the Efron-Stein inequality from earlier, namely \[ \operatorname{Var}(Z) \leq \inf _{Z_i} \sum_{i=1}^n \mathbb{E}\left[\left(Z-Z_i\right)^2\right]. \] Since we can choose an arbitrary \(Z_i\) from the class of all \(X^{(i)}\)-measurable, square-integrable variables, we can use \[ \begin{aligned} Z_i=\frac{1}{2}&\left(\sup _{x_i^{\prime} \in \mathcal{X}} f\left(X_1, \ldots, X_{i-1}, x_i^{\prime}, X_{i+1}, \ldots, X_n\right)\right. \\ &\left.+\inf _{x_i^{\prime} \in \mathcal{X}} f\left(X_1, \ldots, X_{i-1}, x_i^{\prime}, X_{i+1}, \ldots, X_n\right)\right) \end{aligned} \] to exploit the bounded differences property of \(f\): \(Z\) lies between the supremum and the infimum, which differ by at most \(c_i\), so \[ \left(Z-Z_i\right)^2 \leq \frac{c_i^2}{4}. \] The corollary follows by summing over \(i\), taking expectations, and applying the Efron-Stein inequality. \(\square\)
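As a sanity check (an illustration, not from the source): the sample mean of \(n\) \([0,1]\)-valued variables has bounded differences with \(c_i=1/n\), so the corollary gives \(Var(Z)\leq 1/(4n)\), and Bernoulli(\(1/2\)) coordinates attain the bound:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 200_000

# f(x) = mean(x) changes by at most c_i = 1/n in each coordinate,
# so the corollary gives Var(f(X)) <= (1/4) * n * (1/n)^2 = 1/(4n).
bound = 1 / (4 * n)

# Bernoulli(1/2) coordinates: Var(mean) = (1/4)/n, attaining the bound.
Z = rng.integers(0, 2, size=(reps, n)).mean(axis=1)
print(Z.var(), bound)
```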

Self-Bounding functions

A non-negative function \(f:\mathcal{X}^n\rightarrow [0,\infty)\) has the self-bounding property if there exist functions \(f_i:\mathcal{X}^{n-1}\rightarrow [0,\infty)\) such that for all \(x_1,...,x_n\in\mathcal{X}\) and all \(i=1,...,n\),

\[ 0 \leq f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right) \leq 1, \] and \[ \sum_{i=1}^n\left(f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right)\right) \leq f\left(x_1, \ldots, x_n\right) \] hold.

Since each difference lies between \(0\) and \(1\),

\[\left(f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right)\right)^2\leq f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right)\]

for each \(i\), and therefore

\[ \sum_{i=1}^n\left(f\left(x_1, \ldots, x_n\right)-f_i\left(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\right)\right)^2 \leq f\left(x_1, \ldots, x_n\right). \]

Corollary For \(Z=f(X_1,...,X_n)\) with \(f\) self-bounding:

\[ \operatorname{Var}(Z) \leq \mathbb{E}[Z] \]

Proof We use \(Z_i=f_i\left(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\right)\) and apply Efron-Stein to get

\[\begin{aligned} Var(Z)&\leq \inf _{Z_i} \sum_{i=1}^n \mathbb{E}\left[\left(Z-Z_i\right)^2\right]\\ &\leq \sum_{i=1}^n \mathbb{E}\left[\left(f\left(X_1, \ldots, X_n\right)-f_i\left(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\right)\right)^2\right]\\ &\leq \mathbb{E}\left[f\left(X_1, \ldots, X_n\right)\right]\\ &= \mathbb{E}[Z].\square \end{aligned} \]
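A standard example (added here as an illustration) is a sum of \(\{0,1\}\)-valued coordinates, which is self-bounding with \(f_i\) the sum omitting coordinate \(i\); for Bernoulli(\(p\)) variables, \(Var(Z)=np(1-p)\leq np=\mathbb{E}[Z]\):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 30, 0.2, 300_000

# f(x) = sum(x) with x_i in {0,1}: f - f_i = x_i lies in [0,1]
# and sum_i (f - f_i) = f, so f is self-bounding.
X = rng.random((reps, n)) < p
Z = X.sum(axis=1)
print(Z.var(), Z.mean())  # np(1-p) = 4.8 vs np = 6.0
```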

Poincaré inequalities

Convex Poincaré inequality

Assume that \(f:\left[0,1\right]^n\rightarrow \mathbb{R}\) is separately convex: for any \(i=1,...,n\) and fixed \(x_1,...,x_{i-1},x_{i+1},...,x_n\), \(f\) is convex in its \(i\)-th variable.

Theorem Let \(X_1,...,X_n\in[0,1]\) be independent random variables and \(f:[0,1]^n\rightarrow \mathbb{R}\) be separately convex, and assume that the partial derivatives of \(f\) exist.

Then,

\[ \operatorname{Var}(f(X)) \leq \mathbb{E}\left[\|\nabla f(X)\|^2\right]. \]

Proof Our goal is to bound \(\sum_{i=1}^n(Z-Z_i)^2\), with \(Z_i=\inf_{x_i'} f(X_1,...,x_i',...,X_n)\). Let \(X_i'\) denote a point where the infimum is attained, and write \(\bar{X}^{(i)}=(X_1,...,X_{i-1},X_i',X_{i+1},...,X_n)\). Then

\[ \begin{aligned} \sum_{i=1}^n\left(Z-Z_i\right)^2 & =\sum_{i=1}^n\left(f(X)-f\left(\bar{X}^{(i)}\right)\right)^2 \\ & \leq \sum_{i=1}^n\left(\frac{\partial f}{\partial x_i}(X)\right)^2\left(X_i-X_i^{\prime}\right)^2 \\ & \leq \sum_{i=1}^n\left(\frac{\partial f}{\partial x_i}(X)\right)^2 \\ & =\|\nabla f(X)\|^2 . \end{aligned} \]

The first inequality follows by separate convexity: convexity of \(f\) in its \(i\)-th argument gives

\[f\left(\bar{X}^{(i)}\right)\geq f(X)+\frac{\partial f}{\partial x_i}(X)\left(X_i'-X_i\right),\]

so \(0\leq f(X)-f(\bar{X}^{(i)})\leq \frac{\partial f}{\partial x_i}(X)(X_i-X_i')\), where the left inequality holds because \(\bar{X}^{(i)}\) minimizes \(f\) over the \(i\)-th coordinate; squaring yields \(\left(f(X)-f(\bar{X}^{(i)})\right)^2\leq\left(\frac{\partial f}{\partial x_i}(X)\right)^2(X_i-X_i')^2\).

The second inequality holds because all \(X_i\), \(X_i'\) lie in \([0,1]\), so \((X_i-X_i')^2\leq 1\).

Finally, taking expectations on both sides and applying the Efron-Stein inequality to the left-hand side yields the result. \(\square\)
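A quick numerical illustration (my own choice of test function, not from the source): \(f(x)=\|x\|^2\) is convex, hence separately convex, with \(\nabla f(x)=2x\):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 10, 200_000

# f(x) = ||x||^2 on [0,1]^n, with gradient 2x, so the theorem
# claims Var(f(X)) <= E[||2X||^2] = 4 E[||X||^2].
X = rng.random((reps, n))
f = (X ** 2).sum(axis=1)
bound = 4 * np.mean((X ** 2).sum(axis=1))
print(f.var(), bound)
```

Here the bound is very loose (\(Var\approx 4n/45\) against \(4n/3\)); the theorem's strength is that it holds for any separately convex \(f\).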

Gaussian Poincaré Inequality

Theorem Let \(X=(X_1,...,X_n)\) be a vector of i.i.d. standard Gaussians. Let \(f:\mathbb{R}^n\rightarrow \mathbb{R}\) be continuously differentiable. Then

\[ \operatorname{Var}(f(X)) \leq \mathbb{E}\left[\|\nabla f(X)\|^2\right] \]

Proof Assume that \(\mathbb{E}\left[\|\nabla f(X)\|^2\right]<\infty\), otherwise the result is trivial.

Given that \(\|x\|^2=\sum_{i=1}^n x_i^2\), the bound tensorizes over coordinates, so it suffices to prove the inequality for \(n=1\):

\[ \operatorname{Var}(f(X)) \leq \mathbb{E}\left[f^{\prime}(X)^2\right]. \]

It further suffices to prove this for twice continuously differentiable \(f\) with compact support.

Let \(\varepsilon_1,...,\varepsilon_n\) be independent Rademacher variables and

\[ S_n=n^{-1 / 2} \sum_{j=1}^n \varepsilon_j. \]

Then, we have

\[ \begin{aligned} \mathbb{E}^{(i)}\left[f\left(S_n\right)\right] & =\frac{1}{2} f\left(S_n-\frac{\varepsilon_i}{\sqrt{n}}+\frac{1}{\sqrt{n}}\right)+\frac{1}{2} f\left(S_n-\frac{\varepsilon_i}{\sqrt{n}}-\frac{1}{\sqrt{n}}\right) \\ & =\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)+\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right) \end{aligned} \]

and, since under \(\mathbb{E}^{(i)}\) the variable \(f(S_n)\) takes the two values \(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)\) and \(f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\) with probability \(\frac{1}{2}\) each, \[ \begin{aligned} \operatorname{Var}^{(i)}\left(f\left(S_n\right)\right)&= \frac{1}{2}\left[f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right]^2 \\ &\quad+\frac{1}{2}\left[f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right]^2 \\ &=\frac{1}{2}\left[\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right]^2 \\ &\quad+\frac{1}{2}\left[\frac{1}{2} f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)-\frac{1}{2} f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)\right]^2 \\ &= \frac{1}{4}\left(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right)^2. \end{aligned} \]

Since

\[\sum_{i=1}^n \mathbb{E}\left[\left(Z-\mathbb{E}^{(i)} [Z]\right)^2\right]=\sum_{i=1}^n \mathbb{E}\left[\operatorname{Var}^{(i)}(Z)\right],\]

we can apply Efron-Stein on \(Var^{(i)}\left(f\left(S_n\right)\right)\):

\[ \operatorname{Var}\left(f\left(S_n\right)\right) \leq \frac{1}{4} \sum_{i=1}^n \mathbb{E}\left[\left(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right)^2\right]. \]

By the central limit theorem, the Rademacher sum \(S_n\) converges in distribution to a standard Gaussian; since \(f\) is bounded and continuous, \(Var(f(S_n))\) converges to \(Var(f(X))\).

Now, apply a Taylor approximation:

\[ \left|f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right| \leq \frac{2}{\sqrt{n}}\left|f^{\prime}\left(S_n\right)\right|+\frac{2 K}{n} \]

where \(K\) denotes the supremum of the absolute value of \(f''\). This yields

\[ \begin{aligned} & \frac{n}{4}\left(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right)^2 \\ & \leq f^{\prime}\left(S_n\right)^2+\frac{2 K}{\sqrt{n}}\left|f^{\prime}\left(S_n\right)\right|+\frac{K^2}{n} . \end{aligned} \]

Finally, since \(S_n\) converges in distribution to \(X\), the limit of the upper bound is

\[ \limsup _{n \rightarrow \infty} \frac{1}{4} \sum_{i=1}^n \mathbb{E}\left[\left(f\left(S_n+\frac{1-\varepsilon_i}{\sqrt{n}}\right)-f\left(S_n-\frac{1+\varepsilon_i}{\sqrt{n}}\right)\right)^2\right]=\mathbb{E}\left[f^{\prime}(X)^2\right]. \]

For Lipschitz functions with Lipschitz constant \(1\), the mean value theorem gives \(\|\nabla f\|\leq 1\), which implies

\[Var(f(X))\leq 1.\]
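As a final sanity check (an illustrative choice of \(f\), not from the source): for \(f=\sin\), which is \(1\)-Lipschitz, the exact values are \(Var(\sin X)=(1-e^{-2})/2\approx 0.43\) and \(\mathbb{E}[\cos^2 X]=(1+e^{-2})/2\approx 0.57\), consistent with both the Poincaré bound and \(Var(f(X))\leq 1\):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(500_000)

# Gaussian Poincare with f(x) = sin(x), so f'(x) = cos(x).
lhs = np.sin(x).var()          # Var(sin X) = (1 - e^-2)/2
rhs = np.mean(np.cos(x) ** 2)  # E[cos^2 X] = (1 + e^-2)/2
print(lhs, rhs)
```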

References

Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart. 2013. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.