Basic concentration inequalities

Concentration inequalities

A collection of fairly simple, well-known inequalities that are typically taught at the beginning of a course or book on the topic.

Simple tail bound for the Normal distribution

(Vershynin 2018)

Proposition Let \(X\sim\mathcal{N}(0,1)\). For all \(t>0\), we have

\[P(X\geq t) \leq\frac{1}{t}\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}.\]

Since \(1/t\leq 1\) for \(t\geq 1\), this gives the simpler tail bound

\[P(X\geq t)\leq\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}.\]

Proof We have

\[P(X \geq t) =\frac{1}{\sqrt{2\pi}}\int_{t}^\infty \exp{(-x^2/2)}\,dx.\]

Applying the change of variables \(x=t+y\) and using \(\exp{(-y^2/2)}\leq 1\) for the inequality step, we get

\[\begin{aligned} P(X \geq t)&=\frac{1}{\sqrt{2\pi}}\int_0^\infty \exp{(-t^2/2)}\exp{(-ty)}\exp{(-y^2/2)}dy\\ &\leq \frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}\int_0^\infty \exp(-ty)dy\\ &=\frac{1}{t}\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}.\qquad\square \end{aligned}\]
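As a quick numerical sanity check of this bound (an added illustration, assuming NumPy and SciPy are available; the exact tail comes from scipy's norm.sf):

```python
# Compare the exact Gaussian tail P(X >= t) with the bound from the proposition.
import numpy as np
from scipy.stats import norm

for t in [0.5, 1.0, 2.0, 3.0]:
    exact = norm.sf(t)                                    # P(X >= t), X ~ N(0, 1)
    bound = np.exp(-t**2 / 2) / (t * np.sqrt(2 * np.pi))  # (1/t) * density at t
    print(f"t={t}: P(X>=t)={exact:.5f}, bound={bound:.5f}")
```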

Markov’s inequality

(Boucheron, Lugosi, and Massart 2013)

Proposition For a non-negative random variable \(X\) and \(t>0\), it follows that

\[P(X\geq t)\leq \frac{\mathbb{E}[X]}{t}.\]

Proof We have the pointwise inequality \[X \geq X\mathbb{I}_{X\geq t} \geq t\mathbb{I}_{X\geq t}.\] Taking expectations and using monotonicity and linearity of the expectation gives \[\mathbb{E}[X]\geq\mathbb{E}[X\mathbb{I}_{X\geq t}] \geq t\,\mathbb{E}[\mathbb{I}_{X\geq t}]=t\,P(X\geq t).\] Dividing by \(t>0\) yields \(P(X\geq t)\leq \frac{\mathbb{E}[X]}{t}\). \(\square\)
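As a quick illustration of the inequality (an added sketch; the Exp(1) distribution is an arbitrary choice of a non-negative random variable), one can compare an empirical tail with the Markov bound:

```python
# Monte Carlo illustration of Markov's inequality for a non-negative variable.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # Exp(1), so E[X] = 1

for t in [1.0, 2.0, 5.0]:
    empirical = np.mean(x >= t)  # estimate of P(X >= t)
    markov = x.mean() / t        # Markov bound E[X] / t
    print(f"t={t}: P(X>=t) ~ {empirical:.4f} <= {markov:.4f}")
```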

We can extend Markov’s inequality: let \(X\) take values in an interval \(I\subset\mathbb{R}\), and let \(\phi\) be a non-decreasing, non-negative function on \(I\).

For any \(t\in I\), we then have

\[P(X\geq t)\leq P(\phi(X)\geq \phi(t))\leq \frac{\mathbb{E}[\phi(X)]}{\phi(t)}\]

Chebyshev’s inequality

(Boucheron, Lugosi, and Massart 2013)

Setting \(X=|Y-\mathbb{E}[Y]|\) and \(\phi(t)=t^q\) for some \(q>0\) in the extended Markov inequality, we get

\[P(|Y-\mathbb{E}[Y]|\geq t)\leq\frac{\mathbb{E}[|Y-\mathbb{E}[Y]|^q]}{t^q}\]

One can then choose \(q\) to optimize this bound, among those \(q\) for which the moment \(\mathbb{E}[|Y-\mathbb{E}[Y]|^q]\) is finite.

The original Chebyshev inequality corresponds to \(q=2\), which gives a bound in terms of \(\mathrm{Var}(Y)\):

\[P(|Y-\mathbb{E}[Y]|\geq t)\leq\frac{\mathrm{Var}(Y)}{t^2}.\]
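The following sketch (an added illustration with \(Y\sim\mathcal{N}(0,1)\); the moments are estimated by Monte Carlo) compares the \(q=2\) bound with the bound obtained by searching over a small grid of \(q\):

```python
# Moment bounds P(|Y - E[Y]| >= t) <= E|Y - E[Y]|^q / t^q for several q.
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)
centered = np.abs(y - y.mean())

t = 3.0
qs = np.arange(1, 11)
bounds = [np.mean(centered**q) / t**q for q in qs]

print(f"empirical P(|Y - E[Y]| >= {t}) ~ {np.mean(centered >= t):.5f}")
print(f"q=2 (Chebyshev) bound: {bounds[1]:.5f}")
print(f"best q in 1..10: q={qs[np.argmin(bounds)]}, bound={min(bounds):.5f}")
```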

Cramér-Chernoff method

(Boucheron, Lugosi, and Massart 2013)

The key idea is to use the extended Markov inequality with \(\phi(t)=\exp{(\lambda t)}\) and \(\lambda \geq 0\):

\[P(X \geq t)\leq \exp{(-\lambda t)}\mathbb{E}[\exp{(\lambda X)}]\]

Notice that \(\mathbb{E}[\exp{(\lambda X)}]\) is the moment-generating function of \(X\).

Define

\[\psi_X(\lambda)=\log{\mathbb{E}[\exp{(\lambda X)}]}\]

and

\[\psi^*_X(t)=\sup_{\lambda \geq 0}(\lambda t-\psi_X(\lambda)).\]

This is known as the Cramér transform of \(X\).

We have

\[ \begin{aligned} \exp{(-\lambda t)}\mathbb{E}[\exp{(\lambda X)}] &=\exp{(\log\mathbb{E}[\exp{(\lambda X)}]-\lambda t)}\\ &=\exp{(\psi_X(\lambda)-\lambda t)}\\ &=\exp{(-(\lambda t-\psi_X(\lambda)))}. \end{aligned} \]

Since this bound holds for every \(\lambda\geq 0\), we can take the infimum of the right-hand side over \(\lambda\) and obtain

\[P(X \geq t)\leq \exp{(-\psi_X^*(t))}.\]
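For example, for \(X\sim\mathcal{N}(0,1)\) the moment-generating function is \(\mathbb{E}[\exp{(\lambda X)}]=\exp{(\lambda^2/2)}\), so

\[\psi_X(\lambda)=\frac{\lambda^2}{2},\qquad \psi^*_X(t)=\sup_{\lambda\geq 0}\left(\lambda t-\frac{\lambda^2}{2}\right)=\frac{t^2}{2}\quad\text{for }t\geq 0,\]

and the resulting bound \(P(X\geq t)\leq\exp{(-t^2/2)}\) recovers the Gaussian tail bound from the first section, up to the factor \(\frac{1}{t\sqrt{2\pi}}\).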

Next, notice that

\[\psi_X(\lambda)=\log\mathbb{E}[\exp{(\lambda X)}]\geq \log\exp\mathbb{E}[\lambda X]=\lambda\mathbb{E}[ X]\]

by Jensen’s inequality. Then, for \(\lambda\leq 0\),

\[\lambda t-\psi_X(\lambda)\leq \lambda(t-\mathbb{E}[X])\leq 0\]

whenever \(t\geq \mathbb{E}[X]\), while \(\lambda=0\) gives the value \(0\). Non-positive \(\lambda\) therefore never increase the supremum, so the Cramér transform coincides with the Fenchel-Legendre dual

\[\psi^*_X(t)=\sup_{\lambda \in\mathbb{R}}(\lambda t-\psi_X(\lambda)),\]

whenever \(t\geq\mathbb{E}[X]\).
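As a numerical illustration of this point (an added sketch; the Bernoulli distribution is an arbitrary choice with a closed-form log-moment-generating function), the two suprema can be compared directly for \(t\geq\mathbb{E}[X]\):

```python
# Check numerically that sup over lambda >= 0 and sup over all lambda agree
# for t >= E[X], using X ~ Bernoulli(p).
import numpy as np
from scipy.optimize import minimize_scalar

p = 0.3  # E[X] = p

def psi(lam):
    """Log-moment-generating function of Bernoulli(p)."""
    return np.log(1 - p + p * np.exp(lam))

def sup_transform(t, lower):
    """sup_{lambda >= lower} (lambda * t - psi(lambda)), computed numerically."""
    res = minimize_scalar(lambda lam: -(lam * t - psi(lam)),
                          bounds=(lower, 50.0), method="bounded")
    return -res.fun

for t in [0.4, 0.6, 0.8]:
    print(f"t={t}: over lambda>=0: {sup_transform(t, 0.0):.4f}, "
          f"over all lambda: {sup_transform(t, -50.0):.4f}")
```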

References

Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart. 2013. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.
Vershynin, Roman. 2018. High-Dimensional Probability: An Introduction with Applications in Data Science. Vol. 47. Cambridge University Press.