# Basic concentration inequalities

This post collects fairly simple, well-known concentration inequalities that are typically taught at the beginning of a course or textbook on the subject.

## Simple tail bound for the Normal distribution

Proposition Let $$X\sim\mathcal{N}(0,1)$$. For all $$t>0$$, we have

$P(X\geq t) \leq\frac{1}{t}\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}.$

For comparison, the standard normal density at $$t$$ is

$\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)},$

so the bound is simply this density scaled by $$1/t$$.
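As a quick numerical sanity check (a sketch using only Python's standard library; the helper names are mine, not from the text), we can compare the exact tail, computed from the complementary error function, against the bound:

```python
import math

def normal_tail(t):
    # Exact tail P(X >= t) for X ~ N(0, 1) via the complementary error function:
    # P(X >= t) = 0.5 * erfc(t / sqrt(2)).
    return 0.5 * math.erfc(t / math.sqrt(2))

def tail_bound(t):
    # The bound from the proposition: (1/t) * (1/sqrt(2*pi)) * exp(-t^2/2).
    return (1.0 / t) * math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

for t in [1.0, 2.0, 4.0]:
    exact, bound = normal_tail(t), tail_bound(t)
    print(f"t={t}: exact={exact:.3e}, bound={bound:.3e}")
    assert exact <= bound
```

The bound becomes tight quickly: already at $$t=4$$ the two values agree to within a few percent.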

Proof We have

$P(X \geq t) =\frac{1}{\sqrt{2\pi}}\int_{t}^\infty \exp{(-x^2/2)}dx.$

Applying the change of variables $$x=t+y$$ and using $$\exp{(-y^2/2)}\leq 1$$:

\begin{aligned} P(X \geq t)&=\frac{1}{\sqrt{2\pi}}\int_0^\infty \exp{(-t^2/2)}\exp{(-ty)}\exp{(-y^2/2)}dy\\ &\leq \frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}\int_0^\infty \exp{(-ty)}dy\\ &=\frac{1}{t}\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}. \qquad\square \end{aligned}

## Markov’s inequality

Proposition For a non-negative random variable $$X$$ and $$t>0$$, it follows that

$$P(X\geq t)\leq \frac{\mathbb{E}[X]}{t}.$$

Proof Since $$X$$ is non-negative, we have the pointwise inequalities

\begin{aligned} t\mathbb{I}_{X\geq t} \leq X\mathbb{I}_{X\geq t} \leq X. \end{aligned}

Taking expectations and using $$\mathbb{E}[\mathbb{I}_{X\geq t}]=P(X\geq t)$$ gives

\begin{aligned} tP(X\geq t)=\mathbb{E}[t\mathbb{I}_{X\geq t}] \leq \mathbb{E}[X\mathbb{I}_{X\geq t}] \leq \mathbb{E}[X], \end{aligned}

and dividing by $$t$$ yields the claim. $$\square$$
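The inequality is easy to verify against a distribution whose tail is known in closed form. A small sketch (my own example, not from the text): for $$X\sim Exp(1)$$ we have $$\mathbb{E}[X]=1$$ and $$P(X\geq t)=e^{-t}$$, so Markov's inequality asserts $$e^{-t}\leq 1/t$$.

```python
import math

# Sanity check of Markov's inequality for X ~ Exp(1), where E[X] = 1
# and the tail is known exactly: P(X >= t) = exp(-t).
mean = 1.0
for t in [0.5, 1.0, 2.0, 5.0]:
    exact = math.exp(-t)
    bound = mean / t  # Markov: E[X] / t
    print(f"t={t}: P(X>=t)={exact:.4f}, Markov bound={bound:.4f}")
    assert exact <= bound
```

Note how loose the bound is: the true tail decays exponentially while the bound decays only like $$1/t$$; the refinements below (Chebyshev, Chernoff) close this gap.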

We can extend Markov’s inequality: let $$X$$ take values in an interval $$I\subset\mathbb{R}$$ and let $$\phi$$ be a non-decreasing, non-negative function on $$I$$.

For any $$t\in I$$, we then have

$P(X\geq t)\leq P(\phi(X)\geq \phi(t))\leq \frac{\mathbb{E}[\phi(X)]}{\phi(t)}$

## Chebyshev’s inequality

Setting $$X=|Y-\mathbb{E}[Y]|$$ and $$\phi(t)=t^q$$ in Markov’s inequality, we get

$P(|Y-\mathbb{E}[Y]|\geq t)\leq\frac{\mathbb{E}[|Y-\mathbb{E}[Y]|^q]}{t^q}$

It is then possible to optimize over $$q$$ in this inequality, provided $$\mathbb{E}[|Y-\mathbb{E}[Y]|^q] <\infty$$ holds.

The original Chebyshev inequality corresponds to $$q=2$$, which yields a bound in terms of the variance:

$P(|Y-\mathbb{E}[Y]|\geq t)\leq\frac{Var(Y)}{t^2}.$
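Continuing the exponential example (again my own illustration, assuming only the standard library): for $$Y\sim Exp(1)$$ both mean and variance equal $$1$$, and the deviation probability $$P(|Y-1|\geq t)$$ has a closed form we can compare against $$1/t^2$$.

```python
import math

# For Y ~ Exp(1): E[Y] = Var(Y) = 1. Analytically,
# P(|Y - 1| >= t) = P(Y >= 1 + t) + P(Y <= 1 - t)
#                 = exp(-(1 + t)) + (1 - exp(-(1 - t)), if t < 1, else 0).
def dev_prob(t):
    lower = 1 - math.exp(-(1 - t)) if t < 1 else 0.0
    return math.exp(-(1 + t)) + lower

for t in [0.5, 1.5, 3.0]:
    exact = dev_prob(t)
    chebyshev = 1.0 / t ** 2  # Var(Y) / t^2 with Var(Y) = 1
    print(f"t={t}: P(|Y-1|>=t)={exact:.4f}, Chebyshev bound={chebyshev:.4f}")
    assert exact <= chebyshev
```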

## Cramér-Chernoff method

The key idea is to use Markov’s inequality with $$\phi(t)=\exp{(\lambda t)}$$ and $$\lambda \geq 0$$:

$P(X \geq t)\leq \exp{(-\lambda t)}\mathbb{E}[\exp{(\lambda X)}]$

Notice that $$\mathbb{E}[\exp{(\lambda X)}]$$ is the moment-generating function of $$X$$.

Define

$\psi_X(\lambda)=\log{\mathbb{E}[\exp{(\lambda X)}]}$

and

$\psi^*_X(t)=\sup_{\lambda \geq 0}(\lambda t-\psi_X(\lambda)).$

This is known as the Cramér transform of $$X$$.

We have

\begin{aligned} \exp{(-\lambda t)}\mathbb{E}[\exp{(\lambda X)}] &=\exp{(\log\mathbb{E}[\exp{(\lambda X)}]-\lambda t)}\\ &=\exp{(\psi_X(\lambda)-\lambda t)}\\ &=\exp{(-(\lambda t-\psi_X(\lambda)))}. \end{aligned}

Since this holds for every $$\lambda \geq 0$$, we can take the supremum over $$\lambda$$ and obtain

$P(X \geq t)\leq \exp{(-\psi_X^*(t))}.$
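As a standard worked instance of this bound (not spelled out in the text above): for $$X\sim\mathcal{N}(0,1)$$ the moment-generating function is $$\exp{(\lambda^2/2)}$$, so $$\psi_X(\lambda)=\lambda^2/2$$ and the supremum is attained at $$\lambda=t$$:

\begin{aligned} \psi^*_X(t)&=\sup_{\lambda \geq 0}\left(\lambda t-\frac{\lambda^2}{2}\right)=\frac{t^2}{2},\\ P(X \geq t)&\leq \exp{(-t^2/2)}. \end{aligned}

This recovers, up to the $$\frac{1}{t\sqrt{2\pi}}$$ prefactor, the Gaussian tail bound from the first section.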

Next, notice that

$\psi_X(\lambda)=\log\mathbb{E}[\exp{(\lambda X)}]\geq \log\exp\mathbb{E}[\lambda X]=\lambda\mathbb{E}[ X]$

by Jensen’s inequality. Hence, for $$\lambda < 0$$,

$\lambda t-\psi_X(\lambda)\leq \lambda t-\lambda\mathbb{E}[X]=\lambda(t-\mathbb{E}[X])\leq 0$

whenever $$t\geq \mathbb{E}[X]$$. Since $$\lambda=0$$ already contributes the value $$0$$ to the supremum, negative $$\lambda$$ never attain it, and the Cramér transform coincides with the Fenchel-Legendre dual

$\psi^*_X(t)=\sup_{\lambda \in\mathbb{R}}(\lambda t-\psi_X(\lambda)),$

whenever $$t\geq\mathbb{E}[Y]$$ is replaced by $$t\geq\mathbb{E}[X]$$; that is, the two suprema agree for all $$t\geq\mathbb{E}[X]$$.
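The agreement of the two suprema can be checked numerically. A minimal sketch (my own, using a brute-force grid over $$\lambda$$) for the standard normal, where $$\psi_X(\lambda)=\lambda^2/2$$ and the dual should evaluate to $$t^2/2$$:

```python
import math

# For X ~ N(0,1): psi_X(lambda) = lambda^2 / 2, so the Fenchel-Legendre
# dual psi*_X(t) = sup_lambda (lambda*t - lambda^2/2) should equal t^2/2.
def psi(lam):
    return lam * lam / 2

def psi_star(t, lo=-10.0, hi=10.0, n=200001):
    # Brute-force supremum over an evenly spaced grid of lambda values,
    # covering negative lambda as well to exercise the extended dual.
    step = (hi - lo) / (n - 1)
    return max(lam * t - psi(lam) for lam in (lo + i * step for i in range(n)))

for t in [0.5, 1.0, 2.0]:
    approx = psi_star(t)
    print(f"t={t}: grid sup={approx:.6f}, t^2/2={t * t / 2:.6f}")
    assert abs(approx - t * t / 2) < 1e-6
```

The negative-$$\lambda$$ half of the grid never wins for $$t\geq 0=\mathbb{E}[X]$$, exactly as the argument above predicts.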

## References

Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart. 2013. *Concentration Inequalities: A Nonasymptotic Theory of Independence*. Oxford University Press.
Vershynin, Roman. 2018. *High-Dimensional Probability: An Introduction with Applications in Data Science*. Vol. 47. Cambridge University Press.