Basic concentration inequalities
A collection of fairly simple, well-known inequalities that are typically taught at the beginning of a course or book on the topic.
Simple tail bound for the Normal distribution
Proposition Let \(X\sim\mathcal{N}(0,1)\). For all \(t>0\), we have
\[P(X\geq t) \leq\frac{1}{t}\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}.\]
For \(t \geq 1\), the factor \(\frac{1}{t}\) is at most one, so the bound simplifies to
\[P(X\geq t)\leq\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}.\]
Proof We have
\[P(X \geq t)=\frac{1}{\sqrt{2\pi}}\int_{t}^\infty \exp{(-x^2/2)}\,dx.\]
Apply the change of variables \(x=t+y\) and use \(\exp{(-y^2/2)}\leq 1\):
\[\begin{aligned} P(X \geq t)&=\frac{1}{\sqrt{2\pi}}\int_0^\infty \exp{(-t^2/2)}\exp{(-ty)}\exp{(-y^2/2)}\,dy\\ &\leq \frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}\int_0^\infty \exp{(-ty)}\,dy\\ &=\frac{1}{t}\frac{1}{\sqrt{2\pi}}\exp{(-t^2/2)}. \qquad\square \end{aligned}\]
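To get a feel for how tight the bound is, here is a minimal numerical sketch (assuming numpy and scipy are available) comparing it against the exact tail from scipy.stats.norm.sf; the ratio approaches one as \(t\) grows.

```python
# Compare the exact Gaussian tail P(X >= t) with the bound
# exp(-t^2/2) / (t * sqrt(2*pi)) from the proposition above.
import numpy as np
from scipy.stats import norm

for t in [0.5, 1.0, 2.0, 3.0, 4.0]:
    exact = norm.sf(t)  # exact tail probability P(X >= t)
    bound = np.exp(-t**2 / 2) / (t * np.sqrt(2 * np.pi))
    print(f"t={t:.1f}  exact={exact:.3e}  bound={bound:.3e}")
```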
Markov’s inequality
(Boucheron, Lugosi, and Massart 2013)
Proposition For a non-negative random variable \(X\) and \(t>0\), it follows that
\[P(X\geq t)\leq \frac{\mathbb{E}[X]}{t}.\]
Proof Since \(X\) is non-negative, \(X \geq X\mathbb{I}_{X\geq t} \geq t\mathbb{I}_{X\geq t}\) holds pointwise. Taking expectations, \[ \begin{aligned} \mathbb{E}[X] \geq \mathbb{E}[X\mathbb{I}_{X\geq t}] &\geq \mathbb{E}[t\mathbb{I}_{X\geq t}] \\ &= t\,\mathbb{E}[\mathbb{I}_{X\geq t}] \\ &= t\,P(X\geq t), \end{aligned} \] and dividing by \(t>0\) gives \(P(X\geq t) \leq \frac{\mathbb{E}[X]}{t}\). \(\square\)
We can extend Markov’s inequality by letting \(X\) take values in an interval \(I\subset\mathbb{R}\) and choosing \(\phi\) to be a non-decreasing, non-negative function on \(I\).
For any \(t\in I\), we then have
\[P(X\geq t)\leq P(\phi(X)\geq \phi(t))\leq \frac{\mathbb{E}[\phi(X)]}{\phi(t)}\]
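As a small Monte Carlo sketch (assuming numpy; the exponential distribution here is just an illustrative choice of a non-negative \(X\)), we can check Markov's inequality empirically:

```python
# Markov's inequality for an exponential random variable:
# P(X >= t) <= E[X] / t, with E[X] = 1 for scale 1.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1_000_000)  # non-negative X

for t in [1.0, 2.0, 5.0]:
    empirical = (samples >= t).mean()  # Monte Carlo estimate of P(X >= t)
    markov = samples.mean() / t        # Markov bound E[X] / t
    print(f"t={t}  P(X>=t)~{empirical:.4f}  bound={markov:.4f}")
```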
Chebyshev’s inequality
(Boucheron, Lugosi, and Massart 2013)
Setting \(X=|Y-\mathbb{E}[Y]|\) and \(\phi(t)=t^q\) in Markov’s inequality, we get
\[P(|Y-\mathbb{E}[Y]|\geq t)\leq\frac{\mathbb{E}[|Y-\mathbb{E}[Y]|^q]}{t^q}\]
It is then possible to optimize this inequality over \(q\), subject to \(\mathbb{E}[|Y-\mathbb{E}[Y]|^q] <\infty\).
The original Chebyshev inequality uses \(q=2\), which yields a bound in terms of the variance:
\[P(|Y-\mathbb{E}[Y]|\geq t)\leq\frac{Var(Y)}{t^2}.\]
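The following sketch (assuming numpy and scipy; the choice of a standard normal \(Y\) and \(t=3\) is illustrative) shows how larger \(q\) can tighten the bound when higher moments exist, using the closed form \(\mathbb{E}[|Y|^q]=2^{q/2}\Gamma((q+1)/2)/\sqrt{\pi}\) for \(Y\sim\mathcal{N}(0,1)\):

```python
# Moment bounds E|Y - E[Y]|^q / t^q for standard normal Y at t = 3,
# compared with the exact two-sided tail probability.
import numpy as np
from scipy.stats import norm
from scipy.special import gamma

t = 3.0
for q in [2, 4, 6, 8]:
    # absolute central moment of N(0,1): E|Y|^q = 2^(q/2) Gamma((q+1)/2) / sqrt(pi)
    abs_moment = 2 ** (q / 2) * gamma((q + 1) / 2) / np.sqrt(np.pi)
    print(f"q={q}  bound={abs_moment / t**q:.4f}")

print(f"exact P(|Y| >= {t}) = {2 * norm.sf(t):.6f}")
```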
Cramér-Chernoff method
(Boucheron, Lugosi, and Massart 2013)
The key idea is to use Markov’s inequality with \(\phi(t)=\exp{(\lambda t)}\) and \(\lambda \geq 0\):
\[P(X \geq t)\leq \exp{(-\lambda t)}\mathbb{E}[\exp{(\lambda X)}]\]
Notice that \(\mathbb{E}[\exp{(\lambda X)}]\) is the moment-generating function of \(X\).
Define
\[\psi_X(\lambda)=\log{\mathbb{E}[\exp{(\lambda X)}]}\]
and
\[\psi^*_X(t)=\sup_{\lambda \geq 0}(\lambda t-\psi_X(\lambda)).\]
This is known as the Cramér transform of \(X\).
We have
\[ \begin{aligned} \exp{(-\lambda t)}\mathbb{E}[\exp{(\lambda X)}] &=\exp{(\log\mathbb{E}[\exp{(\lambda X)}]-\lambda t)}\\ &=\exp{(\psi_X(\lambda)-\lambda t)}\\ &=\exp{(-(\lambda t-\psi_X(\lambda)))}. \end{aligned} \]
Since this holds for every \(\lambda\geq 0\), taking the supremum over \(\lambda\) yields
\[P(X \geq t)\leq \exp{(-\psi_X^*(t))}.\]
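For example, for \(X\sim\mathcal{N}(0,1)\) we have \(\psi_X(\lambda)=\lambda^2/2\), so \(\lambda t-\psi_X(\lambda)\) is maximized at \(\lambda=t\), giving \(\psi^*_X(t)=t^2/2\) and the Gaussian tail bound
\[P(X\geq t)\leq\exp{(-t^2/2)}.\]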
Next, notice that
\[\psi_X(\lambda)=\log\mathbb{E}[\exp{(\lambda X)}]\geq \log\exp\mathbb{E}[\lambda X]=\lambda\mathbb{E}[ X]\]
by Jensen’s inequality. Hence, for \(\lambda < 0\),
\[\lambda t-\psi_X(\lambda)\leq \lambda(t-\mathbb{E}[X])\leq 0\]
whenever \(t\geq \mathbb{E}[X]\), so negative \(\lambda\) can never attain the supremum, which is at least \(0\) (take \(\lambda=0\)). This allows us to extend the Cramér transform to the Fenchel-Legendre dual
\[\psi^*_X(t)=\sup_{\lambda \in\mathbb{R}}(\lambda t-\psi_X(\lambda)),\]
whenever \(t\geq\mathbb{E}[X]\).
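As a sketch of how the transform can be evaluated numerically (assuming numpy and scipy; the helpers psi and psi_star below are illustrative, not from the text), consider a Rademacher variable \(X\) with \(P(X=1)=P(X=-1)=1/2\), whose log-moment-generating function is \(\psi_X(\lambda)=\log\cosh(\lambda)\):

```python
# Evaluate the Chernoff bound exp(-psi*(t)) for a Rademacher variable
# by numerically computing the Fenchel-Legendre transform.
import numpy as np
from scipy.optimize import minimize_scalar

def psi(lam):
    return np.log(np.cosh(lam))  # log-MGF of a Rademacher variable

def psi_star(t):
    # sup over lambda of (lambda * t - psi(lambda)), via bounded minimization
    res = minimize_scalar(lambda lam: psi(lam) - lam * t,
                          bounds=(-20, 20), method="bounded")
    return -res.fun

for t in [0.25, 0.5, 0.75]:
    print(f"t={t}  Chernoff bound exp(-psi*(t)) = {np.exp(-psi_star(t)):.4f}")
```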