Institut für Informatik
Freie Universität Berlin
E-Mail: mulzer@inf.fu-berlin.de

Five Proofs of Chernoff’s Bound with Applications

(Footnote 1: Supported in part by DFG Grants MU 3501/1 and MU 3501/2 and ERC StG 757609.)

Wolfgang Mulzer
Abstract

We discuss five ways of proving Chernoff’s bound and show how they lead to different extensions of the basic bound.

1 Introduction

Chernoff’s bound gives an estimate on the probability that a sum of independent Binomial random variables deviates from its expectation [14]. It has many variants and extensions that are known under various names such as Bernstein’s inequality or Hoeffding’s bound [4, 14]. Chernoff’s bound is one of the most basic and versatile tools in the life of a theoretical computer scientist, with a seemingly endless number of applications. Almost every contemporary textbook on algorithms or complexity theory contains a statement and a proof of the bound [2, 12, 16, 8], and there are several texts that discuss its various applications in great detail (e.g., the textbooks by Alon and Spencer [1], Dubhashi and Panconesi [10], Mitzenmacher and Upfal [19], Motwani and Raghavan [21], or the articles by Chung and Lu [6], Hagerup and Rüb [13], or McDiarmid [17]).

In the present survey, we will see five different ways of proving the basic Chernoff bound. The different techniques used in these proofs allow various generalizations and extensions, some of which we will also discuss.

2 The Basic Bound

We begin with a statement of the basic Chernoff bound. For this, we first need a notion from information theory [9]. Let $P = (p_1, \dots, p_m)$ and $Q = (q_1, \dots, q_m)$ be two probability distributions on $m$ elements, i.e., $p_i, q_i \in \mathbb{R}$ with $p_i, q_i \geq 0$, for $i = 1, \dots, m$, and $\sum_{i=1}^{m} p_i = \sum_{i=1}^{m} q_i = 1$. The Kullback-Leibler divergence or relative entropy of $P$ and $Q$ is defined as

\[ D_{\textup{KL}}(P\|Q) := \sum_{i=1}^{m} p_i \ln\frac{p_i}{q_i}. \]

If $m = 2$, i.e., if $P = (p, 1-p)$ and $Q = (q, 1-q)$, we write $D_{\textup{KL}}(p\|q)$ for $D_{\textup{KL}}((p,1-p)\|(q,1-q))$. The Kullback-Leibler divergence measures the distance between the distributions $P$ and $Q$: it represents the expected loss of efficiency if we encode an $m$-letter alphabet with distribution $P$ with a code that is optimal for distribution $Q$. Now, the basic Chernoff bound is as follows:

Theorem 2.1.

Let $n \in \mathbb{N}$, $p \in [0,1]$, and let $X_1, \dots, X_n$ be $n$ independent random variables with $X_i \in \{0,1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^{n} X_i$. Then, for any $t \in [0, 1-p]$, we have

\[ \Pr[X \geq (p+t)n] \leq e^{-D_{\textup{KL}}(p+t\|p)n}. \]
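The statement of Theorem 2.1 can be checked numerically against the exact binomial tail. The following is a minimal sketch, not part of the original text; the function names `kl_bernoulli`, `binomial_upper_tail`, and `chernoff_upper_bound` are hypothetical, and the threshold is passed as an integer $k = (p+t)n$ to avoid rounding issues.

```python
import math


def kl_bernoulli(a: float, b: float) -> float:
    """D_KL(a || b) for the two-point distributions (a, 1-a) and (b, 1-b)."""
    def term(x: float, y: float) -> float:
        # Usual convention: a term with x = 0 contributes 0.
        return 0.0 if x == 0.0 else x * math.log(x / y)
    return term(a, b) + term(1.0 - a, 1.0 - b)


def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """Exact Pr[X >= k] for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


def chernoff_upper_bound(n: int, p: float, k: int) -> float:
    """Right-hand side of Theorem 2.1 with (p + t) n = k, i.e. p + t = k / n."""
    return math.exp(-kl_bernoulli(k / n, p) * n)


# Illustrative parameters (assumed, not from the text).
n, p = 100, 0.3
for k in (30, 40, 50, 60):
    print(k, binomial_upper_tail(n, p, k), chernoff_upper_bound(n, p, k))
```

For $k = pn$ the divergence is zero and the bound is the trivial value $1$; it becomes exponentially small as $k/n$ moves away from $p$.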

3 Five Proofs for Theorem 2.1

We will now see five different ways of proving Theorem 2.1.

3.1 The Moment Method

The usual textbook proof of Theorem 2.1 uses the exponential function $\exp$ and Markov’s inequality. It is called the moment method, because $\exp$ simultaneously encodes all moments $X, X^2, X^3, \dots$ of $X$. This trick is often attributed to Bernstein [4]. It is very general and can be used to obtain several variants of Theorem 2.1, perhaps most prominently, the Azuma-Hoeffding inequality for martingales with bounded differences [14, 3].

The proof goes as follows. Let $\lambda > 0$ be a parameter to be determined later. We have

\[ \Pr[X \geq (p+t)n] = \Pr[\lambda X \geq \lambda(p+t)n] = \Pr\bigl[e^{\lambda X} \geq e^{\lambda(p+t)n}\bigr]. \]

From Markov’s inequality, we obtain

\[ \Pr\bigl[e^{\lambda X} \geq e^{\lambda(p+t)n}\bigr] \leq \frac{\mathbf{E}[e^{\lambda X}]}{e^{\lambda(p+t)n}}. \]

Now, the independence of the $X_i$ yields

\[ \mathbf{E}[e^{\lambda X}] = \mathbf{E}\Bigl[e^{\lambda\sum_{i=1}^{n} X_i}\Bigr] = \mathbf{E}\Biggl[\prod_{i=1}^{n} e^{\lambda X_i}\Biggr] = \prod_{i=1}^{n} \mathbf{E}\Bigl[e^{\lambda X_i}\Bigr] = \bigl(pe^{\lambda} + 1 - p\bigr)^{n}. \]

Thus,

\[ \Pr[X \geq (p+t)n] \leq \Bigl(\frac{pe^{\lambda} + 1 - p}{e^{\lambda(p+t)}}\Bigr)^{n}, \tag{1} \]

for every $\lambda > 0$. Optimizing for $\lambda$ using calculus, we get that the right-hand side is minimized if

\[ e^{\lambda} = \frac{(1-p)(p+t)}{p(1-p-t)}. \]

Plugging this into (1), we get

\[ \Pr[X \geq (p+t)n] \leq \Biggl[\Bigl(\frac{p}{p+t}\Bigr)^{p+t}\Bigl(\frac{1-p}{1-p-t}\Bigr)^{1-p-t}\Biggr]^{n} = e^{-D_{\textup{KL}}(p+t\|p)n}, \]

as desired.
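The optimization step can be verified numerically. The sketch below, under the assumption $0 < p$ and $0 < t < 1-p$, evaluates the per-trial base of the right-hand side of (1) at the stated optimal $\lambda$ and compares it with the closed form; the function names and the sample values of $p$ and $t$ are hypothetical.

```python
import math


def moment_bound_base(p: float, t: float, lam: float) -> float:
    """Per-trial base of the right-hand side of (1)."""
    return (p * math.exp(lam) + 1 - p) / math.exp(lam * (p + t))


def optimal_lambda(p: float, t: float) -> float:
    """The lambda with e^lambda = (1-p)(p+t) / (p(1-p-t))."""
    return math.log((1 - p) * (p + t) / (p * (1 - p - t)))


def kl_base(p: float, t: float) -> float:
    """exp(-D_KL(p+t || p)), written as the product in the final display."""
    return (p / (p + t)) ** (p + t) * ((1 - p) / (1 - p - t)) ** (1 - p - t)


# Illustrative parameters (assumed, not from the text).
p, t = 0.3, 0.2
lam_star = optimal_lambda(p, t)
best = moment_bound_base(p, t, lam_star)
print(lam_star, best, kl_base(p, t))
```

Raising the base to the $n$-th power gives the bound of Theorem 2.1; any other choice of $\lambda > 0$ yields a valid but weaker bound.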

3.2 Chvátal’s Method

The following proof of Theorem 2.1 is due to Chvátal [7]. As we will see below, it can be generalized to give tail bounds for the hypergeometric distribution. Let $B(n,p)$ be the random variable that gives the number of heads in $n$ independent Bernoulli trials with success probability $p$. Then,

\[ \Pr[B(n,p) = l] = \binom{n}{l} p^{l} (1-p)^{n-l}, \]

for $l = 0, \dots, n$. Thus, for any $\tau \geq 1$ and $k \geq pn$, we get

\begin{align*}
\Pr[B(n,p) \geq k] &= \sum_{i=k}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} \\
&\leq \sum_{i=k}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} \underbrace{\tau^{i-k}}_{\geq 1} + \underbrace{\sum_{i=0}^{k-1} \binom{n}{i} p^{i} (1-p)^{n-i} \tau^{i-k}}_{\geq 0} \\
&= \sum_{i=0}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} \tau^{i-k}.
\end{align*}

Using the Binomial theorem, we obtain

\[ \Pr[B(n,p) \geq k] \leq \sum_{i=0}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} \tau^{i-k} = \tau^{-k} \sum_{i=0}^{n} \binom{n}{i} (p\tau)^{i} (1-p)^{n-i} = \frac{(p\tau + 1 - p)^{n}}{\tau^{k}}. \]

If we write $k = (p+t)n$ and $\tau = e^{\lambda}$, we get

\[ \Pr[B(n,p) \geq (p+t)n] \leq \Bigl(\frac{pe^{\lambda} + 1 - p}{e^{\lambda(p+t)}}\Bigr)^{n}. \]

This is the same as (1), so we can complete the proof of Theorem 2.1 as in Section 3.1.
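Chvátal’s inequality holds for every $\tau \geq 1$, which is easy to confirm on a small instance. The following sketch compares the exact tail with $(p\tau + 1 - p)^n / \tau^k$ for a few values of $\tau$; the helper names and the parameters are assumptions chosen for illustration.

```python
import math


def upper_tail(n: int, p: float, k: int) -> float:
    """Exact Pr[B(n, p) >= k]."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def chvatal_bound(n: int, p: float, k: int, tau: float) -> float:
    """(p*tau + 1 - p)^n / tau^k, valid for every tau >= 1."""
    return (p * tau + 1 - p) ** n / tau**k


# Illustrative parameters (assumed, not from the text); note k >= p * n.
n, p, k = 40, 0.25, 16
exact = upper_tail(n, p, k)
taus = (1.0, 1.5, 2.0, 3.0, 5.0)
for tau in taus:
    print(tau, exact, chvatal_bound(n, p, k, tau))
```

The value $\tau = 1$ gives the trivial bound $1$; the best $\tau$ corresponds to the optimal $e^{\lambda}$ from Section 3.1.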

3.3 The Impagliazzo-Kabanets Method

The third proof is due to Impagliazzo and Kabanets [15], and it leads to a constructive version of the bound. Let $\lambda \in [0,1]$ be a parameter to be chosen later. Let $I \subseteq \{1, \dots, n\}$ be a random index set obtained by including each element $i \in \{1, \dots, n\}$ with probability $\lambda$. We estimate $\mathbf{E}\bigl[\prod_{i \in I} X_i\bigr]$ in two different ways, where the expectation is over the random choice of $X_1, \dots, X_n$ and $I$.

On the one hand, using the law of total expectation and independence, we have

\begin{align*}
\mathbf{E}\Bigl[\prod_{i \in I} X_i\Bigr] &= \sum_{S \subseteq \{1,\dots,n\}} \Pr[I = S] \cdot \mathbf{E}\Bigl[\prod_{i \in S} X_i\Bigr] = \sum_{S \subseteq \{1,\dots,n\}} \Pr[I = S] \cdot \prod_{i \in S} \Pr[X_i = 1] \\
&= \sum_{S \subseteq \{1,\dots,n\}} \lambda^{|S|} (1-\lambda)^{n-|S|} \cdot p^{|S|} = (\lambda p + 1 - \lambda)^{n}. \tag{2}
\end{align*}

On the other hand, by the law of total expectation,

\[ \mathbf{E}\Bigl[\prod_{i \in I} X_i\Bigr] \geq \mathbf{E}\Bigl[\prod_{i \in I} X_i \mid X \geq (p+t)n\Bigr] \Pr[X \geq (p+t)n]. \]

Now, fix $X_1, \dots, X_n$ with $X \geq (p+t)n$. For the fixed choice of $X_1 = x_1, \dots, X_n = x_n$, the expectation $\mathbf{E}\bigl[\prod_{i \in I} x_i\bigr]$ is exactly the probability that $I$ avoids all the $n - X$ indices $i$ where $x_i = 0$. Thus, the conditional expectation is

\[ \mathbf{E}\Bigl[\prod_{i \in I} X_i \mid X \geq (p+t)n\Bigr] = \mathbf{E}\Bigl[(1-\lambda)^{n-X} \mid X \geq (p+t)n\Bigr] \geq (1-\lambda)^{(1-p-t)n}, \]

so

\[ \mathbf{E}\Bigl[\prod_{i \in I} X_i\Bigr] \geq (1-\lambda)^{(1-p-t)n} \Pr[X \geq (p+t)n]. \]

Combining with (2),

\[ \Pr[X \geq (p+t)n] \leq \left(\frac{\lambda p + 1 - \lambda}{(1-\lambda)^{1-p-t}}\right)^{n}. \tag{3} \]

Using calculus, we get that the right-hand side is minimized for $\lambda = t/\bigl((1-p)(p+t)\bigr)$ (note that $\lambda \leq 1$ for $t \leq 1-p$). Plugging this into (3),

\[ \Pr[X \geq (p+t)n] \leq \Biggl[\Bigl(\frac{p}{p+t}\Bigr)^{p+t}\Bigl(\frac{1-p}{1-p-t}\Bigr)^{1-p-t}\Biggr]^{n} = e^{-D_{\textup{KL}}(p+t\|p)n}, \]

as desired.
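Both ingredients of this proof can be checked on a small instance: identity (2) by brute-force enumeration of all index sets, and the optimal choice $\lambda = t/((1-p)(p+t))$ by comparing the base of (3) with the closed form. This is a sketch with hypothetical function names and illustrative parameters, assuming $0 < p$ and $0 < t < 1-p$.

```python
import itertools
import math


def expectation_by_enumeration(n: int, p: float, lam: float) -> float:
    """Sum over all index sets S of Pr[I = S] * p^|S|, as in (2)."""
    total = 0.0
    for size in range(n + 1):
        for _ in itertools.combinations(range(n), size):
            total += lam**size * (1 - lam) ** (n - size) * p**size
    return total


def ik_bound_base(p: float, t: float, lam: float) -> float:
    """Per-trial base of the right-hand side of (3)."""
    return (lam * p + 1 - lam) / (1 - lam) ** (1 - p - t)


def kl_base(p: float, t: float) -> float:
    """exp(-D_KL(p+t || p)) as the product in the final display."""
    return (p / (p + t)) ** (p + t) * ((1 - p) / (1 - p - t)) ** (1 - p - t)


# Illustrative parameters (assumed, not from the text).
n, p, t, lam = 10, 0.3, 0.2, 0.4
enumerated = expectation_by_enumeration(n, p, lam)
closed_form = (lam * p + 1 - lam) ** n

lam_star = t / ((1 - p) * (p + t))
print(enumerated, closed_form, ik_bound_base(p, t, lam_star), kl_base(p, t))
```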

3.4 The Encoding Argument

The next proof stems from discussions with Luc Devroye, Gábor Lugosi, and Pat Morin, and it is inspired by an encoding argument [20]. A similar argument can also be derived from Xinjia Chen’s likelihood ratio method [5]. Let $\{0,1\}^{n}$ be the set of all bit strings of length $n$, and let $w : \{0,1\}^{n} \rightarrow [0,1]$ be a weight function. We call $w$ valid if $\sum_{x \in \{0,1\}^{n}} w(x) \leq 1$. The following lemma says that for any probability distribution $p_x$ on $\{0,1\}^{n}$, a valid weight function is unlikely to be substantially larger than $p_x$.

Lemma 3.1.

Let $\mathcal{D}$ be a probability distribution on $\{0,1\}^{n}$ that assigns to each $x \in \{0,1\}^{n}$ a probability $p_x$, and let $w$ be a valid weight function. For any $s \geq 1$, we have

\[ \Pr_{x \sim \mathcal{D}}\left[w(x) \geq s p_x\right] \leq 1/s. \]
Proof.

Let $Z_s = \{x \in \{0,1\}^{n} \mid w(x) \geq s p_x\}$. We have

\[ \Pr_{x \sim \mathcal{D}}\left[w(x) \geq s p_x\right] = \sum_{\substack{x \in Z_s \\ p_x > 0}} p_x \leq \sum_{\substack{x \in Z_s \\ p_x > 0}} p_x \frac{w(x)}{s p_x} \leq (1/s) \sum_{x \in Z_s} w(x) \leq 1/s, \]

since $w(x)/(s p_x) \geq 1$ for $x \in Z_s$, $p_x > 0$, and since $w$ is valid. ∎

We now show that Lemma 3.1 implies Theorem 2.1. For this, we interpret the sequence $X_1, \dots, X_n$ as a bit string of length $n$. This induces a probability distribution $\mathcal{D}$ that assigns to each $x \in \{0,1\}^{n}$ the probability $p_x = p^{k_x} (1-p)^{n-k_x}$, where $k_x$ denotes the number of $1$-bits in $x$. We define a weight function $w : \{0,1\}^{n} \rightarrow [0,1]$ by $w(x) = (p+t)^{k_x} (1-p-t)^{n-k_x}$, for $x \in \{0,1\}^{n}$. Then $w$ is valid, since $w(x)$ is the probability that $x$ is generated by setting each bit to $1$ independently with probability $p+t$. For $x \in \{0,1\}^{n}$, we have

\[ \frac{w(x)}{p_x} = \left(\frac{p+t}{p}\right)^{k_x} \left(\frac{1-p-t}{1-p}\right)^{n-k_x}. \]

Since $((p+t)/p)((1-p)/(1-p-t)) \geq 1$, it follows that $w(x)/p_x$ is an increasing function of $k_x$. Hence, if $k_x \geq (p+t)n$, we have

\[ \frac{w(x)}{p_x} \geq \left[\left(\frac{p+t}{p}\right)^{p+t} \left(\frac{1-p-t}{1-p}\right)^{1-p-t}\right]^{n} = e^{D_{\textup{KL}}(p+t\|p)n}. \]

We now apply Lemma 3.1 to $\mathcal{D}$ and $w$ to get

\[ \Pr[X \geq (p+t)n] = \Pr_{x \sim \mathcal{D}}[k_x \geq (p+t)n] \leq \Pr_{x \sim \mathcal{D}}\left[w(x) \geq p_x e^{D_{\textup{KL}}(p+t\|p)n}\right] \leq e^{-D_{\textup{KL}}(p+t\|p)n}, \]

as claimed in Theorem 2.1.
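For small $n$, the encoding argument can be replayed by enumerating all bit strings. The sketch below checks that the weight function is valid, that Lemma 3.1 holds for several values of $s$, and that the resulting tail bound dominates the exact tail. The names `product_prob`, `lemma_lhs`, and `k_thr` are hypothetical, and the integer threshold `k_thr` stands for $(p+t)n$.

```python
import itertools
import math

# Illustrative parameters (assumed, not from the text).
n, p, k_thr = 12, 0.3, 6
q = k_thr / n  # plays the role of p + t

strings = list(itertools.product((0, 1), repeat=n))


def product_prob(x, r: float) -> float:
    """Probability of the bit string x when each bit is 1 with probability r."""
    ones = sum(x)
    return r**ones * (1 - r) ** (n - ones)


# Validity of w(x) = (p+t)^{k_x} (1-p-t)^{n-k_x}: the weights sum to 1.
total_weight = sum(product_prob(x, q) for x in strings)


def lemma_lhs(s: float) -> float:
    """Pr_{x ~ D}[w(x) >= s * p_x], computed by enumeration."""
    return sum(product_prob(x, p) for x in strings
               if product_prob(x, q) >= s * product_prob(x, p))


kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
tail = sum(product_prob(x, p) for x in strings if sum(x) >= k_thr)
print(total_weight, tail, math.exp(-kl * n))
```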

See the survey [20] for a more thorough discussion of how this proof is related to coding theory.

3.5 A Proof via Differential Privacy

The fifth proof of Chernoff’s bound is due to Steinke and Ullman [22], and it uses methods from the theory of differential privacy [11]. Unlike the previous four proofs, it seems to lead to a slightly weaker version of the bound. Let $m$ be a parameter to be determined later. The main idea is to bound the expectation of the maximum of $m-1$ independent copies of $X$ and $\mathbf{E}[X]$.

Lemma 3.2.

Let $m \in \mathbb{N}$ and $m \leq e^{n}$. Let $X^{(1)}, \dots, X^{(m-1)}$ be $m-1$ independent copies of $X$, and set $X^{(m)} = \mathbf{E}[X]$. Then,

\[ \mathbf{E}\big[\max\{X^{(1)}, \dots, X^{(m)}\}\big] \leq pn + 5\sqrt{n \ln m}. \]

We will give a proof of Lemma 3.2 below. First, however, we will see how we can use Lemma 3.2 to derive the following weaker version of Theorem 2.1.

(Footnote 2: In the published version of this paper, the proof of Theorem 3.3 is based on an incorrect application of Markov’s inequality. We have changed Lemma 3.2 so that $X^{(m)}$ is fixed to $\mathbf{E}[X]$. This ensures that Markov’s inequality is applied to a nonnegative random variable. We thank Natalia Shenkman for pointing this out to us.)

Theorem 3.3.

Let $n \in \mathbb{N}$, $p \in [0,1]$, and let $X_1, \dots, X_n$ be $n$ independent random variables with $X_i \in \{0,1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^{n} X_i$. Then, for any $t \in [0, 1-p]$, we have

\[ \Pr[X \geq (p+t)n] \leq e^{1 - \frac{1}{64} t^{2} n}. \]
Proof.

We may assume that $t \geq 8/\sqrt{n}$, since otherwise the right-hand side is larger than $1$ and the theorem holds trivially. Set $\alpha = \Pr[X \geq (p+t)n]$. Let $X^{(1)}, \dots, X^{(m-1)}$ be $m-1$ independent copies of $X$ and let $X^{(m)} = \mathbf{E}[X]$. Then,

\[ \Pr\big[\max\{X^{(1)}, \dots, X^{(m)}\} \geq (p+t)n\big] = 1 - (1-\alpha)^{m-1} \geq 1 - e^{-\alpha(m-1)}. \tag{4} \]

On the other hand, Markov’s inequality gives

\begin{align*}
\Pr\big[\max\{X^{(1)}, \dots, X^{(m)}\} \geq (p+t)n\big] &= \Pr\big[\max\{X^{(1)}, \dots, X^{(m)}\} - pn \geq tn\big] \\
&\leq \frac{\mathbf{E}\big[\max\{X^{(1)}, \dots, X^{(m)}\} - pn\big]}{tn} \leq \frac{5\sqrt{\ln m}}{t\sqrt{n}},
\end{align*}

by Lemma 3.2. Thus, setting $m = \exp\Big(\big(\frac{e-1}{5e}\big)^{2} t^{2} n\Big)$, and combining with (4), we get

\[ \frac{e-1}{e} \geq 1 - e^{-\alpha(m-1)} \Leftrightarrow \alpha \leq \frac{1}{\exp\Big(\big(\frac{e-1}{5e}\big)^{2} t^{2} n\Big) - 1} \leq \frac{1}{\exp\big(\frac{t^{2} n}{64}\big) - 1}, \]

since $\big(\frac{e-1}{5e}\big)^{2} \geq \frac{1}{64}$. Now the theorem follows from

\[ \frac{\exp\big(\frac{t^{2} n}{64}\big)}{\exp\big(\frac{t^{2} n}{64}\big) - 1} \leq \frac{e}{e-1} \leq e, \]

which holds as $t \geq 8/\sqrt{n}$, as $x \mapsto x/(x-1)$ is decreasing for $x > 1$, and as $e \geq 2$. ∎
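The numerical constants in this proof are easy to confirm, and the same script shows how much weaker Theorem 3.3 is than Theorem 2.1 on a few sample inputs. This is only a sketch: the function names and the sample triples $(n, p, t)$ are assumptions chosen for illustration.

```python
import math

# The constant used when choosing m; the proof needs it to be at least 1/64.
c = ((math.e - 1) / (5 * math.e)) ** 2


def kl_bernoulli(a: float, b: float) -> float:
    """D_KL(a || b) for two-point distributions, assuming 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def sharp_bound(n: int, p: float, t: float) -> float:
    """Right-hand side of Theorem 2.1."""
    return math.exp(-kl_bernoulli(p + t, p) * n)


def weak_bound(n: int, t: float) -> float:
    """Right-hand side of Theorem 3.3."""
    return math.exp(1 - t * t * n / 64)


# Illustrative inputs (assumed, not from the text).
samples = [(100, 0.3, 0.1), (1000, 0.5, 0.25), (400, 0.1, 0.5)]
for n, p, t in samples:
    print(n, p, t, sharp_bound(n, p, t), weak_bound(n, t))
```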

It remains to prove Lemma 3.2. For this, we use an idea from differential privacy. Let $A \in [0,1]^{m \times n}$, $A = (a_{ij})$, be an $(m \times n)$-matrix with entries from $[0,1]$. For a given parameter $\gamma > 1$, we define a random variable $S_\gamma(A)$ with values in $\{1, \dots, m\}$ as follows: for $i = 1, \dots, m$, let $b_i = \sum_{j=1}^{n} a_{ij}$ be the sum of the entries in the $i$-th row of $A$. Set

\[ C_\gamma(A) = \sum_{i=1}^{m} \gamma^{b_i}. \]

Then, for $i = 1, \dots, m$, we define

\[ \Pr[S_\gamma(A) = i] = \frac{\gamma^{b_i}}{C_\gamma(A)}. \]

The random variable $S_\gamma(A)$ is called a stable selector for $A$ (see the work by McSherry and Talwar [18] for more background). The next lemma states two interesting properties for $S_\gamma(A)$. For a matrix $A \in [0,1]^{m \times n}$, a vector $\vec{c} \in [0,1]^{m}$, and a number $j \in \{1, \dots, n\}$ we denote by $(A_{-j}, \vec{c})$ the matrix obtained from $A$ by replacing the $j$-th column of $A$ with $\vec{c}$.

Lemma 3.4.

Let $A \in [0,1]^{m \times n}$ be an $m \times n$ matrix with entries in $[0,1]$. We have

  • Stability: For every vector $\vec{c} \in [0,1]^{m}$, every $j \in \{1, \dots, n\}$, and every $i \in \{1, \dots, m\}$,

    \[ \gamma^{-2} \Pr[S_\gamma(A_{-j}, \vec{c}) = i] \leq \Pr[S_\gamma(A) = i] \leq \gamma^{2} \Pr[S_\gamma(A_{-j}, \vec{c}) = i]. \]

  • Accuracy: Let $b_i$ be the sum of the $i$-th row of $A$. Then,

    \[ \mathbf{E}_{i \sim S_\gamma(A)}[b_i] \leq \max_{i=1}^{m} b_i \leq \mathbf{E}_{i \sim S_\gamma(A)}[b_i] + \log_\gamma m. \]
Proof.

Stability: for $k \in \{1, \dots, m\}$, let $b_k$ be the sum of the $k$-th row of $A$, and let $\widetilde{b}_k$ be the sum of the $k$-th row of $(A_{-j}, \vec{c})$. Since $A$ and $(A_{-j}, \vec{c})$ differ in one column, and since the entries are from $[0,1]$, we have $\widetilde{b}_k - 1 \leq b_k \leq \widetilde{b}_k + 1$. Hence,

\[ \gamma^{-1} C_\gamma(A_{-j}, \vec{c}) \leq C_\gamma(A) \leq \gamma C_\gamma(A_{-j}, \vec{c}) \]

and

\[ \gamma^{-2} \Pr[S_\gamma(A_{-j}, \vec{c}) = i] \leq \Pr[S_\gamma(A) = i] \leq \gamma^{2} \Pr[S_\gamma(A_{-j}, \vec{c}) = i], \]

as claimed.

Accuracy: The inequality $\mathbf{E}_{i \sim S_\gamma(A)}[b_i] \leq \max_{i=1}^{m} b_i$ is obvious. For the second inequality, we observe that by definition,

\[ b_i = \log_\gamma\bigl(C_\gamma(A) \Pr[S_\gamma(A) = i]\bigr). \]

Thus,

\begin{align*}
\mathbf{E}_{i \sim S_\gamma(A)}[b_i] &= \sum_{i=1}^{m} \Pr[S_\gamma(A) = i] \log_\gamma\bigl(C_\gamma(A) \Pr[S_\gamma(A) = i]\bigr) \\
&= \sum_{i=1}^{m} \Pr[S_\gamma(A) = i] \log_\gamma C_\gamma(A) - \sum_{i=1}^{m} \Pr[S_\gamma(A) = i] \log_\gamma \frac{1}{\Pr[S_\gamma(A) = i]} \\
&\geq \sum_{i=1}^{m} \Pr[S_\gamma(A) = i] \log_\gamma \gamma^{\max_{i=1}^{m} b_i} - \log_\gamma m \\
&= \max_{i=1}^{m} b_i - \log_\gamma m,
\end{align*}

since $C_\gamma(A) = \sum_{i=1}^{m} \gamma^{b_i} \geq \gamma^{\max_{i=1}^{m} b_i}$ and since $x \mapsto -\log_\gamma(x)$ is a convex function. ∎
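The stable selector and the two properties of Lemma 3.4 can be illustrated on a small random matrix. The following sketch computes the selection probabilities $\gamma^{b_i}/C_\gamma(A)$ directly and checks stability and accuracy for one column replacement; the function names, the seed, and the matrix dimensions are assumptions made for illustration.

```python
import math
import random


def row_sums(A):
    """Row sums b_1, ..., b_m of the matrix A (a list of rows)."""
    return [sum(row) for row in A]


def selector_probs(A, gamma: float):
    """Pr[S_gamma(A) = i] = gamma^{b_i} / C_gamma(A), for each row i."""
    weights = [gamma**b for b in row_sums(A)]
    C = sum(weights)
    return [w / C for w in weights]


def replace_column(A, j: int, c):
    """The matrix (A_{-j}, c): column j of A (0-based) replaced by the vector c."""
    return [row[:j] + [c[i]] + row[j + 1:] for i, row in enumerate(A)]


rng = random.Random(0)  # fixed seed so that the run is reproducible
m, n, gamma = 5, 8, 1.5
A = [[rng.random() for _ in range(n)] for _ in range(m)]
c = [rng.random() for _ in range(m)]

probs = selector_probs(A, gamma)
probs_swapped = selector_probs(replace_column(A, 3, c), gamma)

b = row_sums(A)
expected_b = sum(pi * bi for pi, bi in zip(probs, b))
print(expected_b, max(b), expected_b + math.log(m, gamma))
```

A larger $\gamma$ concentrates the selector on the largest row sums (better accuracy) at the price of a larger stability factor $\gamma^{2}$.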

Lemma 3.4 shows that $S_\gamma(A)$ constitutes a reasonable mechanism of estimating the maximum row sum of $A$ without revealing too much information about any single column of $A$. We can now use Lemma 3.4 to bound the expectation of the maximum of $m-1$ independent copies of $X$ and $\mathbf{E}[X]$.

Lemma 3.5.

Let $m \in \mathbb{N}$. Let $X^{(1)}, \dots, X^{(m-1)}$ be $m-1$ independent copies of $X$, and set $X^{(m)} = \mathbf{E}[X]$. Then, for any $\gamma > 1$, we have

\[ \mathbf{E}\big[\max\{X^{(1)}, \dots, X^{(m)}\}\big] \leq \gamma^{2} pn + \log_\gamma m. \]
Proof.

Let $X_1^{(1)}, \dots, X_1^{(m-1)}$ be $m-1$ independent copies of $X_1$, and let $X_1^{(m)} = \mathbf{E}[X_1]$; let $X_2^{(1)}, \dots, X_2^{(m-1)}$ be $m-1$ independent copies of $X_2$ and let $X_2^{(m)} = \mathbf{E}[X_2]$; and so on. We consider the random $m \times n$ matrix $M \in [0,1]^{m \times n}$ whose entry in row $i$ and column $j$ is $X_j^{(i)}$ (the first $m-1$ rows have entries in $\{0,1\}$, and every entry of the last row is $p$). Then, we can write $X^{(i)} = \sum_{j=1}^{n} X_j^{(i)}$, for $i = 1, \dots, m$. By the accuracy claim in Lemma 3.4,

\[ \mathbf{E}_M\big[\max\{X^{(1)}, \dots, X^{(m)}\}\big] \leq \mathbf{E}_{M, i \sim S_\gamma(M)}\big[X^{(i)}\big] + \log_\gamma m. \tag{5} \]

Now we bound $\mathbf{E}_{M, i \sim S_\gamma(M)}\big[X^{(i)}\big]$. We unwrap the expectation for $i \sim S_\gamma(M)$ and get

\[ \mathbf{E}_{M, i \sim S_\gamma(M)}[X^{(i)}] = \mathbf{E}_M\Big[\sum_{i=1}^{m} \Pr[S_\gamma(M) = i] X^{(i)}\Big]. \]

Let $\widetilde{M}$ be an independent copy of $M$, and let $\widetilde{M}_j$ denote its $j$-th column. Denote the entry in the $i$-th row and $j$-th column of $\widetilde{M}$ by $\widetilde{X}_j^{(i)}$, and set $\widetilde{X}^{(i)} = \sum_{j=1}^{n} \widetilde{X}_j^{(i)}$, for $i = 1, \dots, m$. By the stability claim in Lemma 3.4, for every $j \in \{1, \dots, n\}$,

\[ \mathbf{E}_M\Big[\sum_{i=1}^{m} \Pr\big[S_\gamma(M) = i\big] X^{(i)}\Big] \leq \gamma^{2} \mathbf{E}_{M, \widetilde{M}}\Big[\sum_{i=1}^{m} \Pr\big[S_\gamma(M_{-j}, \widetilde{M}_j) = i\big] X^{(i)}\Big]. \]

Since the random variables $X_j^{(i)}$, $\widetilde{X}_j^{(i)}$, $1 \leq i \leq m$, $1 \leq j \leq n$, are independent, the pairs $\big((M_{-j}, \widetilde{M}_j), X_j^{(i)}\big)$ and $\big(M, \widetilde{X}_j^{(i)}\big)$ have the same distribution. Therefore, we can write

\begin{align*}
\mathbf{E}_M\Big[\sum_{i=1}^{m} \Pr\big[S_\gamma(M) = i\big] X^{(i)}\Big] &= \mathbf{E}_M\Big[\sum_{i=1}^{m} \sum_{j=1}^{n} \Pr\big[S_\gamma(M) = i\big] X_j^{(i)}\Big] \\
&\leq \gamma^{2} \mathbf{E}_{M, \widetilde{M}}\Big[\sum_{j=1}^{n} \sum_{i=1}^{m} \Pr\big[S_\gamma(M_{-j}, \widetilde{M}_j) = i\big] X_j^{(i)}\Big] \\
&= \gamma^{2} \mathbf{E}_{M, \widetilde{M}}\Big[\sum_{j=1}^{n} \sum_{i=1}^{m} \Pr\big[S_\gamma(M) = i\big] \widetilde{X}_j^{(i)}\Big] \\
&= \gamma^{2} \mathbf{E}_M\Big[\sum_{i=1}^{m} \Pr\big[S_\gamma(M) = i\big] \mathbf{E}_{\widetilde{M}}\big[\widetilde{X}^{(i)}\big]\Big] \\
&= \gamma^{2} \mathbf{E}_M\Big[\sum_{i=1}^{m} \Pr\big[S_\gamma(M) = i\big] pn\Big] = \gamma^{2} pn.
\end{align*}

We can conclude the lemma by plugging this bound into (5). ∎

To obtain Lemma 3.2, we set $\gamma = 1 + \frac{\sqrt{\ln m}}{\sqrt{n}}$. Now, Lemma 3.5 gives

\begin{align*}
\mathbf{E}\big[\max\{X^{(1)}, \dots, X^{(m)}\}\big] &\leq \left(1 + \frac{\sqrt{\ln m}}{\sqrt{n}}\right)^{2} pn + \frac{\ln m}{\ln\left(1 + \frac{\sqrt{\ln m}}{\sqrt{n}}\right)} \\
&\leq \left(1 + \frac{3\sqrt{\ln m}}{\sqrt{n}}\right) pn + \frac{\ln m}{\frac{\sqrt{\ln m}}{2\sqrt{n}}},
\end{align*}

since $\frac{\sqrt{\ln m}}{\sqrt{n}} \leq 1$ by our assumption $m \leq e^{n}$ and $\ln(1+x) \geq x/2$, for $x \in [0,1]$. Hence, using $pn \leq n$,

\[ \mathbf{E}\big[\max\{X^{(1)}, \dots, X^{(m)}\}\big] \leq pn + 5\sqrt{n \ln m}, \]

as desired.
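As a sanity check on the constants, the bound can be compared with the exact expectation for small parameters. The following Python sketch assumes that $X^{(1)}, \dots, X^{(m)}$ are independent $\mathrm{Bin}(n,p)$ variables, as in the setting above; the function names `expected_max` and `upper_bound` are ours and purely illustrative.

```python
import math

def expected_max(n, p, m):
    """Exact E[max of m independent Bin(n, p) variables].

    Uses E[Y] = sum_{k=1}^{n} Pr[Y >= k] and Pr[Y >= k] = 1 - F(k-1)^m,
    where F is the Bin(n, p) distribution function.
    """
    pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    cdf, acc = [], 0.0
    for w in pmf:
        acc += w
        cdf.append(min(acc, 1.0))  # guard against rounding slightly above 1
    return sum(1.0 - cdf[k - 1] ** m for k in range(1, n + 1))

def upper_bound(n, p, m):
    """The bound pn + 5 sqrt(n ln m); requires 1 <= m <= e^n."""
    return p * n + 5 * math.sqrt(n * math.log(m))

if __name__ == "__main__":
    n, p = 400, 0.3
    for m in (2, 5, 50):
        print(m, expected_max(n, p, m), upper_bound(n, p, m))
```

For moderate $n$ the constant $5$ is far from tight, which is consistent with the generous estimates used in the derivation.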

4 Useful Consequences

We now show several useful consequences of Theorem 2.1. These results can be derived directly from Theorem 2.1, and therefore they also hold for variants of the theorem with slightly different assumptions.

4.1 The Lower Tail

First, we show that an analogous bound holds for the lower tail probability $\Pr[X \leq (p-t)n]$.

Corollary 4.1.

Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0,1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^{n} X_i$. Then, for any $t \in [0, p]$, we have

\[
\Pr[X \leq (p-t)n] \leq e^{-D_{\textup{KL}}(p-t\|p)n}.
\]
Proof.
We have
\[
\Pr[X \leq (p-t)n] = \Pr[n - X \geq n - (p-t)n] = \Pr[X' \geq (1-p+t)n],
\]

where $X' = \sum_{i=1}^{n} X_i'$ with independent random variables $X_i' := 1 - X_i \in \{0,1\}$ such that $\Pr[X_i' = 1] = 1-p$. The result follows by applying Theorem 2.1 to $X'$ and using $D_{\textup{KL}}(1-p+t\|1-p) = D_{\textup{KL}}(p-t\|p)$. ∎
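The lower-tail bound and the symmetry of the binary relative entropy used in the proof can be checked numerically. The following Python sketch compares the exact binomial lower tail with the bound of Corollary 4.1; the helper names (`kl`, `lower_tail`, `lower_tail_bound`) and the parameters are illustrative assumptions, not part of the original text.

```python
import math

def kl(q, p):
    """Binary relative entropy D_KL(q || p) in nats, with 0 * ln 0 = 0."""
    d = 0.0
    if q > 0:
        d += q * math.log(q / p)
    if q < 1:
        d += (1 - q) * math.log((1 - q) / (1 - p))
    return d

def lower_tail(n, p, k):
    """Exact Pr[X <= k] for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def lower_tail_bound(n, p, k):
    """Corollary 4.1 with (p - t) n = k, i.e. t = p - k/n; requires k <= p n."""
    return math.exp(-kl(k / n, p) * n)

if __name__ == "__main__":
    n, p = 100, 0.4
    for k in (40, 30, 20, 10):
        print(k, lower_tail(n, p, k), lower_tail_bound(n, p, k))
```

Parametrizing by the integer threshold $k = (p-t)n$ avoids rounding issues when converting $t$ back to a count.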

4.2 Multiplicative Version

Next, we derive a multiplicative variant of Theorem 2.1. This well-known version of the bound can be found in the classic text by Motwani and Raghavan [21].

Corollary 4.2.

Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0,1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^{n} X_i$ and $\mu = pn$. Then, for any $\delta \geq 0$, we have

\begin{align*}
\Pr[X \geq (1+\delta)\mu] &\leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}, \text{ and} \\
\Pr[X \leq (1-\delta)\mu] &\leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.
\end{align*}
Proof.

Setting $t = \delta\mu/n = \delta p$ in Theorem 2.1 yields

\begin{align*}
\Pr[X \geq (1+\delta)\mu] &\leq \exp\left(-n\left[p(1+\delta)\ln(1+\delta) + p\left(\frac{1-p}{p} - \delta\right)\ln\left(1 - \delta\frac{p}{1-p}\right)\right]\right) \\
&= \left(\frac{(1 - \delta p/(1-p))^{\delta - (1-p)/p}}{(1+\delta)^{1+\delta}}\right)^{\mu}
\leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.
\end{align*}
For the last step, write $x = \delta p/(1-p)$. The exponent $\delta - (1-p)/p = -(1-p)(1-x)/p$ is nonpositive, and $\ln(1-x) \geq -x/(1-x)$ for $x \in [0,1)$, so $(1-x)^{\delta - (1-p)/p} \leq e^{(1-p)x/p} = e^{\delta}$.

Setting $t = \delta\mu/n = \delta p$ in Corollary 4.1 yields

\begin{align*}
\Pr[X \leq (1-\delta)\mu] &\leq \exp\left(-n\left[p(1-\delta)\ln(1-\delta) + p\left(\frac{1-p}{p} + \delta\right)\ln\left(1 + \delta\frac{p}{1-p}\right)\right]\right) \\
&= \left(\frac{(1 + \delta p/(1-p))^{-\delta - (1-p)/p}}{(1-\delta)^{1-\delta}}\right)^{\mu}
\leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.
\end{align*}
Here, with the same $x$, the exponent $-\delta - (1-p)/p = -(1-p)(1+x)/p$ is negative, and $\ln(1+x) \geq x/(1+x)$ for $x \geq 0$, so $(1+x)^{-\delta - (1-p)/p} \leq e^{-(1-p)x/p} = e^{-\delta}$.

∎
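To see how much is lost when passing to the multiplicative form, one can compare both bounds of Corollary 4.2 with the exact binomial tails. The following Python sketch does this for small parameters; the function names and the chosen values of $n$ and $p$ are our own illustrative assumptions.

```python
import math

def upper_tail(n, p, k):
    """Exact Pr[X >= k] for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def lower_tail(n, p, k):
    """Exact Pr[X <= k] for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def mult_upper(mu, delta):
    """(e^delta / (1+delta)^(1+delta))^mu, evaluated in log space."""
    return math.exp(mu * (delta - (1 + delta) * math.log1p(delta)))

def mult_lower(mu, delta):
    """(e^(-delta) / (1-delta)^(1-delta))^mu for 0 <= delta < 1."""
    return math.exp(mu * (-delta - (1 - delta) * math.log1p(-delta)))

if __name__ == "__main__":
    n, p = 200, 0.1
    mu = p * n
    for k in (25, 30, 40):
        delta = k / mu - 1  # threshold k corresponds to (1 + delta) * mu
        print(k, upper_tail(n, p, k), mult_upper(mu, delta))
```

Evaluating the bounds through `math.log1p` keeps the computation stable when $\delta$ is small.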

4.3 Useful Variants

The next few corollaries give some handy variants of the bound that are often more manageable in practice. First, we give a simple bound for the multiplicative lower tail.

Corollary 4.3.

Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0,1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^{n} X_i$ and $\mu = pn$. Then, for any $\delta \in (0,1)$, we have

\[
\Pr[X \leq (1-\delta)\mu] \leq e^{-\delta^{2}\mu/2}.
\]
Proof.

By Corollary 4.2,

\[
\Pr[X \leq (1-\delta)\mu] \leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.
\]

Using the power series expansion of $\ln(1-\delta)$, we get

\[
(1-\delta)\ln(1-\delta) = -(1-\delta)\sum_{i=1}^{\infty}\frac{\delta^{i}}{i} = -\delta + \sum_{i=2}^{\infty}\frac{\delta^{i}}{(i-1)i} \geq -\delta + \delta^{2}/2.
\]

Thus,

\[
\Pr[X \leq (1-\delta)\mu] \leq e^{[-\delta + \delta - \delta^{2}/2]\mu} = e^{-\delta^{2}\mu/2},
\]

as claimed. ∎

A slightly more complicated bound holds for the multiplicative upper tail.

Corollary 4.4.

Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0,1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^{n} X_i$ and $\mu = pn$. Then, for any $\delta \geq 0$, we have

\[
\Pr[X \geq (1+\delta)\mu] \leq e^{-\min\{\delta^{2},\delta\}\mu/4}.
\]
Proof.

We may assume that $(1+\delta)p \leq 1$, since otherwise $(1+\delta)pn > n \geq X$ and the probability is $0$. Then, Theorem 2.1 gives

\[
\Pr[X \geq (1+\delta)pn] \leq e^{-D_{\textup{KL}}((1+\delta)p\|p)n}.
\]

Define $f(\delta) := D_{\textup{KL}}((1+\delta)p\|p)$. Then,

\[
f'(\delta) = p\ln(1+\delta) - p\ln(1 - \delta p/(1-p))
\]

and

\[
f''(\delta) = \frac{p}{(1+\delta)(1-p-\delta p)} \geq \frac{p}{1+\delta}.
\]

By Taylor’s theorem, we have

\[
f(\delta) = f(0) + \delta f'(0) + \frac{\delta^{2}}{2} f''(\xi),
\]

for some $\xi \in [0,\delta]$. Since $f(0) = f'(0) = 0$, it follows that

\[
f(\delta) = \frac{\delta^{2}}{2} f''(\xi) \geq \frac{\delta^{2} p}{2(1+\xi)} \geq \frac{\delta^{2} p}{2(1+\delta)}.
\]

For $\delta \geq 1$, we have $\delta/(1+\delta) \geq 1/2$; for $\delta < 1$, we have $1/(\delta+1) \geq 1/2$. This gives, for all $\delta \geq 0$,

\[
f(\delta) \geq \min\{\delta^{2},\delta\}p/4,
\]

and the claim follows. ∎
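The key inequality $f(\delta) \geq \min\{\delta^{2},\delta\}p/4$ is easy to probe numerically. The following Python sketch evaluates both sides on a few values of $\delta$; the names `f` and `simple_exponent` mirror the proof, while the sample values of $p$ and $\delta$ are illustrative assumptions.

```python
import math

def f(delta, p):
    """f(delta) = D_KL((1 + delta) p || p), defined for (1 + delta) p <= 1."""
    q = (1 + delta) * p
    d = q * math.log(q / p)
    if q < 1:
        d += (1 - q) * math.log((1 - q) / (1 - p))
    return d

def simple_exponent(delta, p):
    """The lower bound min{delta^2, delta} * p / 4 from the proof."""
    return min(delta * delta, delta) * p / 4

if __name__ == "__main__":
    p = 0.05
    for delta in (0.1, 0.5, 1.0, 3.0, 10.0):
        print(delta, f(delta, p), simple_exponent(delta, p))
```

Multiplying either exponent by $n$ and negating gives the corresponding tail bound, so a larger gap between the two columns means a looser simplified bound.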

The following corollary combines the two bounds. This variant can be found, e.g., in the book by Arora and Barak [2].

Corollary 4.5.

Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0,1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^{n} X_i$ and $\mu = pn$. Then, for any $\delta > 0$, we have

\[
\Pr[|X - \mu| \geq \delta\mu] \leq 2e^{-\min\{\delta^{2},\delta\}\mu/4}.
\]
Proof.

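Combine Corollaries 4.3 and 4.4 using the union bound. For $\delta \in (0,1)$, the lower tail bound $e^{-\delta^{2}\mu/2}$ from Corollary 4.3 is at most $e^{-\min\{\delta^{2},\delta\}\mu/4}$. For $\delta \geq 1$, the lower tail event $X \leq (1-\delta)\mu$ is either empty or, if $\delta = 1$, equals $\{X = 0\}$, which has probability $(1-p)^{n} \leq e^{-\mu}$. ∎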

The following corollary, which appears, e.g., in the book by Motwani and Raghavan [21], is also sometimes useful.

Corollary 4.6.

Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0,1\}$ and $\Pr[X_i = 1] = p$, for $i = 1, \dots, n$. Set $X := \sum_{i=1}^{n} X_i$ and $\mu = pn$. For $t \geq 2e\mu$, we have

\[
\Pr[X \geq t] \leq 2^{-t}.
\]
Proof.

By Corollary 4.2,

\[
\Pr[X \geq (1+\delta)\mu] \leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu} \leq \left(\frac{e}{1+\delta}\right)^{(1+\delta)\mu}.
\]

For $\delta \geq 2e - 1$, the denominator $1+\delta$ on the right-hand side is at least $2e$, so the bound is at most $2^{-(1+\delta)\mu}$. The claim follows by setting $t = (1+\delta)\mu$. ∎
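For a concrete feel of Corollary 4.6, the exact tail can be compared with $2^{-t}$ for all integer thresholds $t \geq 2e\mu$. The following Python sketch does this for one illustrative choice of $n$ and $p$; the helper name `upper_tail` is ours.

```python
import math

def upper_tail(n, p, k):
    """Exact Pr[X >= k] for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    n, p = 100, 0.05
    mu = p * n
    t_min = math.ceil(2 * math.e * mu)  # smallest integer t with t >= 2 e mu
    for t in (t_min, t_min + 5, t_min + 10):
        print(t, upper_tail(n, p, t), 2.0 ** (-t))
```

The bound is only informative when $2e\mu \leq n$, that is, for $p \leq 1/(2e)$; otherwise no admissible $t$ is attainable by $X$.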

5 Generalizations

We mention a few generalizations of the proof techniques from Section 3. Since the consequences from Section 4 are based on simple algebraic manipulation of the bounds, the same consequences also hold for the generalized settings.

5.1 Hoeffding Extension

The moment method (Section 3.1) yields many generalizations of Theorem 2.1. The following result is known as Hoeffding’s extension [14]. It shows that the $X_i$ can actually be chosen to be continuous with varying expectations.

Theorem 5.1.

Let $X_1, \dots, X_n$ be independent random variables with $X_i \in [0,1]$ and $\mathbf{E}[X_i] = p_i$. Set $X := \sum_{i=1}^{n} X_i$ and $p := (1/n)\sum_{i=1}^{n} p_i$. Then, for any $t \in [0, 1-p]$, we have

\[
\Pr[X \geq (p+t)n] \leq e^{-D_{\textup{KL}}(p+t\|p)n}.
\]
Proof.

Let $\lambda > 0$ be a parameter to be determined later. As before, Markov’s inequality yields

\[
\Pr\bigl[e^{\lambda X} \geq e^{\lambda(p+t)n}\bigr] \leq \frac{\mathbf{E}[e^{\lambda X}]}{e^{\lambda(p+t)n}}.
\]

Using independence, we get

\begin{equation}
\mathbf{E}[e^{\lambda X}] = \mathbf{E}\Bigl[e^{\lambda\sum_{i=1}^{n} X_i}\Bigr] = \prod_{i=1}^{n}\mathbf{E}\Bigl[e^{\lambda X_i}\Bigr]. \tag{6}
\end{equation}

Now we need to estimate $\mathbf{E}\bigl[e^{\lambda X_i}\bigr]$. The function $z \mapsto e^{\lambda z}$ is convex, so $e^{\lambda z} \leq (1-z)e^{0\cdot\lambda} + ze^{1\cdot\lambda}$ for $z \in [0,1]$. Hence,

\[
\mathbf{E}\bigl[e^{\lambda X_i}\bigr] \leq \mathbf{E}[1 - X_i + X_i e^{\lambda}] = 1 - p_i + p_i e^{\lambda}.
\]

Going back to (6),

\[
\mathbf{E}[e^{\lambda X}] \leq \prod_{i=1}^{n}(1 - p_i + p_i e^{\lambda}).
\]

Using the arithmetic-geometric mean inequality $\prod_{i=1}^{n} x_i \leq \bigl((1/n)\sum_{i=1}^{n} x_i\bigr)^{n}$, for $x_i \geq 0$, this is

\[
\mathbf{E}[e^{\lambda X}] \leq (1 - p + pe^{\lambda})^{n}.
\]

From here we continue as in Section 3.1. ∎
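Theorem 5.1 can be illustrated in the special case of Bernoulli variables with different success probabilities, where the exact distribution of the sum is computable by dynamic programming. The following Python sketch assumes $X_i \sim \mathrm{Bernoulli}(p_i)$ (a special case of $X_i \in [0,1]$); the helper names and the chosen $p_i$ are illustrative.

```python
import math

def kl(q, p):
    """Binary relative entropy D_KL(q || p) in nats, with 0 * ln 0 = 0."""
    d = 0.0
    if q > 0:
        d += q * math.log(q / p)
    if q < 1:
        d += (1 - q) * math.log((1 - q) / (1 - p))
    return d

def sum_distribution(ps):
    """Exact distribution of X = X_1 + ... + X_n for independent Bernoulli(ps[i]).

    Returns a list dist with dist[k] = Pr[X = k].
    """
    dist = [1.0]
    for pi in ps:
        nxt = [0.0] * (len(dist) + 1)
        for k, w in enumerate(dist):
            nxt[k] += w * (1 - pi)   # X_i = 0
            nxt[k + 1] += w * pi     # X_i = 1
        dist = nxt
    return dist

def hoeffding_bound(ps, k):
    """Theorem 5.1 with (p + t) n = k, p the average of the p_i; needs k >= p n."""
    n = len(ps)
    p = sum(ps) / n
    return math.exp(-kl(k / n, p) * n)

if __name__ == "__main__":
    ps = [0.05 + 0.4 * i / 59 for i in range(60)]  # p_i between 0.05 and 0.45
    dist = sum_distribution(ps)
    for k in (20, 25, 30):
        print(k, sum(dist[k:]), hoeffding_bound(ps, k))
```

Only the average $p$ enters the bound, which reflects the arithmetic-geometric mean step in the proof.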

5.2 Hypergeometric Distribution

Chvátal’s proof [7] from Section 3.2 generalizes to the hypergeometric distribution. We emphasize once again that this means that all the corollaries from Section 4 also apply to this case.

Theorem 5.2.

Suppose we have an urn with $N$ balls, $P$ of which are red. We randomly draw $n$ balls from the urn without replacement. Let $H(N,P,n)$ denote the number of red balls in the sample. Set $p := P/N$. Then, for any $t \in [0, 1-p]$, we have

\[
\Pr\big[H(N,P,n) \geq (p+t)n\big] \leq e^{-D_{\textup{KL}}(p+t\|p)n}.
\]
Proof.

It is well known that

\[
\Pr[H(N,P,n) = l] = \binom{P}{l}\binom{N-P}{n-l}\binom{N}{n}^{-1},
\]

for $l = 0, \dots, n$.

Claim 5.3.

For every $j \in \{0, \dots, n\}$, we have

\[
\binom{N}{n}^{-1}\sum_{i=j}^{n}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j} \leq \binom{n}{j}p^{j}.
\]
Proof.

Consider the following random experiment: take a random permutation of the $N$ balls in the urn. Let $S$ be the sequence of the first $n$ elements in the permutation. Let $X$ be the number of $j$-subsets of $S$ that contain only red balls. We compute $\mathbf{E}[X]$ in two different ways. On the one hand,

\begin{equation}
\mathbf{E}[X] = \sum_{i=j}^{n}\Pr[\text{$S$ contains $i$ red balls}]\binom{i}{j} = \sum_{i=j}^{n}\binom{N}{n}^{-1}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j}. \tag{7}
\end{equation}

On the other hand, let $I \subseteq \{1, \dots, n\}$ with $|I| = j$. Then the probability that all the balls in the positions indexed by $I$ are red is

\[
\frac{P}{N}\cdot\frac{P-1}{N-1}\cdot\cdots\cdot\frac{P-j+1}{N-j+1} \leq \left(\frac{P}{N}\right)^{j} = p^{j}.
\]

Thus, by linearity of expectation, $\mathbf{E}[X] \leq \binom{n}{j}p^{j}$. Together with (7), the claim follows. ∎

Claim 5.4.

For every $\tau \geq 1$, we have

\[
\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i} \leq (1 + (\tau-1)p)^{n}.
\]
Proof.

Using Claim 5.3 and the Binomial theorem (twice),

\begin{align*}
\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i}
&= \binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}(1 + (\tau-1))^{i} \\
&= \binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\sum_{j=0}^{i}\binom{i}{j}(\tau-1)^{j} \\
&= \binom{N}{n}^{-1}\sum_{j=0}^{n}(\tau-1)^{j}\sum_{i=j}^{n}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j} \\
&\leq \sum_{j=0}^{n}\binom{n}{j}((\tau-1)p)^{j} = (1 + (\tau-1)p)^{n},
\end{align*}

as claimed. ∎

Thus, for any $\tau \geq 1$ and $k \geq pn$, we get as before

\begin{multline*}
\Pr[H(N,P,n) \geq k] = \binom{N}{n}^{-1}\sum_{i=k}^{n}\binom{P}{i}\binom{N-P}{n-i} \\
\leq \binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i-k} \leq \frac{(p\tau + 1 - p)^{n}}{\tau^{k}},
\end{multline*}

by Claim 5.4. From here the proof proceeds as in Section 3.2. ∎
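Both Claim 5.3 and the resulting tail bound can be verified exactly for small urns. The following Python sketch uses the standard hypergeometric probabilities with exact rational arithmetic; the helper names and the urn parameters $N = 50$, $P = 20$, $n = 10$ are illustrative assumptions.

```python
import math
from fractions import Fraction

def kl(q, p):
    """Binary relative entropy D_KL(q || p) in nats, with 0 * ln 0 = 0."""
    d = 0.0
    if q > 0:
        d += q * math.log(q / p)
    if q < 1:
        d += (1 - q) * math.log((1 - q) / (1 - p))
    return d

def hyper_pmf(N, P, n, l):
    """Exact Pr[H(N, P, n) = l]; math.comb(a, b) is 0 whenever b > a."""
    if l < 0 or l > n:
        return Fraction(0)
    return Fraction(math.comb(P, l) * math.comb(N - P, n - l), math.comb(N, n))

def hyper_tail(N, P, n, k):
    """Exact Pr[H(N, P, n) >= k]."""
    return sum(hyper_pmf(N, P, n, l) for l in range(k, n + 1))

def claim_5_3_sides(N, P, n, j):
    """Exact left- and right-hand sides of Claim 5.3."""
    lhs = sum(hyper_pmf(N, P, n, i) * math.comb(i, j) for i in range(j, n + 1))
    rhs = math.comb(n, j) * Fraction(P, N) ** j
    return lhs, rhs

if __name__ == "__main__":
    N, P, n = 50, 20, 10
    p = P / N
    for k in (4, 6, 8, 10):
        print(k, float(hyper_tail(N, P, n, k)), math.exp(-kl(k / n, p) * n))
```

Using `Fraction` for Claim 5.3 avoids any doubt about rounding when the two sides are close, for example at $j = 0$ and $j = 1$, where they coincide.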

5.3 Negative Correlations

The proof by Impagliazzo and Kabanets [15] from Section 3.3 can be used to relax the independence assumption. It now suffices that the random variables are negatively correlated.

Theorem 5.5.

Let $X_1, \dots, X_n$ be random variables with $X_i \in \{0,1\}$. Suppose there exist $p_i \in [0,1]$, $i = 1, \dots, n$, such that for every index set $I \subseteq \{1, \dots, n\}$, we have $\mathbf{E}\big[\prod_{i \in I} X_i\big] \leq \prod_{i \in I} p_i$. Set $X := \sum_{i=1}^{n} X_i$ and $p := (1/n)\sum_{i=1}^{n} p_i$. Then, for any $t \in [0, 1-p]$, we have

\[
\Pr[X \geq (p+t)n] \leq e^{-D_{\textup{KL}}(p+t\|p)n}.
\]
Proof.

Let $\lambda \in [0,1]$ be a parameter to be chosen later. Let $I \subseteq \{1, \dots, n\}$ be a random index set obtained by including each element $i \in \{1, \dots, n\}$ with probability $\lambda$. As before, we estimate the expectation $\mathbf{E}\bigl[\prod_{i \in I} X_i\bigr]$ in two different ways, where the expectation is over the random choice of $X_1, \dots, X_n$ and $I$. Similarly to before,

\begin{multline}
\mathbf{E}\Bigl[\prod_{i \in I} X_i\Bigr] = \sum_{S \subseteq \{1,\dots,n\}} \Pr[I = S]\cdot\mathbf{E}\Bigl[\prod_{i \in S} X_i\Bigr] \leq \sum_{S \subseteq \{1,\dots,n\}} \lambda^{|S|}(1-\lambda)^{n-|S|}\cdot\Big(\prod_{i \in S} p_i\Big) \\
= \sum_{S \subseteq \{1,\dots,n\}} \Big(\prod_{i \in S}\lambda p_i\Big)\Big(\prod_{i \in \{1,\dots,n\}\setminus S}(1-\lambda)\Big) = \prod_{i=1}^{n}(1 - \lambda + p_i\lambda) \leq (1 - \lambda + p\lambda)^{n}, \tag{8}
\end{multline}

by the arithmetic-geometric mean inequality. The proof of the lower bound remains unchanged and yields

\[
\mathbf{E}\Bigl[\prod_{i \in I} X_i\Bigr] \geq (1-\lambda)^{(1-p-t)n}\Pr[X \geq (p+t)n],
\]

as before. Combining with (8) and optimizing for $\lambda$ finishes the proof, see Section 3.3. ∎

Acknowledgments.

This survey is based on lecture notes for a class on advanced algorithms at Freie Universität Berlin. I would like to thank all the students who took this class for their interest and participation. I would also like to thank Nabil Mustafa and Jonathan Ullman for valuable comments that improved this survey.

References

  • [1] N. Alon and J. Spencer. The Probabilistic Method. Wiley-Interscience, 2016.
  • [2] S. Arora and B. Barak. Computational Complexity – A Modern Approach. Cambridge University Press, 2009.
  • [3] K. Azuma. Weighted sums of certain dependent random variables. Tôhoku Math. J. (2), 19:357–367, 1967.
  • [4] S. N. Bernstein. Sobranie Sochinenii [Collected Works]. Nauka, Moscow, 1964.
  • [5] X. Chen. A likelihood ratio approach for probabilistic inequalities. arXiv:1308.4123, 2013.
  • [6] F. R. K. Chung and L. Lu. Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79–127, 2006.
  • [7] V. Chvátal. The tail of the hypergeometric distribution. Discrete Mathematics, 25(3):285–287, 1979.
  • [8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 3rd edition, 2009.
  • [9] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.
  • [10] D. P. Dubhashi and A. Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009.
  • [11] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
  • [12] O. Goldreich. Computational Complexity – A Conceptual Perspective. Cambridge University Press, 2008.
  • [13] T. Hagerup and C. Rüb. A guided tour of Chernoff bounds. Inform. Process. Lett., 33(6):305–308, 1990.
  • [14] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13–30, 1963.
  • [15] R. Impagliazzo and V. Kabanets. Constructive proofs of concentration bounds. In Proc. 13th Int. Conf. Approx. (APPROX) and 14th Int. Conf. Rand. Comb. Opt. (RANDOM), pages 617–631, 2010.
  • [16] J. M. Kleinberg and É. Tardos. Algorithm Design. Addison-Wesley, 2006.
  • [17] C. McDiarmid. Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics, volume 16 of Algorithms Combin., pages 195–248. Springer-Verlag, 1998.
  • [18] F. McSherry and K. Talwar. Mechanism design via differential privacy. In Proc. 48th Annu. IEEE Symp. Found. Comput. Sci. (FOCS), pages 94–103, 2007.
  • [19] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, 2nd edition, 2017.
  • [20] P. Morin, W. Mulzer, and T. Reddad. Encoding arguments. ACM Comput. Surv., 50(3):46:1–46:36, 2017.
  • [21] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
  • [22] T. Steinke and J. Ullman. Subgaussian tail bounds via stability arguments. arXiv:1701.03493, 2017.