Home > Informal Classroom Notes > Simple Derivation of Neyman-Pearson Lemma for Hypothesis Testing

Simple Derivation of Neyman-Pearson Lemma for Hypothesis Testing

This short note presents a very simple and intuitive derivation that explains why the likelihood ratio is used for hypothesis testing.

Recall the standard setup: observed is a realisation $x$ of a random variable (or vector or process) $X$ generated either with probability density $p_0(X)$ or $p_1(X)$.

The observation space is to be partitioned into two regions, one region labelled zero and the other unity. If $x$ lies in the region labelled zero then $X$ is assumed to have been generated under $p_0$. The rough aim is to choose the partition so that if this “game” is repeated many times — choose $i$ to be 0 or 1; repeatedly generate samples from $p_i$ and count how many times they fall in the two regions — then the ratio of “correct outcomes” to “incorrect outcomes” is maximised, regardless of whether $i=0$ or $i=1$ was used. On average, we wish to be correct many more times than we are wrong.

Since there is a trade-off between being correct when $i=0$ and when $i=1$, Neyman-Pearson’s underlying criterion is to fix the probability of being wrong when $i=0$ and maximise the probability of being correct when $i=1$. (This part is assumed to be familiar to the reader.)

Mathematically, the region $A$ of the observation space must be found such that 1) the probability $P_0(A)$ of the observation lying in $A$ under $p_0$ is some fixed (and small) value, while 2) the probability $P_1(A)$ of the observation lying in $A$ under $p_1$ is as large as possible. (This region $A$ will be the region labelled unity.) This is a constrained optimisation problem involving a region of space. Pleasantly then, there is a simple way of solving it.

The trick is to imaging the observation space as broken up into many small tiles fitting together to cover the observation space. (If the observation space is drawn as a square then imagine dividing it into hundreds of small squares.) With this quantisation, finding the region $A$ (up to quantisation error) means finding which of the tiles we want to be part of the region $A$.

Think of starting with $A$ being the empty set and growing it by adding tiles one at a time. The first tile we add will do two things: it will give us a $P_0(A)$ which we want to keep as small as possible and it will give us a $P_1(A)$ which we want to maximise. (At this point, some readers may wish to re-tile the region with tiles of possibly differing sizes, so that the probability under $P_0$ of landing on any given tile is the same fixed constant regardless of which tile was chosen.) It is intuitively clear (and even clearer after re-tiling) that the very first tile we should pick is the one which will make $\frac{P_1(A)}{P_0(A)}$ as large as possible. If $P_0(A)$ has not exceeded the limit then we are allowed to pick another tile, and we pick the next best tile, that is, out of the tiles $T_k$ remaining, we want the tile that maximises the ratio $\frac{p_1(T_k)}{p_0(T_k)}$. Each time, we greedily pick the best remaining tile that will help us make $P_1(A)$ as large as possible while not letting $P_0(A)$ grow too quickly.

As the size of the tiles shrinks to zero, the “limit” is the likelihood ratio $\frac{p_1(x)}{p_0(x)}$ for the “infinitesimal” tile $T_x$ centred at $x$. Choosing the region $A = \{x \mid \frac{p_1(x)}{p_0(x)} > c\}$ for some constant $c$ corresponds precisely to choosing the tiles $T_x$ with the largest ratios; every tile with ratio better than $c$ is included.

The above is a hand-waving argument and is not the best on which to base a rigorous proof. Often this is the case though: an intuitive derivation is subsequently verified by a less intuitive but more elementary mathematical argument. An elementary proof can be found in the Wikipedia.