Archive

Posts Tagged ‘maximum likelihood’

Measure-theoretic Formulation of the Likelihood Function

June 26, 2013 Leave a comment

Let P_\theta be a family of probability measures indexed by \theta \in \Theta. For notational convenience, assume 0 \in \Theta, so that P_0 is one of the probability measures in the family. This short note sketches why L(\theta) = E_0\left[ \frac{dP_\theta}{dP_0} \mid \mathcal X \right] is the likelihood function, where the \sigma-algebra \mathcal X describes the possible observations and E_0 denotes expectation with respect to the measure P_0.

First, consider the special case where the probability measure can be described by a probability density function (pdf) p(x,y;\theta). Here, x is a real-valued random variable that we have observed, y is a real-valued unobserved random variable, and \theta indexes the family of joint pdfs. The likelihood function when there is a “hidden variable” y is usually defined as \theta \mapsto p(x;\theta) where p(x;\theta) is the marginalised pdf obtained by integrating out the unknown variable y, that is, p(x;\theta) = \int_{-\infty}^{\infty} p(x,y;\theta)\,dy. Does this likelihood function equal L(\theta) when \mathcal X is the \sigma-algebra generated by the random variable x?

The correspondence between the measure and the pdf is: P_\theta(A) = \int_A p(x,y;\theta)\,dx\,dy for any (measurable) set A \subset \mathbb{R}^2; this is the probability that (x,y) lies in A. In this case, the Radon-Nikodym derivative \frac{dP_\theta}{dP_0} is simply the ratio \frac{p(x,y;\theta)}{p(x,y;0)}. The conditional expectation with respect to X under the distribution p(x,y;0) is E_0\left[ \frac{p(x,y;\theta)}{p(x,y;0)} \mid x \right] = \int_{-\infty}^{\infty} \frac{p(x,y;\theta)}{p(x,y;0)} p(x,y;0)\, dy = \int_{-\infty}^{\infty} p(x,y;\theta)\,dy, verifying in this special case that L(\theta) is indeed the likelihood function.

The above verification does not make L(\theta) = E_0\left[ \frac{dP_\theta}{dP_0} \mid \mathcal X \right] any less mysterious. Instead, it can be understood directly as follows. From the definition of conditional expectation, it is straightforward to verify that L(\theta) = \left. \frac{dP_\theta}{dP_0}\right|_{\mathcal X} meaning that for any \mathcal X-measurable set A, P_\theta(A) = \int_A \left. \frac{dP_\theta}{dP_0}\right|_{\mathcal X}\,dP_0. The likelihood function is basically asking for the “probability” that we observed what we did, or precisely, we want to take the set A to be our actual observation and see how P_\theta(A) varies with \theta. This would work if P_\theta(A) > 0 but otherwise it is necessary to look at how P_\theta(A) varies when A is an arbitrarily small but non-negligible set centred on the true observation. (If you like, it is impossible to make a perfect observation correct to infinitely many significant figures; instead, an observation of x usually means we know, for example, that 1.0 \leq x \leq 1.1, hence A can be chosen to be the event that 1.0 \leq x \leq 1.1 instead of the negligible event x = 1.05.) It follows from the integral representation P_\theta(A) = \int_A \left. \frac{dP_\theta}{dP_0}\right|_{\mathcal X}\,dP_0 that \left. \frac{dP_\theta}{dP_0}\right|_{\mathcal X} describes the behaviour of P_\theta(A) as A shrinks down from a range of outcomes to a single outcome. Importantly, the subscript \mathcal X means L(\theta) = \left. \frac{dP_\theta}{dP_0}\right|_{\mathcal X} is \mathcal X-measurable, therefore, L(\theta) depends only on what is observed and not on any other hidden variables.

While the above is not a careful exposition, it will hopefully point the interested reader in a sensible direction.

Advertisements