Measure-theoretic Formulation of the Likelihood Function
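Let $\{P_\theta\}$ be a family of probability measures indexed by $\theta \in \Theta$. For notational convenience, assume $0 \in \Theta$, so that $P_0$ is one of the probability measures in the family. Assume also that every $P_\theta$ is absolutely continuous with respect to $P_0$, so that the Radon-Nikodym derivative $\frac{dP_\theta}{dP_0}$ exists. This short note sketches why $L(\theta) = E_0\left[ \frac{dP_\theta}{dP_0} \,\middle|\, \mathcal{X} \right]$ is the likelihood function, where the $\sigma$-algebra $\mathcal{X}$ describes the possible observations and $E_0$ denotes expectation with respect to the measure $P_0$.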
First, consider the special case where the probability measure $P_\theta$ can be described by a probability density function (pdf) $p(x,y;\theta)$. Here, $x$ is a real-valued random variable that we have observed, $y$ is a real-valued unobserved random variable, and $\theta$ indexes the family of joint pdfs. The likelihood function when there is a “hidden variable” is usually defined as $\theta \mapsto p(x;\theta)$, where $p(x;\theta)$ is the marginalised pdf obtained by integrating out the unknown variable $y$, that is, $p(x;\theta) = \int p(x,y;\theta)\, dy$. Does this likelihood function equal $L(\theta)$ when $\mathcal{X}$ is the $\sigma$-algebra generated by the random variable $x$?
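The correspondence between the measure $P_\theta$ and the pdf $p(x,y;\theta)$ is: $P_\theta(A) = \int_A p(x,y;\theta)\, dx\, dy$ for any (measurable) set $A \subset \mathbb{R}^2$; this is the probability that $(x,y)$ lies in $A$. In this case, the Radon-Nikodym derivative $\frac{dP_\theta}{dP_0}$ is simply the ratio $\frac{p(x,y;\theta)}{p(x,y;0)}$. The conditional expectation with respect to $\mathcal{X}$ under the distribution $p(x,y;0)$ is
$$ E_0\left[ \frac{p(x,y;\theta)}{p(x,y;0)} \,\middle|\, x \right] = \int \frac{p(x,y;\theta)}{p(x,y;0)}\, p(y \mid x; 0)\, dy = \int \frac{p(x,y;\theta)}{p(x;0)}\, dy = \frac{p(x;\theta)}{p(x;0)}, $$
where $p(y \mid x; 0) = p(x,y;0) / p(x;0)$. The denominator $p(x;0)$ does not depend on $\theta$, so as a function of $\theta$ this is the usual likelihood $p(x;\theta)$ up to a constant factor, verifying in this special case that $L(\theta)$ is indeed the likelihood function.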
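The identity above can be checked numerically on a small discrete analogue, where the integrals become finite sums. The following is only a sketch under stated assumptions: the joint pmf `joint`, its exponential form, and the value sets `XS` and `YS` are hypothetical choices made for illustration and do not come from the argument above.

```python
import math

XS = [0, 1, 2]  # possible values of the observed variable x (assumed)
YS = [0, 1]     # possible values of the hidden variable y (assumed)


def joint(theta):
    """Hypothetical joint pmf p(x, y; theta), proportional to exp(theta * (x + x*y)).

    At theta = 0 it is uniform, which plays the role of the reference measure P_0.
    """
    weights = {(x, y): math.exp(theta * (x + x * y)) for x in XS for y in YS}
    total = sum(weights.values())
    return {xy: w / total for xy, w in weights.items()}


def marginal_x(p):
    """Marginal pmf of x, obtained by summing out the hidden variable y."""
    return {x: sum(p[(x, y)] for y in YS) for x in XS}


def likelihood_via_conditional_expectation(theta, x):
    """E_0[ p(x,y;theta) / p(x,y;0) | x ], with the expectation taken under P_0."""
    p_theta, p_0 = joint(theta), joint(0.0)
    p0_x = marginal_x(p_0)[x]
    # Weight each density ratio by the conditional pmf p(y | x; 0).
    return sum(
        (p_theta[(x, y)] / p_0[(x, y)]) * (p_0[(x, y)] / p0_x) for y in YS
    )


def likelihood_via_marginals(theta, x):
    """Ratio of marginalised pmfs, p(x; theta) / p(x; 0)."""
    return marginal_x(joint(theta))[x] / marginal_x(joint(0.0))[x]


if __name__ == "__main__":
    for theta in (-1.0, 0.5, 2.0):
        for x in XS:
            lhs = likelihood_via_conditional_expectation(theta, x)
            rhs = likelihood_via_marginals(theta, x)
            print(f"theta={theta:+.1f} x={x}: {lhs:.6f} vs {rhs:.6f}")
```

The two columns printed for each pair should agree up to floating-point rounding, which is the discrete counterpart of the chain of equalities above.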
The above verification does not make $L(\theta)$ any less mysterious. Instead, it can be understood directly as follows. From the definition of conditional expectation, it is straightforward to verify that
$$ L(\theta) = \frac{dP_\theta|_{\mathcal{X}}}{dP_0|_{\mathcal{X}}}, $$
where $P_\theta|_{\mathcal{X}}$ denotes the restriction of $P_\theta$ to $\mathcal{X}$, meaning that for any $\mathcal{X}$-measurable set $A$, $P_\theta(A) = \int_A L(\theta)\, dP_0$. The likelihood function is basically asking for the “probability” that we observed what we did, or precisely, we want to take the set $A$ to be our actual observation and see how $P_\theta(A)$ varies with $\theta$. This would work if $P_\theta(A) > 0$, but otherwise it is necessary to look at how $P_\theta(A)$ varies when $A$ is an arbitrarily small but non-negligible set centred on the true observation. (If you like, it is impossible to make a perfect observation correct to infinitely many significant figures; instead, an observation of $x$ usually means we know, for example, that $x_0 - \epsilon \leq x \leq x_0 + \epsilon$ for some small $\epsilon > 0$, hence $A$ can be chosen to be the event that $x_0 - \epsilon \leq x \leq x_0 + \epsilon$ instead of the negligible event $x = x_0$.) It follows from the integral representation $P_\theta(A) = \int_A L(\theta)\, dP_0$ that $L(\theta)$ describes the behaviour of $P_\theta(A)$ as $A$ shrinks down from a range of outcomes to a single outcome. Importantly, the subscript $\mathcal{X}$ means $L(\theta)$ is $\mathcal{X}$-measurable; therefore, $L(\theta)$ depends only on what is observed and not on any other hidden variables.
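The shrinking-set picture can be illustrated numerically. The sketch below assumes, purely for illustration, a family in which the observation is distributed as $N(\theta, 1)$ with no hidden variable; the function names, the centre `1.05`, and the value of `theta` are all hypothetical. It compares the ratio $P_\theta(A) / P_0(A)$ on an interval $A$ around the observation with the density ratio $p(x;\theta)/p(x;0)$ at the centre of that interval.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_interval(theta, centre, half_width):
    """P_theta(A) for A = {centre - half_width <= x <= centre + half_width},
    assuming x ~ N(theta, 1)."""
    upper = normal_cdf(centre + half_width - theta)
    lower = normal_cdf(centre - half_width - theta)
    return upper - lower


def interval_ratio(theta, centre, half_width):
    """P_theta(A) / P_0(A): the average of L(theta) over A with respect to P_0."""
    return prob_interval(theta, centre, half_width) / prob_interval(
        0.0, centre, half_width
    )


def density_ratio(theta, x):
    """p(x; theta) / p(x; 0) for the assumed N(theta, 1) family."""
    return math.exp(theta * x - 0.5 * theta**2)


if __name__ == "__main__":
    theta, centre = 0.7, 1.05  # hypothetical parameter and observation
    target = density_ratio(theta, centre)
    for half_width in (0.5, 0.05, 0.005):
        approx = interval_ratio(theta, centre, half_width)
        print(f"half_width={half_width}: ratio={approx:.6f}, gap={abs(approx - target):.2e}")
```

As the interval narrows, the gap between the interval ratio and the density ratio should shrink, which is the sense in which $L(\theta)$ captures the behaviour of $P_\theta(A)$ for small $A$.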
While the above is not a careful exposition, it will hopefully point the interested reader in a sensible direction.