Comments on James-Stein Estimation Theory
Part of the reason for this short article is to provide an example which will be relied on in subsequent articles arguing that:
 There are a priori no such things as estimation problems, only decision problems.
 The Bayesian-Frequentist debate is a nonsense (because it is ill-founded).
The James-Stein Estimator has intrinsic interest though, and indeed, has been heralded by some as the most striking result in post-war mathematical statistics.
Reference
References to the literature can be found in the bibliography of the following paper:
Manton, J.H., Krishnamurthy, V. and Poor, H.V. (1998). James-Stein State Filtering Algorithms. IEEE Transactions on Signal Processing, 46(9), pp. 2431-2447.
Introduction
Write $X \sim N(\mu, 1)$ to denote that the real-valued random variable $X$ has a Gaussian distribution with unknown mean $\mu$ and unit variance. It is accepted that the “best” estimate of $\mu$ given $X$ is simply $\hat{\mu} = X$. (Without loss of generality it can be assumed we have only a single observation; multiple observations can be averaged, which will reduce the variance, but not change the essence of the discussion to follow.)
Assume now that $X$ (and therefore $\mu$) is a three-dimensional real-valued vector and $X \sim N(\mu, I)$ where $I$ is the identity matrix. In words, each of the three elements of $X$ is a Gaussian random variable with unknown mean and unit variance. Importantly, the three elements of $X$ are independent of each other.
Prior to the James-Stein estimator, every self-respecting statistician would have argued that estimating the means of three independent random variables is equivalent to estimating the mean of each one in isolation, and in particular, it must follow that $\hat{\mu} = X$ must remain optimal in this three-dimensional case.
This is not necessarily true though. If we are interested in minimising the mean-square error of our estimate then, while $\hat{\mu} = X$ is optimal in the one- and two-dimensional cases, the following estimator is always better in the three-dimensional case: $\hat{\mu}_{JS} = \left(1 - \frac{1}{\|X\|^2}\right) X$. (An obvious improvement, but harder to analyse, is to set the term in brackets to zero whenever it would otherwise be negative.)
This is an example of a shrinkage estimator. All it does is take the normal estimate of the mean, $\hat{\mu} = X$, and shrink it towards the origin by multiplying it by the scalar $1 - \frac{1}{\|X\|^2}$.
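As a minimal sketch of the shrinkage step just described, the following Python functions compute the estimate for a single three-dimensional observation with unit variance, together with the variant that sets the bracketed term to zero when it would be negative. The function names are illustrative only and are not taken from the referenced paper.

```python
import numpy as np

def james_stein(x):
    """Shrink a 3-dimensional observation x towards the origin.

    Assumes unit variance in each component, as in the text.
    """
    x = np.asarray(x, dtype=float)
    shrink = 1.0 - 1.0 / np.dot(x, x)  # the scalar 1 - 1/||x||^2
    return shrink * x

def james_stein_positive_part(x):
    """Same estimator, but with the shrinkage factor clipped at zero."""
    x = np.asarray(x, dtype=float)
    shrink = max(0.0, 1.0 - 1.0 / np.dot(x, x))
    return shrink * x

if __name__ == "__main__":
    observation = [3.0, 0.0, 0.0]
    print(james_stein(observation))
    print(james_stein_positive_part([0.5, 0.0, 0.0]))
```

Note that when the observation has norm less than one the unclipped factor is negative, so the plain estimate points away from the observation; the clipped version returns the origin instead.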
The resulting Mean-Square Error (MSE) of the James-Stein estimator has been graphed in the figure on the James-Stein wiki page. Regardless of the true value of $\mu$, the MSE of the James-Stein estimator is always lower than the MSE of the usual estimator $\hat{\mu} = X$.
It is worthwhile emphasising how the performance of the estimators is being assessed. A graph is drawn with $\|\mu\|$ along the horizontal axis, representing the true value of what it is we wish to estimate. (Conceptually, $\mu$ should appear along the horizontal axis but this is a little tricky since $\mu$ is three-dimensional. Fortunately, it can be shown that the MSE depends only on the magnitude of $\mu$ and not on its direction.) For a fixed value of $\mu$, imagine that a computer generates very many realisations of $X$, and for each realisation, $\hat{\mu}$ is calculated and the squared error $\|\hat{\mu} - \mu\|^2$ recorded. The Mean-Square Error (MSE) of the estimate is the limit of the average of these squared errors as the number of realisations goes to infinity. The MSE is graphed against $\|\mu\|$. The claim that the James-Stein Estimator is superior to the usual estimator means that, regardless of the value of $\mu$, the resulting MSE is smaller for the James-Stein Estimator.
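The assessment procedure described above can be sketched as a small Monte Carlo experiment. This is an illustration under stated assumptions, not the code behind the wiki figure: the function name, the number of trials, the seed and the chosen norms are all arbitrary, and a finite number of realisations only approximates the limiting MSE.

```python
import numpy as np

def simulate_mse(mu, n_trials=200_000, seed=0):
    """Estimate the MSE of the usual and James-Stein estimators at a fixed true mean."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    # Each row is one realisation of the 3-dimensional observation (unit variance).
    x = rng.standard_normal((n_trials, 3)) + mu
    sq_norm = np.sum(x**2, axis=1, keepdims=True)
    js = (1.0 - 1.0 / sq_norm) * x
    # Average the squared error over all realisations.
    mse_usual = np.mean(np.sum((x - mu) ** 2, axis=1))
    mse_js = np.mean(np.sum((js - mu) ** 2, axis=1))
    return mse_usual, mse_js

if __name__ == "__main__":
    # Only the norm of the true mean matters, so place it along the first axis.
    for norm in [0.0, 1.0, 2.0, 5.0]:
        usual, js = simulate_mse([norm, 0.0, 0.0])
        print(f"norm={norm:4.1f}  usual={usual:.3f}  james-stein={js:.3f}")
```

Both estimators are evaluated on the same realisations, which makes the comparison between them less noisy than the individual MSE values.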
A Paradox?
Popular articles have appeared hailing the James-Stein estimator as a paradox; one should use the price of tea in China to obtain a better estimate of the chance of rain in Melbourne!
It is not a paradox for the simple reason that even though the three random variables (that is, the three elements of $X$) are independent, the measure of the performance of the estimator is not: the mean-square error adds together the squared errors on all three elements. Definitely, the James-Stein estimator will not improve the estimate of all three means at once; that would be impossible. What the James-Stein Estimator does is gamble; it gambles that by guessing that all three means are closer to the origin than the observations suggest, the possibly enlarged error it makes on estimating one or two of the means is more than compensated for by the reduction in error that it achieves on the other one or two means.
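The gamble can be made visible by recording the mean-square error of each element separately. The sketch below does this for one illustrative true mean, (3, 0, 0), chosen here purely as an assumption for the demonstration; the function name, trial count and seed are likewise arbitrary. For this choice, the simulation suggests the shrinkage estimate does worse on the first element and better on the other two, with the total coming out in its favour.

```python
import numpy as np

def componentwise_mse(mu, n_trials=400_000, seed=0):
    """Per-element MSE of the usual and James-Stein estimators at a fixed true mean."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    x = rng.standard_normal((n_trials, 3)) + mu
    js = (1.0 - 1.0 / np.sum(x**2, axis=1, keepdims=True)) * x
    # Average over realisations only, keeping the three elements separate.
    mse_usual = np.mean((x - mu) ** 2, axis=0)
    mse_js = np.mean((js - mu) ** 2, axis=0)
    return mse_usual, mse_js

if __name__ == "__main__":
    usual, js = componentwise_mse([3.0, 0.0, 0.0])
    print("usual, per element:      ", usual, " total:", usual.sum())
    print("James-Stein, per element:", js, " total:", js.sum())
```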
It must be recognised that the James-Stein estimator is good for only some applications; generally, the normal estimate $\hat{\mu} = X$ is preferable. (There are several explanations for this; one is that the James-Stein estimator trades bias for risk and it is this bias which is often undesirable in applications. A simpler explanation is that if three random variables are independent of each other, then quite likely, what is actually required in practice is an estimate of their means which is accurate for each and every one of the three random variables.)
The James-Stein estimator is good when it is truly the case that it is the overall Mean-Square Error (and not the individual Mean-Square Errors) that should be minimised. For example, if $\mu_i$ for $i = 1, 2, 3$ represents the financial cost of claims a multinational insurance company will incur in the next year in three different countries, the company may be less concerned with estimating the values of the individual $\mu_i$ accurately and more concerned with getting an accurate overall estimate. Therefore, it may well choose to use the James-Stein Estimator.
Why shrink the estimate to the origin (or to some other point, which will also work)? One way to derive the James-Stein estimator is as an empirical Bayes estimate. If $\mu$ were a Gaussian random variable with zero mean and variance $\sigma^2$ then the optimal estimate would indeed shrink the observation towards the origin by a factor depending on $\sigma^2$. Replacing $\sigma^2$ by (a suitable function of) $\|X\|^2$ results in the James-Stein estimator; the sample variance of the observations serves as a proxy for $\sigma^2$.
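The empirical Bayes argument can be spelled out in a few lines. The notation is an assumption made for this sketch: $X$ is the three-dimensional observation, $\mu$ its mean, $\sigma^2$ the prior variance of each element of $\mu$, and $I$ the identity matrix.

```latex
\begin{align*}
\mu \sim N(0, \sigma^2 I), \quad X \mid \mu \sim N(\mu, I)
  &\;\Longrightarrow\;
  E[\mu \mid X] = \frac{\sigma^2}{1 + \sigma^2}\, X
                = \left(1 - \frac{1}{1 + \sigma^2}\right) X, \\
X \sim N\bigl(0, (1 + \sigma^2) I\bigr)
  &\;\Longrightarrow\;
  \frac{\|X\|^2}{1 + \sigma^2} \sim \chi^2_3
  \;\Longrightarrow\;
  E\!\left[\frac{1}{\|X\|^2}\right] = \frac{1}{1 + \sigma^2}, \\
\text{so substituting } \frac{1}{\|X\|^2} \text{ for } \frac{1}{1 + \sigma^2}
  &\text{ gives }
  \left(1 - \frac{1}{\|X\|^2}\right) X .
\end{align*}
```

The second line uses the fact that the reciprocal of a chi-square random variable with $n$ degrees of freedom has mean $1/(n-2)$, which equals one when $n = 3$. In this sense $1/\|X\|^2$ is an unbiased estimate of the unknown shrinkage term $1/(1 + \sigma^2)$.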
Moral
The moral of the story is that there is no such thing as an optimal estimator; which estimator is “good” depends on the application. For the aforementioned multinational insurance company, the James-Stein Estimator is preferable. For most other applications, $\hat{\mu} = X$ is best.
Subsequent articles will elaborate on the key message that there are a priori no such things as estimation problems, only decision problems.
