Information Geometry – Summary of Lecture 1
In preparation for going through Amari and Nagaoka’s book on “Methods of Information Geometry”, it is valuable to consider the statistical world from the viewpoint of families of distributions. Often, questions that are of interest to us can be (re-)formulated in terms of families of distributions. In class, it was loosely explained that a Kalman filter, in wishing to track the location of an aeroplane, is actually keeping track of the evolution of a particular distribution over time.
From this viewpoint of families of distributions, one pertinent question is, given two distributions, how far apart are they? If we could answer this question by defining an interesting “statistical distance” between two distributions, we could start to build up a geometry. The challenge would be to arrange the distributions, considered as points in space, in a certain shape (on a torus or a sphere, perhaps?) so that the geometric distance between two distributions equalled their statistical distance. For example, the class of all non-degenerate Gaussian random variables can be described by the upper half plane in ℝ², where we think of the horizontal axis as specifying the mean μ and the vertical axis as specifying the variance σ². This is what we mean by a geometric representation, but it is deficient in that the geometric (Euclidean) distance between two distributions is unrelated to any interesting statistical concept. Does there exist a transformation of the parameters (possibly sending them into a higher-dimensional space) so that the resulting shape of the Gaussian family has geometric significance (meaning that the distance between two points is related to a useful statistical concept)?
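To make the deficiency of the naive picture concrete, here is a minimal sketch in Python. It is an illustration only, not something taken from the lecture: the helper names and the choice of "probability that a single sample is misclassified by a threshold at the midpoint of the two means" as the statistical yardstick are assumptions made for this example. Two pairs of Gaussians are placed at the same Euclidean distance in the (mean, variance) half plane, yet one pair is almost perfectly distinguishable and the other is almost indistinguishable.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def euclidean_distance(p, q) -> float:
    """Euclidean distance between two (mean, variance) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def midpoint_error_probability(mu1: float, mu2: float, variance: float) -> float:
    """Chance of misclassifying one sample drawn from either Gaussian
    (both with the same variance) when we threshold at the midpoint
    of the two means. Hypothetical yardstick chosen for illustration."""
    sigma = math.sqrt(variance)
    return std_normal_cdf(-abs(mu1 - mu2) / (2.0 * sigma))


# Each point is (mean, variance); both pairs differ by 1 in the mean only.
narrow_pair = ((0.0, 0.01), (1.0, 0.01))
wide_pair = ((0.0, 100.0), (1.0, 100.0))

for name, (p, q) in (("narrow", narrow_pair), ("wide", wide_pair)):
    d = euclidean_distance(p, q)
    err = midpoint_error_probability(p[0], q[0], p[1])
    print(f"{name}: Euclidean distance = {d:.3f}, error probability = {err:.3g}")
```

Both pairs sit exactly one unit apart in the half plane, but the error probability is tiny for the narrow pair and close to one half for the wide pair, so Euclidean distance in these coordinates says little about how statistically different the distributions are.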
While the answer turns out to be “yes”, this is not the end of the story. There are important statistical concepts, Kullback-Leibler divergence being a prime example, which do not fit perfectly into a (Riemannian) geometric framework because they are not true distances. (The Kullback-Leibler divergence is not symmetric: the divergence, or “distance”, from one distribution to another is not necessarily equal to the divergence in the other direction, from the latter distribution to the former.) Therefore, it is to be expected that new concepts in geometry will be developed for the express purpose of enabling a comprehensive geometric interpretation of statistics. (Indeed, Amari’s introduction of “dual connections” into geometry is such an example.) As is common in many areas of science, this leads to a symbiotic relationship where statistical thinking can lead to advances in geometry and geometric thinking can lead to advances in statistics.
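The asymmetry of the Kullback-Leibler divergence mentioned above can be checked numerically. The sketch below uses the standard closed form for the divergence between two univariate Gaussians; the function name and the two example distributions are assumptions chosen for illustration, not taken from the lecture.

```python
import math


def kl_gaussian(mu_p: float, var_p: float, mu_q: float, var_q: float) -> float:
    """Kullback-Leibler divergence KL(P || Q) for univariate Gaussians
    P = N(mu_p, var_p) and Q = N(mu_q, var_q), using the closed form
    ln(sigma_q / sigma_p) + (var_p + (mu_p - mu_q)^2) / (2 var_q) - 1/2."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mu_p - mu_q) ** 2) / var_q
        - 1.0
    )


# Illustrative choice: P = N(0, 1) and Q = N(1, 4).
p = (0.0, 1.0)
q = (1.0, 4.0)

kl_pq = kl_gaussian(*p, *q)  # divergence from P to Q
kl_qp = kl_gaussian(*q, *p)  # divergence from Q to P

print(f"KL(P || Q) = {kl_pq:.4f}")
print(f"KL(Q || P) = {kl_qp:.4f}")
```

The two printed values differ, which is exactly why the divergence cannot serve as a distance in the usual (Riemannian) sense without additional geometric structure.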
In a nutshell, Information Geometry, at its most basic level, allows us to visualise geometrically:
- certain statistical concepts of significant interest to statisticians (Fisher information, Kullback-Leibler divergence, …);
- certain algorithms used by engineers (Expectation-Maximisation algorithm, turbo decoding, …).
These visualisations are useful because geometric concepts (straight lines, distances between points, projections) have statistical meaning, thereby allowing us to use geometry to reason about statistical problems. An additional perspective on a problem can do no harm and may even be beneficial.