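Let $M \subset \mathbb{R}^2$ be the subset of the plane defined by the vanishing of a smooth (that is, derivatives of all orders exist) function $f: \mathbb{R}^2 \to \mathbb{R}$, so that $M = \{(x,y) \in \mathbb{R}^2 : f(x,y) = 0\}$. For simplicity, it is assumed throughout that $f$ is not identically zero.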
For example, if $f(x,y) = x^2 + y^2 - 1$ then the set $M$ is the unit circle. This can be readily visualised using the online Sage Notebook. Sign in to Sage, create a new worksheet, and type: `var('x,y'); implicit_plot(x^2+y^2-1==0, (x, -2, 2), (y, -2, 2));`
Next, try: `var('x,y'); implicit_plot(x^3+y^3-x*y==0, (x, -2, 2), (y, -2, 2));`
Note that this second example has a self-intersection: if one starts at one end of the curve and walks to the other end, one point is passed over twice, once when entering the loop and once upon leaving it.
Thirdly, try: `var('x,y'); implicit_plot(x^3-y^2==0, (x, -2, 2), (y, -2, 2));`
This third example has a cusp: if one endeavours to drive a car from the bottom end of the curve to the top end, one will get stuck at the cusp with the car pointing to the left but the “road” continuing off to the right.
Roughly speaking, and as a good introduction to differential geometry, a (concrete) manifold is a subset of $\mathbb{R}^n$ that does not contain cusps (that is, it is smooth) and does not contain self-intersections (that is, it is locally Euclidean). Either way, it is interesting to ask how difficult it is to determine whether the set $M$ has any cusps or self-intersections. The intuitive principle of driving a car along the curve may appear difficult to translate into a mathematical tool, especially for detecting self-intersections.
Note, though, that driving a car corresponds to a parametrisation of the curve: roughly speaking, constructing functions $x(t)$ and $y(t)$ such that $f(x(t), y(t)) = 0$ for all $t$. Finding a self-intersection is then equivalent to finding a non-trivial ($s \neq t$) solution to $x(s) = x(t)$ and $y(s) = y(t)$. This is a difficult “global” problem, because $s$ and $t$ may be arbitrarily far apart. However, the curve has been specified implicitly by $f(x,y) = 0$, and perhaps the problem is not global but only local with respect to this implicit formulation?
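To see why the parametric formulation is global, the sketch below samples a parametrised curve and compares every pair of parameter values, looking for $s \neq t$ that land on the same point. As an assumption for illustration, it uses the nodal cubic $y^2 = x^2(x+1)$ with the parametrisation $x(t) = t^2 - 1$, $y(t) = t(t^2 - 1)$; this curve is not one of the examples above but has a simple polynomial parametrisation. The function names, grid and tolerances are hypothetical choices.

```python
import math


def x_of(t):
    # Hypothetical example curve: the nodal cubic y^2 = x^2 (x + 1).
    return t * t - 1.0


def y_of(t):
    return t * (t * t - 1.0)


def find_self_intersections(ts, min_gap=0.5, tol=1e-9):
    """Brute-force search for sampled pairs s < t with (x(s), y(s)) == (x(t), y(t)).

    Every pair of parameter values has to be compared: the two visits to the
    same point may come from parameters that are far apart, which is what
    makes the parametric formulation a global problem.
    """
    hits = []
    for i, s in enumerate(ts):
        for t in ts[i + 1:]:
            gap = math.hypot(x_of(s) - x_of(t), y_of(s) - y_of(t))
            if abs(t - s) > min_gap and gap < tol:
                hits.append((s, t))
    return hits


# Sample t = -2.00, -1.99, ..., 2.00 on a grid that contains t = -1 and t = 1 exactly.
ts = [k / 100 for k in range(-200, 201)]
print(find_self_intersections(ts))  # [(-1.0, 1.0)]
```

The double loop costs $O(N^2)$ comparisons for $N$ samples, and it can still miss a crossing that falls between grid points; nothing in it is local to a single point of the curve.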
Detecting cusps and self-intersections really is only a local problem with respect to $f$, meaning that knowledge of what $f$ looks like anywhere except in a small neighbourhood of a point $p$ is irrelevant to determining whether there is a singularity or self-intersection at $p$. This suggests considering a Taylor series of $f$ about the point of interest. [Technically, we would also want $f$ to be analytic if we are relying on a Taylor series argument, but we are focusing here on gaining a feel for the problem in the first instance.] It is also worth noting that this is an example of “unhelpful intuition”: thinking of driving cars is a hindrance, not a help.
Choose a point $(x_0, y_0)$ lying on $M$, that is, $f(x_0, y_0) = 0$. Then
$$f(x,y) = \frac{\partial f}{\partial x}(x_0,y_0)\,(x - x_0) + \frac{\partial f}{\partial y}(x_0,y_0)\,(y - y_0) + \cdots$$
where the dots denote higher-order terms. If either $\frac{\partial f}{\partial x}(x_0,y_0)$ or $\frac{\partial f}{\partial y}(x_0,y_0)$ is non-zero then $f$ contains a linear term, and the linear term dominates the behaviour of $f$ when $(x,y)$ is sufficiently close to $(x_0,y_0)$. The linear equation $\frac{\partial f}{\partial x}(x_0,y_0)\,(x - x_0) + \frac{\partial f}{\partial y}(x_0,y_0)\,(y - y_0) = 0$ describes a straight line; there can be no cusps or self-intersections. This argument can be made rigorous using the implicit function theorem, even when $f$ is not analytic: if for every point $(x_0,y_0) \in M$, either $\frac{\partial f}{\partial x}$ or $\frac{\partial f}{\partial y}$ is non-zero at $(x_0,y_0)$, then $M$ contains no self-intersections or cusps. This result, stated more generally in terms of the rank of the Jacobian matrix, appears near the beginning of most differential geometry textbooks. Without the above considerations, though, it may appear mysterious how the Jacobian can detect cusps and self-intersections. (The key point is not to think in terms of a parametrisation of the curve but instead to break the space into neighbourhoods $U$ that are sufficiently small that it is possible to determine exactly what $M \cap U$ looks like. To belabour the point, if a point in $U$ is not in $M \cap U$ then it is not in $M$.)
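The Jacobian test can be tried numerically on the three curves plotted earlier. This is a minimal sketch in plain Python rather than Sage; the gradients are computed by hand, and the helper name `passes_jacobian_test`, its tolerance and the sample points are hypothetical choices.

```python
import math


def passes_jacobian_test(grad, point, tol=1e-12):
    """Return True if (df/dx, df/dy) is not the zero vector at `point`."""
    gx, gy = grad(*point)
    return abs(gx) > tol or abs(gy) > tol


# Hand-computed gradients of the three example functions.
def grad_circle(x, y):  # f = x^2 + y^2 - 1
    return (2 * x, 2 * y)


def grad_folium(x, y):  # f = x^3 + y^3 - x*y
    return (3 * x**2 - y, 3 * y**2 - x)


def grad_cusp(x, y):  # f = x^3 - y^2
    return (3 * x**2, -2 * y)


# Sample points on the unit circle: every one passes the test.
circle_points = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12)) for k in range(12)]
print(all(passes_jacobian_test(grad_circle, p) for p in circle_points))  # True

# The self-intersection and the cusp are both at the origin, where the test fails.
print(passes_jacobian_test(grad_folium, (0.0, 0.0)))  # False
print(passes_jacobian_test(grad_cusp, (0.0, 0.0)))  # False

# Elsewhere on those curves the test passes, e.g. at (1/2, 1/2) and (1, 1).
print(passes_jacobian_test(grad_folium, (0.5, 0.5)))  # True
print(passes_jacobian_test(grad_cusp, (1.0, 1.0)))  # True
```

Checking finitely many points is only an illustration; the theorem needs the gradient to be non-zero at every point of $M$.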
Textbooks generally do not mention explicitly, though, that if $f$ fails this “Jacobian test” then $M$ may still be a manifold. The above makes this clear: if $\frac{\partial f}{\partial x}(x_0,y_0) = \frac{\partial f}{\partial y}(x_0,y_0) = 0$ then it is necessary to examine higher-order terms before making any claims. As a simple example, $f(x,y) = x^3 - y^3$ fails the test at $(0,0)$ because both partial derivatives are zero at the origin, yet $M$ is the straight line given equivalently by $x = y$.
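A worked check of such an example, taking $f(x,y) = x^3 - y^3$ as the illustration: the partial derivatives are
$$\frac{\partial f}{\partial x} = 3x^2, \qquad \frac{\partial f}{\partial y} = -3y^2,$$
so both vanish at $(0,0)$, while
$$x^3 - y^3 = (x - y)\left(x^2 + xy + y^2\right), \qquad x^2 + xy + y^2 = \left(x + \tfrac{y}{2}\right)^2 + \tfrac{3}{4}y^2.$$
The second factor is strictly positive away from the origin, so $f(x,y) = 0$ exactly when $x = y$: the zero set is a straight line even though the Jacobian test fails at the origin.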
The next few lectures aim to provide an introduction to several basic concepts in differential geometry required for progressing our understanding of information geometry. Rather than commence with a definition of differential geometry, the idea of “coordinate independence” will be studied in the simpler setting of affine geometry first. Roughly speaking, differential geometry combines the notion of coordinate independence with the notion of gluing simpler things together to form more complicated objects.
Let $S$ be a set. It may represent all the points on an infinitely large sheet of paper, in which case one must resist the temptation to think of $S$ as a subset of $\mathbb{R}^3$ but rather envisage $S$ as the whole universe; there is nothing other than $S$. Alternatively, $S$ might represent the set of all elephants in the world.
Consider first the case when $S$ is the sheet of paper. In fact, assume we all live on $S$; the world is flat. In order to write down where someone lives, we need a coordinate chart: an injective function $\phi: S \to \mathbb{R}^2$ which assigns to every point $p \in S$ a unique pair of numbers $\phi(p)$, which we call the coordinates of the point. For this to be successful, everyone needs to use the same coordinate chart $\phi$. But given just $S$, no two people are likely to choose the same chart. How could they? Just for starters, it would necessitate someone drawing a big cross on the ground and declaring that everyone must consider that point to have coordinates $(0,0)$. Extra information beyond the set $S$ itself is required if different people are to construct the same coordinate chart.
Sometimes, as we will now see, there is extra information available but it is not enough to determine a unique coordinate chart. If every person had a magnetic compass and a ruler then they could agree that moving one metre east must correspond to increasing the first coordinate by one, and moving one metre north must correspond to increasing the second coordinate by one. Two people’s charts will still differ in general, but only in the choice of origin. Although people would not be able to communicate where they live in absolute terms – saying I live at $\phi(\text{my house})$ is no good to anyone using a different coordinate chart – there is still a wealth of information that can be communicated. Saying that the difference in coordinates between my house and your house is $(5,0)$ is enough for you to find your way to my house; although our coordinate charts are likely to differ, the same answer is obtained no matter which chart is used. This is called coordinate independence.
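Coordinate independence of the difference in coordinates can be checked with two charts that differ only in their origin. In this sketch the “true” positions of the houses are a hypothetical device used only to build the charts; nobody living on $S$ has access to them, and the names `make_chart` and `displacement` are illustrative.

```python
# Hypothetical model of the flat world S: each point is a named location, and
# the "true" positions below exist only to construct example charts.
TRUE_POS = {"your_house": (2.0, 1.0), "my_house": (7.0, 1.0)}


def make_chart(origin):
    """Chart for someone with a compass and a standard ruler: only the origin is free."""
    ox, oy = origin

    def chart(p):
        x, y = TRUE_POS[p]
        return (x - ox, y - oy)

    return chart


def displacement(chart, p, q):
    """Coordinates of q minus coordinates of p, computed within one chart."""
    (px, py), (qx, qy) = chart(p), chart(q)
    return (qx - px, qy - py)


phi = make_chart(origin=(0.0, 0.0))  # my chart
psi = make_chart(origin=(-3.5, 10.0))  # your chart, with a different origin

print(phi("my_house"), psi("my_house"))  # (7.0, 1.0) (10.5, -9.0): absolute coordinates differ
print(displacement(phi, "your_house", "my_house"))  # (5.0, 0.0)
print(displacement(psi, "your_house", "my_house"))  # (5.0, 0.0): the difference agrees
```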
The more possibilities there are for the charts, the fewer properties are coordinate independent. For example, if people’s rulers are now confiscated and they only have magnetic compasses, their coordinate charts can differ from each other’s in more ways. Saying I live $(5,0)$ away from your home will no longer work; my 5 units east will almost surely differ from your 5 units east. We could, however, still agree on whether or not a collection of trees lies in a straight line.
Precisely, every person may define a collection of trees to lie in a straight line if, under their personal coordinate chart $\phi$, the images of the trees lie in a straight line in $\mathbb{R}^2$. This definition works because even though two people’s coordinate charts may be different, their definitions of lying in a straight line turn out to be the same. We will see presently that this can be understood in terms of a simple concept called transition functions. Note too that earlier we were implicitly thinking in terms of definitions: we defined the location of your house relative to my house to be the vector $\phi(\text{your house}) - \phi(\text{my house})$, and it was a useful definition whenever it was coordinate independent, as it was when we had both a compass and a ruler but not when we had a compass alone.
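The compass-only situation can be sketched the same way, now letting each chart choose its own positive unit length along each axis. The tree positions, origins and scales below are hypothetical, and exact `Fraction` arithmetic is used so that the collinearity test needs no floating-point tolerance.

```python
from fractions import Fraction as F

# Hypothetical "true" positions of trees, used only to build example charts.
ROW = [(0, 0), (1, 1), (2, 2), (5, 5)]  # trees standing in a line
SCATTERED = [(0, 0), (1, 1), (2, 3)]  # trees not in a line


def make_chart(origin, scale):
    """Compass-only chart: a free origin and a positive unit length per axis."""
    (ox, oy), (sx, sy) = origin, scale
    return lambda p: (F(sx) * (p[0] - ox), F(sy) * (p[1] - oy))


def displacement(chart, p, q):
    """Coordinates of q minus coordinates of p, in a single chart."""
    return tuple(v - u for u, v in zip(chart(p), chart(q)))


def collinear(points):
    """True if all points lie on the line through the first two (exact arithmetic)."""
    (x0, y0), (x1, y1) = points[0], points[1]
    return all((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0) == 0 for x, y in points[2:])


phi = make_chart(origin=(0, 0), scale=(1, 1))
psi = make_chart(origin=(3, -2), scale=(2, F(1, 3)))

# Displacements are no longer coordinate independent ...
print(displacement(phi, ROW[0], ROW[1]) == displacement(psi, ROW[0], ROW[1]))  # False
# ... but both charts agree on which collections of trees lie in a straight line.
print(collinear([phi(p) for p in ROW]), collinear([psi(p) for p in ROW]))  # True True
print(collinear([phi(p) for p in SCATTERED]), collinear([psi(p) for p in SCATTERED]))  # False False
```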
If $S$ represents a set of elephants then I might choose a coordinate chart $\phi: S \to \mathbb{R}$ by defining $\phi(e)$ to be the length of the trunk of elephant $e$. Tom might define his chart $\psi$ by measuring the length of the tail. We would not agree on the absolute size of an elephant, but if the ratio of trunk length to tail length is the same for every elephant then we would agree on what it means for one elephant to be twice as big as another.
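A small sketch of the elephant example, assuming (as the text does) that tail length is always a fixed multiple of trunk length. The herd, the lengths and the ratio `TAIL_PER_TRUNK` are made up for illustration.

```python
from fractions import Fraction as F

# Hypothetical herd with trunk lengths in metres; the tail-to-trunk ratio is assumed fixed.
TRUNK = {"A": F(2), "B": F(1), "C": F(3, 2)}
TAIL_PER_TRUNK = F(3, 5)


def my_chart(e):
    """phi: the length of the trunk of elephant e."""
    return TRUNK[e]


def toms_chart(e):
    """psi: the length of the tail of elephant e."""
    return TAIL_PER_TRUNK * TRUNK[e]


def twice_as_big(chart, e1, e2):
    """A chart-based definition of 'e1 is twice as big as e2'."""
    return chart(e1) == 2 * chart(e2)


print(my_chart("A"), toms_chart("A"))  # 2 6/5: the charts disagree on absolute size
print(twice_as_big(my_chart, "A", "B"), twice_as_big(toms_chart, "A", "B"))  # True True
print(twice_as_big(my_chart, "C", "B"), twice_as_big(toms_chart, "C", "B"))  # False False
```

Here the map taking my coordinate of an elephant to Tom's is $x \mapsto cx$ with $c$ fixed and positive, and ratios are unchanged by such a map.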
Let’s formalise the above mathematically. The set $\mathbb{R}^n$ can be given a lot of extra structure. We are used to thinking of it as a vector space – we know how to add two points together in a sensible and consistent way – and we commonly introduce a norm for measuring distance and sometimes even an inner product for measuring angles. If $\phi: S \to \mathbb{R}^n$ is a bijection then any structure we have on $\mathbb{R}^n$ can be transferred to $S$. We can make $S$ a vector space simply by defining $p + q = \phi^{-1}(\phi(p) + \phi(q))$ and $\alpha p = \phi^{-1}(\alpha\,\phi(p))$ for $p, q \in S$ and $\alpha \in \mathbb{R}$, for instance.
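As a concrete illustration (our choice, not one from the text), take $S$ to be the positive real numbers and $\phi = \log$, which is a bijection from $S$ onto $\mathbb{R}$, so $n = 1$. Transferring the vector space operations of $\mathbb{R}$ through $\phi$ gives a perhaps unexpected structure on $S$.

```python
import math

# Assumed example: S = positive reals, chart phi = log (a bijection onto R).
phi = math.log
phi_inv = math.exp


def add(p, q):
    """Vector addition transferred to S: p + q := phi^{-1}(phi(p) + phi(q))."""
    return phi_inv(phi(p) + phi(q))


def scale(alpha, p):
    """Scalar multiplication transferred to S: alpha p := phi^{-1}(alpha * phi(p))."""
    return phi_inv(alpha * phi(p))


print(add(2.0, 3.0))  # approximately 6.0: the transferred "sum" is multiplication
print(scale(3.0, 2.0))  # approximately 8.0: the transferred scaling is exponentiation
print(phi_inv(0.0))  # 1.0: the zero vector of S is the number 1
```

A different bijection would transfer a different vector space structure onto the same set, which is exactly the kind of chart dependence discussed next.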
Let $\mathcal{A}$ be a set of bijective functions of the form $\phi: S \to \mathbb{R}^n$. Each element of $\mathcal{A}$ represents a valid coordinate chart; or, in the way we introduced it earlier, each person uses their own coordinate chart and $\mathcal{A}$ is the set of all these coordinate charts. Unless $\mathcal{A}$ contains only a single coordinate chart, we can no longer transfer arbitrary structures from $\mathbb{R}^n$ to $S$ in a coordinate independent way; we saw examples of this earlier. What structures can be transferred?
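A bit of thought reveals that the key is to study the transition functions $\psi \circ \phi^{-1}$ for all pairs $\phi, \psi \in \mathcal{A}$. Observe that $\psi \circ \phi^{-1}$ is a function from $\mathbb{R}^n$ to $\mathbb{R}^n$. We can therefore use the structure on $\mathbb{R}^n$ to determine what properties $\psi \circ \phi^{-1}$ has. For example, it might be that $\mathcal{A}$ is such that $\psi \circ \phi^{-1}$ is always a linear function; linearity is a property that can be defined in terms of the vector space structure on $\mathbb{R}^n$.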
Recalling the earlier examples, when people had magnetic compasses and rulers, the transition functions would always have the form $\psi \circ \phi^{-1}(x) = x + b$ for some vector $b \in \mathbb{R}^2$. (Different pairs of charts give different vectors $b$; indeed, $b$ represents the difference in the choice of origin of the two charts.) When people only had magnetic compasses, $\psi \circ \phi^{-1}$ would be of the more general form $\psi \circ \phi^{-1}(x) = Dx + b$ for some positive diagonal matrix $D$. (Here, I have assumed that each person would build their own ruler, so everyone has a ruler; the rulers are just of different lengths.)
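The form of a transition function can be read off by composing one chart with the inverse of another. This sketch reuses the compass-only model from before (a hypothetical origin and per-axis scale for each person, with points of $S$ modelled by their “true” positions), builds $\psi \circ \phi^{-1}$, extracts $b = \psi \circ \phi^{-1}(0)$ and the diagonal of $D$ from the standard basis vectors, and checks $\psi \circ \phi^{-1}(u) = Du + b$ on a few sample points.

```python
from fractions import Fraction as F


def make_chart(origin, scale):
    """Compass-only chart u = D (p - origin) with D = diag(scale), entries positive.

    Returns the chart and its inverse; points are modelled by hypothetical true positions.
    """
    (ox, oy), (sx, sy) = origin, scale

    def chart(p):
        return (sx * (p[0] - ox), sy * (p[1] - oy))

    def inverse(u):
        return (u[0] / sx + ox, u[1] / sy + oy)

    return chart, inverse


def transition(psi, phi_inv):
    """The transition function psi o phi^{-1}, a map from R^2 to R^2."""
    return lambda u: psi(phi_inv(u))


phi, phi_inv = make_chart(origin=(F(0), F(0)), scale=(F(1), F(1)))
psi, _ = make_chart(origin=(F(3), F(-2)), scale=(F(2), F(1, 3)))
t = transition(psi, phi_inv)

# Read off b = t(0) and the diagonal entries of D from the standard basis vectors.
b = t((F(0), F(0)))
d1 = t((F(1), F(0)))[0] - b[0]
d2 = t((F(0), F(1)))[1] - b[1]

# Check that t(u) = D u + b with D = diag(d1, d2) on a few sample points.
samples = [(F(5), F(-7)), (F(1, 2), F(3)), (F(-4), F(9, 4))]
print(all(t(u) == (d1 * u[0] + b[0], d2 * u[1] + b[1]) for u in samples))  # True
print(d1 > 0 and d2 > 0)  # True
```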
Linking in with previous lectures, the set $S$ can be made into an affine space by introducing a collection of coordinate charts $\mathcal{A}$ such that, for any two charts $\phi, \psi \in \mathcal{A}$, their transition function always has the form $\psi \circ \phi^{-1}(x) = Ax + b$ for some matrix $A$ and vector $b$. (Since $\phi$ and $\psi$ are bijections, so is $\psi \circ \phi^{-1}$, and hence $A$ is necessarily invertible.) Because the image of a straight line under such a transition function remains a straight line, different people with different coordinate charts will still agree on what is and what is not a straight line in $S$. It is a worthwhile exercise to prove that this definition of an affine space is equivalent to the definition given in earlier lectures.
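The claim that affine transition functions preserve straight lines takes one line to check. Writing a straight line in one chart as $\{x_0 + s v : s \in \mathbb{R}\}$ with $v \neq 0$,
$$\psi \circ \phi^{-1}(x_0 + s v) = A(x_0 + s v) + b = (A x_0 + b) + s\,(A v),$$
which is again a straight line, with direction $A v \neq 0$ because $A$ is invertible. The same computation shows that the parameter $s$ is carried over unchanged, so ratios along a line (midpoints, for instance) are also coordinate independent, whereas lengths in general are not.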
To summarise, there is interest in playing the following mathematical game:
- We are given a set $S$ and a collection of coordinate charts $\mathcal{A}$.
- We want to give the set $S$ some structure coming from the structure on $\mathbb{R}^n$.
- We want to do this in a coordinate independent way, meaning that if I use my own coordinate chart $\phi \in \mathcal{A}$ and you use your own coordinate chart $\psi \in \mathcal{A}$ then we get the same structure on $S$.
The secret is to look at the form of the transition functions $\psi \circ \phi^{-1}$ for all pairs $\phi, \psi \in \mathcal{A}$. The more general the form of the transition functions, the less structure can be transferred from $\mathbb{R}^n$ to $S$ in a coordinate independent way.
The relevance to information geometry is that the parametrisation used to describe a family of densities is, to a large extent, irrelevant. Properties that depend on a particular choice of parametrisation are generally not as attractive as properties which are coordinate independent. If $S$ represents the family of Gaussian random variables parametrised by mean $\mu$ and variance $\sigma^2$, then there is little justification in calling a subfamily a line segment just because its $(\mu, \sigma^2)$ coordinates form a line segment: nothing appears to be special about parametrising Gaussians by mean and variance. (It turns out that for exponential families, statisticians have come up with a set of parametrisations they believe to be nice. Although these parametrisations are not unique, their transition functions are always affine functions; this is why it was possible to introduce an affine structure in earlier lectures. Note that we have not pursued this to the end because we want to move quickly to a more powerful concept from differential geometry that will subsume this affine geometry.)
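To make the point concrete, compare the $(\mu, \sigma^2)$ chart with a second chart given by the first two moments, $(\mathbb{E}[X], \mathbb{E}[X^2]) = (\mu, \sigma^2 + \mu^2)$. The segment $\sigma^2 = 1$, $0 \le \mu \le 1$ used below is an illustrative choice, and only three of its points are tested; exact fractions avoid tolerance issues.

```python
from fractions import Fraction as F


def mean_variance_to_moments(mu, var):
    """Transition function from (mu, sigma^2) coordinates to (E[X], E[X^2]) coordinates."""
    return (mu, var + mu * mu)


def collinear(points):
    """True if all points lie on the line through the first two (exact arithmetic)."""
    (x0, y0), (x1, y1) = points[0], points[1]
    return all((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0) == 0 for x, y in points[2:])


# Three Gaussians on the segment sigma^2 = 1, 0 <= mu <= 1 of the (mean, variance) chart.
segment = [(F(0), F(1)), (F(1, 2), F(1)), (F(1), F(1))]
print(collinear(segment))  # True
print(collinear([mean_variance_to_moments(*p) for p in segment]))  # False
```

In the moment chart the same three Gaussians lie on the parabola $y = 1 + x^2$, so the two charts disagree about whether this subfamily is a line segment.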
For completeness, note that a parametrisation is just the inverse of a coordinate chart. If we think of defining a family by specifying a function from $\mathbb{R}^n$ to $S$ then we speak of a parametrisation. On the other hand, if someone points to a density $p \in S$ then we can determine its coordinates by asking what value of $\theta$ makes $\phi^{-1}(\theta) = p$.
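For Gaussians this “asking what value of $\theta$” can be made explicit. The sketch below treats $\theta = (\mu, \sigma^2)$, with the parametrisation mapping $\theta$ to a density function and the chart recovering $\theta$ from the density. The recovery method (reading the coefficients of the quadratic $\log p(x)$ from its values at $x = -1, 0, 1$) is our illustrative choice, not one given in the text.

```python
import math


def parametrisation(mu, var):
    """phi^{-1}: map coordinates (mu, sigma^2) to a Gaussian density function."""
    def density(x):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return density


def chart(density):
    """phi: recover (mu, sigma^2) from a Gaussian density.

    log p(x) = a x^2 + b x + c with a = -1/(2 sigma^2) and b = mu / sigma^2,
    so a and b can be read off from log p at x = -1, 0 and 1.
    """
    def h(x):
        return math.log(density(x))

    a = (h(1.0) - 2 * h(0.0) + h(-1.0)) / 2
    b = (h(1.0) - h(-1.0)) / 2
    var = -1 / (2 * a)
    return (b * var, var)


p = parametrisation(0.5, 2.0)
print(chart(p))  # approximately (0.5, 2.0), up to floating-point rounding
```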