## Information Geometry – Affine Geometry (Lecture 3)

The previous lecture looked at some of the lower-level details of affine spaces. This is a prerequisite for understanding the following higher-level overview, which in turn is a prerequisite for going back and understanding the lower-level details in a finer and more precise manner. Following the review is a section on how calculus can be used to test if a family is affine.

## Brief Review of Affine Geometry

It is important to distinguish between an affine space, an affine subspace of a vector space and an affine subspace of an affine space. They are three different but related concepts. If any part of the following, however small, is not understood, the reader is invited to revisit the previous lecture.

An *affine subspace of a vector space* can be defined directly in terms of vector space concepts. A subset $U$ of a vector space $V$ is an affine subspace if there exists a $v \in V$ such that $U - v = \{u - v : u \in U\}$ is a linear subspace of $V$.

A set $S$ can be made into an *affine space* if the difference $p - q$ between two points can be made to behave like a vector, meaning among other things that $(p - q) + (q - r) = p - r$. Therefore, there is an auxiliary vector space $V$ and the difference $p - q$ of any two points in $S$ is a point in $V$.

Choosing an origin then makes the affine space into a vector space. In particular, if $V$ is the auxiliary vector space and $q \in S$ then $\phi_q : S \to V$ defined by $\phi_q(p) = p - q$ is a bijective function which makes $S$ into a vector space. I will refer to such a function as a *coordinate chart*.

Using coordinate charts is a convenient way to work with affine spaces; it should be remembered though that the choice of origin is arbitrary and hence only concepts which do not depend on the choice of origin should be considered.

A subset $U$ of an affine space $S$ is an *affine subspace of an affine space* if its image $\phi_q(U)$ under a coordinate chart $\phi_q$ is an affine subspace of the vector space $V$. Note that if $q \in U$ and the coordinate chart $\phi_q$ is used then $U$ is an affine subspace if and only if $\phi_q(U)$ is a *linear* subspace of $V$.

For the space $P$ of all (unnormalised) probability densities $p$, we have observed that defining $p - q$ to be the log-likelihood ratio $\log\frac{p}{q}$ makes $P$ into an affine space in such a way that a family is exponential if and only if it is affine.
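The affine structure on densities can be spot-checked numerically. The following is a minimal sketch, assuming the difference of two densities is taken to be the function $x \mapsto \log(p(x)/q(x))$; the three densities and the helper names `difference` and `chasles_gap` are hypothetical and chosen only for illustration. It checks the identity $(p - q) + (q - r) = p - r$ at a few sample points.

```python
import math

def difference(p, q):
    """Return the 'vector' p - q, i.e. the function x -> log(p(x) / q(x))."""
    return lambda x: math.log(p(x) / q(x))

# Three hypothetical unnormalised densities on the real line (all strictly positive).
p = lambda x: math.exp(-x * x / 2)
q = lambda x: 3.0 * math.exp(-abs(x))
r = lambda x: 1.0 / (1.0 + x * x)

def chasles_gap(p, q, r, xs):
    """Largest violation of (p - q) + (q - r) = p - r over the sample points xs."""
    pq, qr, pr = difference(p, q), difference(q, r), difference(p, r)
    return max(abs(pq(x) + qr(x) - pr(x)) for x in xs)

# The gap should be zero up to floating-point rounding.
gap = chasles_gap(p, q, r, [-2.0, -0.5, 0.0, 1.0, 3.0])
```

Agreement at a handful of points is of course not a proof; here the identity holds exactly because $\log(p/q) + \log(q/r) = \log(p/r)$ pointwise.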

Just as a linear subspace of a vector space is itself a vector space in its own right, an affine subspace of an affine space is itself an affine space. It is therefore convenient at times to refer to affine subspaces as affine spaces.

## A Local Test for a Family to be Exponential

##### Open Subsets of Affine Spaces

As hinted at in the previous lecture, an exponential family might form only an (open and) convex subset of an affine subspace. To develop a straightforward test to see if a family is exponential, it is convenient to test first to see if the subset A is an open subset of an affine subspace. Openness is a strong enough property to ensure that the subset has the “same dimension” as the affine subspace yet is weak enough to lend itself to the development of a local test involving only derivatives. (Open sets will be explained later in the course; for the moment, little will be lost by merely thinking of an open set as a set having the property that given any point in the set, we can move a little bit in any direction around that point and still stay within the set.)

It is noted tangentially that a prevalent concept in mathematics is the recognition of scenarios where “local implies global”. Anyone who has tried to prove that a (smooth) function is convex directly from the global condition that the straight line segment connecting any two points on the graph of the function must never drop below the function will appreciate the equivalent local condition that the second order derivative of the function must be non-negative. The first condition is said to be global because it involves testing pairs of distinct points. The second condition is said to be local because to test the condition at a point $x$ merely involves examining the function in an arbitrarily small neighbourhood of $x$ and computing a limit.

##### Derivation

A local test for whether or not a subset $A$ of an affine space is an open subset of an affine subspace $B$ is now derived. (Precisely, we are given $A$ and are asked whether or not there exists an affine subspace $B$ such that $A$ is contained in $B$ and is open relative to $B$.) For the derivation, it suffices to study the special case of when the affine space is $\mathbb{R}^n$. Therefore, assume that $A = \{p(\theta) : \theta \in \Theta\}$ is a collection of points in $\mathbb{R}^n$ indexed by the parameter $\theta \in \Theta \subset \mathbb{R}^k$. For example:

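- $p(\theta) = M\theta + b$ for some matrix $M \in \mathbb{R}^{n \times k}$ and vector $b \in \mathbb{R}^n$;
- $p(\theta) = (\sin\theta_1, \sin\theta_2)$ where $\theta = (\theta_1, \theta_2) \in \mathbb{R}^2$;
- $p(\theta) = (\theta, \theta^2, \ldots, \theta^n)$ where $\theta \in \mathbb{R}$.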

In the first example, the space is affine whereas in the third example, provided $n \geq 2$, the space is not affine. The second example is perhaps the most interesting of the three.

Choose an arbitrary straight line in $\Theta$. As $\theta$ moves along this line, a curve is traced out in $\mathbb{R}^n$. In example one above, a straight line is traced out in $\mathbb{R}^n$; this is the simplest possible case. In example two, a Lissajous curve is traced out. In example three, if $n = 2$ then a parabola is traced out.

The first observation to make is that if $A$ were an open subset of an affine subspace $B$ of $\mathbb{R}^n$ then the velocity vector of any curve traced out in $A$ by varying $\theta$ would belong to $B$ (more precisely, to the linear subspace parallel to $B$). Furthermore, under the regularity condition which we will be assuming for convenience, namely that the Jacobian matrix $Dp(\theta)$ always has rank $k$, then for any point $a \in A$ and any vector $v \in B$ (thought of as a vector whose base is at $a$), it would be possible to construct a curve $\gamma$ in $A$ such that $\gamma(0) = a$ and $\gamma'(0) = v$. In words, any vector in $B$ can be thought of as a velocity vector of some curve $\gamma$.

In principle, we could choose any two points $a_1, a_2 \in A$, evaluate all possible velocity vectors of curves passing through $a_1$ and compare them with all possible velocity vectors of curves passing through $a_2$. If we always obtained the same set of velocity vectors (by first shifting the base points to the origin to enable a comparison, as is always implicitly done in linear algebra) then we would have shown that $A$ is an open subset of an affine subspace $B$ (open due to the above regularity condition, which was snuck in). However, this is still a global test because we must compare distinct points $a_1$ and $a_2$.

Since equivalence is transitive, it suffices to test only nearby points with each other. In the limit, this motivates considering the acceleration vector $\gamma''(t)$ of a curve $\gamma$ in $A$. If $A$ were an open subset of an affine subspace $B$ then the acceleration vector would always lie in $B$. Furthermore, a candidate $B$ is afforded by the span of the velocity vectors at any given point.

This leads to the following test for determining if a family is affine. Let $\Theta$ be a non-empty open subset of $\mathbb{R}^k$ and $p : \Theta \to \mathbb{R}^n$ a twice continuously differentiable function whose derivative $Dp(\theta)$ is injective for all $\theta \in \Theta$. (In other words, the Jacobian matrix of $p$ has rank $k$ at all points $\theta \in \Theta$.) Then $p(\Theta)$ is an open subset of a $k$-dimensional affine subspace of $\mathbb{R}^n$ if and only if, for all $\theta \in \Theta$ and $v \in \mathbb{R}^k$, there exists a $w \in \mathbb{R}^k$ such that $D^2p(\theta) \cdot (v, v) = Dp(\theta) \cdot w$. (In words, the test is to see if the second-order partial derivatives of $p$ with respect to $\theta$ can be written as linear combinations of the first-order partial derivatives.) The following examples hopefully serve to clarify the notation.
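The test can be spot-checked numerically at a single parameter value and direction. The following is a minimal sketch under stated assumptions: derivatives are approximated by central finite differences, span membership is decided by Gram-Schmidt projection with an arbitrary tolerance, and all function names (`jacobian_columns`, `second_directional`, `residual_outside_span`, `passes_local_test`) as well as the two sample families are hypothetical. A pass at one point is only evidence, not a proof; the test proper requires the condition at every parameter value and in every direction.

```python
import math

def jacobian_columns(p, theta, h=1e-5):
    """Central-difference approximation of the first-order partials dp/dtheta_i."""
    cols = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(p(up), p(dn))])
    return cols

def second_directional(p, theta, v, h=1e-4):
    """Central-difference approximation of the second derivative of p along v."""
    up = [t + h * s for t, s in zip(theta, v)]
    dn = [t - h * s for t, s in zip(theta, v)]
    return [(a - 2 * b + c) / h ** 2 for a, b, c in zip(p(up), p(theta), p(dn))]

def residual_outside_span(cols, b):
    """Norm of the component of b orthogonal to span(cols), via Gram-Schmidt."""
    basis = []
    for c in cols:
        c = list(c)
        for e in basis:
            dot = sum(x * y for x, y in zip(c, e))
            c = [x - dot * y for x, y in zip(c, e)]
        norm = math.sqrt(sum(x * x for x in c))
        if norm > 1e-12:
            basis.append([x / norm for x in c])
    r = list(b)
    for e in basis:
        dot = sum(x * y for x, y in zip(r, e))
        r = [x - dot * y for x, y in zip(r, e)]
    return math.sqrt(sum(x * x for x in r))

def passes_local_test(p, theta, v, tol=1e-4):
    """True if the second derivative along v lies (numerically) in the span of the first-order partials."""
    acc = second_directional(p, theta, v)
    return residual_outside_span(jacobian_columns(p, theta), acc) < tol

# Hypothetical sample families: an affine map R^2 -> R^3 and a parabola R -> R^2.
affine = lambda th: [2 * th[0] + th[1] + 1, th[0] - th[1], 3 * th[1]]
parabola = lambda th: [th[0], th[0] ** 2]

affine_ok = passes_local_test(affine, [0.3, -0.7], [1.0, 2.0])
parabola_ok = passes_local_test(parabola, [0.5], [1.0])
```

For the affine map the second derivative vanishes, so the residual is only rounding noise; for the parabola the second derivative has a component orthogonal to the single velocity vector, so the check fails.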

If $p(\theta) = M\theta + b$ then $Dp(\theta) \cdot w = Mw$ and $D^2p(\theta) \cdot (v, v) = 0$. The regularity condition that $Dp(\theta)$ is injective is satisfied if $M$ has rank $k$. (If $M$ has lower rank then the affine subspace would have dimension strictly lower than $k$.) The test is satisfied by the trivial choice $w = 0$. The conclusion is that the family is affine.

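If $p(\theta) = (\theta, \theta^2)$ then $Dp(\theta) \cdot w = (w, 2\theta w)$ and $D^2p(\theta) \cdot (v, v) = (0, 2v^2)$. The equation $D^2p(\theta) \cdot (v, v) = Dp(\theta) \cdot w$ has no solution $w$ given any $\theta$ and non-zero $v$, therefore the family is not affine. (This shows that the general case in the third example above is also not affine because if it were, then its projection onto its first two coordinates would also be affine.)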

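If $p(\theta) = (\sin\theta_1, \sin\theta_2)$ then $Dp(\theta) \cdot w = (w_1\cos\theta_1, w_2\cos\theta_2)$ and $D^2p(\theta) \cdot (v, v) = (-v_1^2\sin\theta_1, -v_2^2\sin\theta_2)$. Note that there are values of $\theta$ for which $Dp(\theta)$ is not injective; the equation $Dp(\theta) \cdot w = 0$ has non-zero solutions in $w$ whenever $\theta$ is such that either $\cos\theta_1$ or $\cos\theta_2$ is zero. We assume that $\Theta$ does not contain any such values. (Otherwise, the family would include boundary points and hence not be open.) Observe then that for any $\theta \in \Theta$ and any $v \in \mathbb{R}^2$ the equation $D^2p(\theta) \cdot (v, v) = Dp(\theta) \cdot w$ has the solution $w_1 = -v_1^2\tan\theta_1$, $w_2 = -v_2^2\tan\theta_2$. Therefore, the family is affine.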

##### Gaussian Family (Example)

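The above test will be applied to the Gaussian family $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ where $\sigma > 0$. To convert the problem into a vector space setting, first introduce the coordinate chart $f(\mu, \sigma) = \log\frac{p(\cdot\,; \mu, \sigma)}{q}$, where $q$ is a fixed reference density. (Keep in mind that $f(\mu, \sigma)$ is a vector in the vector space of all continuous functions on $\mathbb{R}$ and is therefore written as a function in $x$.) The reference density $q$ does not depend on $(\mu, \sigma)$ and so disappears from all of the derivatives. We want to test if the second-order derivatives can be written as linear combinations of the first-order derivatives. Therefore, we calculate: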

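$$\frac{\partial f}{\partial \mu} = \frac{x - \mu}{\sigma^2};$$

$$\frac{\partial f}{\partial \sigma} = -\frac{1}{\sigma} + \frac{(x - \mu)^2}{\sigma^3};$$

$$\frac{\partial^2 f}{\partial \mu^2} = -\frac{1}{\sigma^2};$$

$$\frac{\partial^2 f}{\partial \mu\,\partial \sigma} = -\frac{2(x - \mu)}{\sigma^3};$$

$$\frac{\partial^2 f}{\partial \sigma^2} = \frac{1}{\sigma^2} - \frac{3(x - \mu)^2}{\sigma^4}.$$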

We must take into account though the “trick” of working with unnormalised densities. In full, it would mean augmenting $(\mu, \sigma)$ with a third parameter $c$ and working with $e^{c}\,p(x; \mu, \sigma)$. This doesn’t change the above calculations; it merely augments the first-order derivatives with $\frac{\partial f}{\partial c} = 1$. (Note that $1$ represents the constant function whose value always equals $1$.) In other words, when we are endeavouring to write the second-order derivatives in terms of the first-order derivatives, we can augment the first-order derivatives with the constant function $1$. Basic but tedious linear algebra shows:

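$$\frac{\partial^2 f}{\partial \mu^2} = -\frac{1}{\sigma^2} \cdot 1;$$

$$\frac{\partial^2 f}{\partial \mu\,\partial \sigma} = -\frac{2}{\sigma}\,\frac{\partial f}{\partial \mu};$$

$$\frac{\partial^2 f}{\partial \sigma^2} = -\frac{3}{\sigma}\,\frac{\partial f}{\partial \sigma} - \frac{2}{\sigma^2} \cdot 1.$$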

Here, keep in mind that $(\mu, \sigma)$ is fixed and therefore the linear coefficients can depend on $(\mu, \sigma)$ (but not on $x$). Since $Df(\mu, \sigma)$ is injective for $\sigma > 0$, the above calculation shows that the Gaussian family is indeed an exponential family.
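The linear algebra for the Gaussian family can be sanity-checked numerically. The following is a minimal sketch, assuming the coefficients $-1/\sigma^2$ (on the constant function), $-2/\sigma$ (on $\partial f/\partial\mu$), and $-3/\sigma$ together with $-2/\sigma^2$ (on $\partial f/\partial\sigma$ and the constant function respectively); these were worked out for this sketch from the Gaussian log-density. Second-order partials are approximated by central finite differences, and the function names are hypothetical. Terms of the log-density that do not depend on $(\mu, \sigma)$ are dropped because they vanish under differentiation.

```python
import math

def f(mu, sigma, x):
    """Gaussian log-density at x, omitting terms independent of (mu, sigma)."""
    return -math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def first_partials(mu, sigma, x):
    """Analytic first-order partials of f with respect to mu and sigma."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = -1 / sigma + (x - mu) ** 2 / sigma ** 3
    return d_mu, d_sigma

def second_partials(mu, sigma, x, h=1e-4):
    """Finite-difference second-order partials: (mu-mu, mu-sigma, sigma-sigma)."""
    f0 = f(mu, sigma, x)
    d2_mu = (f(mu + h, sigma, x) - 2 * f0 + f(mu - h, sigma, x)) / h ** 2
    d2_sigma = (f(mu, sigma + h, x) - 2 * f0 + f(mu, sigma - h, x)) / h ** 2
    d2_mixed = (f(mu + h, sigma + h, x) - f(mu + h, sigma - h, x)
                - f(mu - h, sigma + h, x) + f(mu - h, sigma - h, x)) / (4 * h ** 2)
    return d2_mu, d2_mixed, d2_sigma

def max_identity_error(mu, sigma, xs):
    """Largest deviation from the three assumed linear identities over xs."""
    worst = 0.0
    for x in xs:
        d_mu, d_sigma = first_partials(mu, sigma, x)
        d2_mu, d2_mixed, d2_sigma = second_partials(mu, sigma, x)
        errors = (
            d2_mu - (-1 / sigma ** 2) * 1.0,
            d2_mixed - (-2 / sigma) * d_mu,
            d2_sigma - ((-3 / sigma) * d_sigma + (-2 / sigma ** 2) * 1.0),
        )
        worst = max(worst, max(abs(e) for e in errors))
    return worst

# The coefficients depend on (mu, sigma) only, so one set must work for every x.
worst_error = max_identity_error(0.4, 1.5, [-2.0, -1.0, 0.0, 0.5, 3.0])
```

The point of evaluating at several values of $x$ with the same coefficients is that the identities must hold as equalities between functions of $x$, not merely at a single point.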

Firstly, thanks for posting these tutorials, they’re very useful! You’ve switched notation slightly between the last lecture and this one, shouldn’t the log-likelihood ratio be log (p/q) rather than log(p)/log(q), or am I missing something? From your Gaussian example at the end it looks like you’ve taken the distance as log(p/q). I think that you should also have a “-x^2” at the end of your log-likelihood ratio for the Gaussian example, rather than a “+x^2”, though it doesn’t actually matter since this term disappears in all of the derivatives.

I look forward to reading on!