The Cameron-Martin Space, and Some Differences Between Finite and Infinite Dimensional Spaces

October 4, 2015

Infinite-dimensional vector spaces are useful because concepts from finite dimensions often extend to infinite dimensions; this allows us to reason about infinite-dimensional problems using finite-dimensional intuition. (This is explained in detail in my primer on RKHS theory, for example.) Nevertheless, infinite-dimensional spaces exhibit behaviours not found in finite dimensions, and working effectively with infinite-dimensional spaces requires understanding how they differ from finite-dimensional spaces.

This article discusses several key differences between a Gaussian measure on Euclidean space \mathbb{R}^n and on the separable Hilbert space l^2, which can be thought of as an infinite-dimensional generalisation of Euclidean space. (Recall l^2 consists of all square-summable sequences.)

A Gaussian Measure on l^2

Let e_i denote the canonical basis vectors of l^2, that is, e_i is the sequence of all zeros except for the ith term which is unity.

Essentially, we wish to construct a Gaussian random variable X on l^2 element-wise by first generating a collection of real-valued independent random variables X_i \sim N(0,\lambda_i) and then setting X = (X_1,X_2,\cdots), or in other words, X = \sum X_i e_i. We assume \lambda_i > 0.

It is essential for \lambda_i to decay to zero sufficiently fast, for otherwise the sequence X will have a non-zero probability of not lying in l^2. We require \sum X_i^2 < \infty to hold with probability one. Since the standard deviation \sqrt{\lambda_i} gives the “typical” size of a random realisation of N(0,\lambda_i), so that X_i^2 is typically of size \lambda_i (indeed E[X_i^2] = \lambda_i), it is not surprising that the requirement is \sum \lambda_i < \infty. (Throughout, curious readers are invited to consult an introductory book on infinite-dimensional analysis for more details. For example, in general, one starts with a linear operator Q that will serve as the covariance operator, and requires Q to be of trace class. Here, I am simply taking Q to be the “diagonal matrix” \mathrm{diag}(\lambda_1,\lambda_2,\cdots).)

It turns out that the above procedure can be made rigorous; provided \sum \lambda_i < \infty, there is indeed a Gaussian random variable X on l^2 such that the X_i = \langle X, e_i \rangle are independent Gaussian random variables N(0,\lambda_i).
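The construction can be explored numerically by truncating X to its first few coordinates. The following is only a sketch under that truncation assumption; the helper names (variances, sample_truncated, mean_squared_norm) are hypothetical and chosen for illustration. It compares the average of \sum X_i^2 over many samples with the trace \sum \lambda_i.

```python
import math
import random

def variances(dim, power=3.0):
    """Hypothetical helper: lambda_i = i**(-power) for i = 1..dim."""
    return [i ** (-power) for i in range(1, dim + 1)]

def sample_truncated(lams, rng):
    """Draw (X_1, ..., X_dim) with independent X_i ~ N(0, lambda_i)."""
    return [rng.gauss(0.0, math.sqrt(lam)) for lam in lams]

def mean_squared_norm(lams, n_samples, seed=0):
    """Average of sum_i X_i^2 over n_samples independent truncated draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_truncated(lams, rng)
        total += sum(xi * xi for xi in x)
    return total / n_samples

if __name__ == "__main__":
    lams = variances(100)
    print("trace of truncated Q:", sum(lams))
    print("sample mean of ||X||^2:", mean_squared_norm(lams, 1000))
    # With lambda_i = 1/i the partial traces keep growing with the dimension.
    print("partial trace for lambda_i = 1/i:", sum(variances(10_000, power=1.0)))
```

With \lambda_i = i^{-3} the partial traces settle down quickly, whereas with \lambda_i = 1/i they grow without bound, which is the numerical shadow of the trace-class requirement.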

To any (Borel-measurable) subset A \subset l^2, we define the Gaussian measure \mu(A) as the probability that X lies in A.  (To a mathematician, this is putting the cart before the horse, but nevertheless, it is a convenient way to think when it comes to having to evaluate \mu(A) in certain situations.)

Since we insisted all the \lambda_i be positive, the measure \mu is strictly positive on every non-empty open subset of l^2. Note too that \mu(l^2) = 1.

Some Possibly Surprising Facts

Given that the following do not hold in finite dimensions, they are initially counter-intuitive. The remainder of the article aims to give sufficient explanations to “improve” our intuition, thereby making the following facts unsurprising.

  • Given a Gaussian measure \mu on l^2, there exists a subset A \subset l^2 and an element v \in l^2 such that
    1. A and A+v are disjoint;
    2. \mu(A) = 1; and
    3. \mu(A+v) = 0.

Why should these facts be surprising? By analogy with the finite-dimensional case, one would expect that unless A is pretty much all of l^2 then it would not be possible for \mu(A) = 1. Indeed, \mu is non-zero on any open subset of l^2, hence must A be dense in l^2? So how can the translated version A+v be disjoint from A? And how can the “large” set A, having measure 1, go to the “small” set A+v having measure 0, simply by a translation?

Explanation

For concreteness, choose \lambda_i = \frac1{i^3}.

The subset A will be taken to be the union A = \bigcup_n A^{(n)} of a monotonically increasing sequence of subsets A^{(n)}, n = 1,2,\cdots. The subsets A^{(n)} will be taken to be “rectangles” of the form A^{(n)} = \{ (x_1,x_2,\cdots) \in l^2 \mid -a_i^{(n)} < x_i < a_i^{(n)} \text{ for all } i\} where the a_i^{(n)} will be strictly positive.

Since the variance of the X_i goes to zero, we hope to be able to choose the a_i^{(n)} so that they decay to zero in i, for fixed n, while ensuring A^{(n)} has measure close to unity. This gives us a chance of constructing an A which is not the whole of l^2 yet has measure equal to unity. The rate of decay i^{-3/2} is too fast because the probability that X_i \sim N(0,i^{-3}) lies between -i^{-3/2} and i^{-3/2} does not depend on i; if this probability is c then c^\infty = 0 would be the measure of the rectangle. This motivates trying a slower rate of decay: a_i^{(n)} = \sqrt{2}\frac{n}{i}. (The numerator n is there to grow the rectangles so the probability of X lying in A^{(n)} goes to unity as n \rightarrow \infty, and the \sqrt{2} is for later convenience.)
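To see concretely why the rate i^{-3/2} fails, here is a small sketch (function names are hypothetical) that evaluates the probability of each side with the error function and multiplies the first few sides together.

```python
import math

def prob_within(half_width, variance):
    """P(|X| < half_width) for X ~ N(0, variance)."""
    return math.erf(half_width / math.sqrt(2.0 * variance))

def rectangle_prob_fast_decay(dim):
    """Measure of the first `dim` sides when a_i = i**-1.5 and lambda_i = i**-3."""
    p = 1.0
    for i in range(1, dim + 1):
        p *= prob_within(i ** -1.5, i ** -3.0)
    return p

if __name__ == "__main__":
    # Every side has the same probability erf(1/sqrt(2)), roughly 0.683.
    print([round(prob_within(i ** -1.5, i ** -3.0), 6) for i in (1, 10, 100)])
    # The product over more and more sides shrinks geometrically.
    print([rectangle_prob_fast_decay(d) for d in (10, 50, 100)])
```

Each side is a one-standard-deviation interval, so the per-side probability is the same constant c < 1 for every i, and the partial products behave like c^{dim}.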

The probability that X_i \sim N(0,i^{-3}) lies between \pm \sqrt{2}\frac{n}{i} is conveniently expressed in terms of the error function as \mathrm{erf}(n\sqrt{i}). Hopefully, \mu(A^{(n)}) = \prod_{i=1}^\infty \mathrm{erf}(n\sqrt{i}) > 0 for all n > 0, and \prod_{i=1}^\infty \mathrm{erf}(n\sqrt{i}) \rightarrow 1 as n \rightarrow \infty, so that \mu(A)=1. This is indeed the case.
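The two claims about the infinite product can be sanity-checked numerically. This sketch (the function name and the cut-off of 200 terms are assumptions made for illustration) computes a partial product, which is accurate here because the factors approach 1 extremely quickly.

```python
import math

def rectangle_measure(n, terms=200):
    """Partial product prod_{i=1}^{terms} erf(n*sqrt(i)), approximating mu(A^(n))."""
    p = 1.0
    for i in range(1, terms + 1):
        p *= math.erf(n * math.sqrt(i))
    return p

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, rectangle_measure(n))
```

The printed values are strictly positive and increase towards 1 as n grows, in line with \mu(A^{(n)}) > 0 and \mu(A^{(n)}) \rightarrow 1.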

[Here is the tedious calculation to prove the claims. Let c_i = 1-\mathrm{erf}(n \sqrt{i}). A well-known bound on the complementary error function is 1-\mathrm{erf}(u) = \mathrm{erfc}(u) < \frac1{\sqrt{\pi}u} e^{-u^2} for u > 0. Therefore, c_i < d_i where d_i = \frac1{n\sqrt{\pi i}} e^{-in^2}. Note 0 < c_i < 1 and 0 < d_i < 1 when n \geq 1 and i \geq 1. Provided \sum \ln(1-c_i) is finite, \mu(A^{(n)}) = \prod (1-c_i) will be strictly positive, as required. Now, 0 > \ln(1-c_i) > \ln(1-d_i) \geq \frac{-d_i}{1-d_i}, hence it suffices to prove \sum\frac{d_i}{1-d_i} is finite. The ratio test for absolute convergence involves the ratio \frac{d_{i+1}}{1-d_{i+1}} \frac{1-d_i}{d_i} = \frac{d_{i+1}}{d_i} (1-d_i) \frac1{1-d_{i+1}}. Since d_i \rightarrow 0, it suffices to show \lim_{i \rightarrow \infty} \frac{d_{i+1}}{d_i} < 1 in order to conclude that \sum\frac{d_i}{1-d_i} is finite. Now, \frac{d_{i+1}}{d_i} = \sqrt{\frac{i}{i+1}} \frac{e^{-(i+1)n^2}}{e^{-in^2}} \rightarrow e^{-n^2} < 1 whenever n > 0, as required. To show \mu(A) = 1, it suffices to show \lim_{n \rightarrow \infty} \sum_i \frac{d_i}{1-d_i} = 0, since then \mu(A^{(n)}) \rightarrow 1 and \mu(A) \geq \mu(A^{(n)}) for every n. The calculation above shows d_{i+1} < e^{-n^2} d_i; in particular the d_i are decreasing, so 1-d_i \geq 1-d_1, and therefore \sum_i \frac{d_i}{1-d_i} \leq \frac{d_1}{1-d_1}(1+\alpha+\alpha^2+\cdots) = \frac{d_1}{1-d_1} \frac1{1-\alpha} where \alpha = e^{-n^2}. Since d_1 \rightarrow 0 and \alpha \rightarrow 0 as n \rightarrow \infty, this can be made arbitrarily close to zero by choosing n large.]
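The final bound in the calculation can also be checked numerically. The sketch below (hypothetical function name, with the series cut off at 200 terms as an assumption) compares -\sum \ln(1-c_i) with the geometric-series bound \frac{d_1}{1-d_1}\frac1{1-\alpha}.

```python
import math

def log_measure_and_bound(n, terms=200):
    """Return (-sum_i ln(1 - c_i), d_1/(1-d_1)/(1-alpha)) for the given n >= 1."""
    alpha = math.exp(-n * n)
    d1 = math.exp(-n * n) / (n * math.sqrt(math.pi))
    bound = d1 / (1.0 - d1) / (1.0 - alpha)
    neg_log = 0.0
    for i in range(1, terms + 1):
        c_i = math.erfc(n * math.sqrt(i))   # c_i = 1 - erf(n*sqrt(i))
        neg_log -= math.log1p(-c_i)         # accumulates -ln(1 - c_i) accurately
    return neg_log, bound

if __name__ == "__main__":
    for n in (1, 2, 3):
        neg_log, bound = log_measure_and_bound(n)
        print(n, neg_log, bound, math.exp(-neg_log))
```

In each case the first number stays below the second, and both shrink rapidly with n, so \exp(-\text{neg\_log}), the approximate value of \mu(A^{(n)}), approaches 1.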

  • Had we been in finite dimensions, the sets A^{(n)} would have been open. Here, they are not open.

Since a_i^{(n)} \rightarrow 0 as i \rightarrow \infty, no open ball centred at the origin with positive radius \rho is contained in A^{(n)}. Indeed, for a given radius \rho > 0 and n, an i can be found such that a_i^{(n)} < \rho; then any point t e_i with a_i^{(n)} \leq t < \rho lies in the ball but not in A^{(n)}, showing the ball is not fully contained within A^{(n)}. Since the origin belongs to A^{(n)}, the set A^{(n)} is not open.
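As an illustration of this argument, the following sketch (hypothetical function names) produces, for given n and \rho, an index i and a scalar t such that the point t e_i has norm less than \rho yet fails the ith constraint of A^{(n)}.

```python
import math

def side(n, i):
    """Half-width a_i^(n) = sqrt(2) * n / i of the i-th side of A^(n)."""
    return math.sqrt(2.0) * n / i

def point_in_ball_outside_rectangle(n, rho):
    """Return (i, t) with a_i^(n) < t < rho, so t*e_i is in the ball but not in A^(n)."""
    i = math.floor(math.sqrt(2.0) * n / rho) + 1   # smallest-type choice with a_i < rho
    t = 0.5 * (side(n, i) + rho)                   # midpoint of (a_i, rho)
    return i, t

if __name__ == "__main__":
    i, t = point_in_ball_outside_rectangle(n=3, rho=0.01)
    print(i, side(3, i), t)
```

However small \rho is made, such a pair exists, which is why the origin is not an interior point of A^{(n)}.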

  • Let v_i = i^{-3/4}. The point v = (v_1,v_2,\cdots) \in l^2 is not in A.

We may be led to believe this is surprising: as n gets bigger, all the sides of the rectangle A^{(n)} get bigger, hence we may expect that it will grow to be the whole of l^2. However, as explained presently, in infinite dimensions, the order in which limits are taken matters, and the above argument is invalid.  It will be seen that while the sides of the rectangle do grow, they grow too slowly.

On the one hand, it is true that if we fix an i, then we can find an n sufficiently large so that the ith side of the rectangle A^{(n)} contains v_i, the ith term of v.  Mathematically, this is true because \lim_{n \rightarrow \infty} a_i^{(n)} = \infty.  However, this is only really telling us that the “truncated” approximations (v_1,0,\cdots), (v_1,v_2,0,\cdots), (v_1,v_2,v_3,0,\cdots),\cdots are all in A, but analytically, we know that if A is not closed then the limit of these approximations need not lie in A. Figuratively speaking, even though an ordinary towel can be stuffed in a suitcase by sequentially pushing in the bits that are hanging out, this reasoning is not valid if the towel were infinitely big; after each push, there may still be an infinite portion of the towel hanging out. Instead, we must think directly of the A^{(n)} as suitcases of increasing size, and ask if the towel fits entirely inside one of these suitcases.

The reason why v does not lie in any of the A^{(n)}, and hence v is not in A, is that the terms v_i of v decrease more slowly to zero than the sides a_i^{(n)} of the rectangles. For each fixed n, there is a sufficiently large i such that v_i > a_i^{(n)}, so v \not\in A^{(n)}. Since n was arbitrary, v lies in none of the A^{(n)}.
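The index at which v escapes the rectangle A^{(n)} can be written down explicitly. With v_i = i^{-3/4} and a_i^{(n)} = \sqrt{2}\frac{n}{i} as above, a short calculation gives:

```latex
\begin{align*}
v_i \geq a_i^{(n)}
  &\iff i^{-3/4} \geq \sqrt{2}\, n\, i^{-1} \\
  &\iff i^{1/4} \geq \sqrt{2}\, n \\
  &\iff i \geq 4 n^4 .
\end{align*}
```

So the ith constraint of A^{(n)} fails for every i \geq 4n^4: the larger the suitcase, the further out one must look, but a violating coordinate always exists.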

  • But how can v not be in A if every dimension of the suitcase A^{(n)} is expanding with n?

The sides of the suitcase are aligned with the canonical basis vectors e_i.  However, the e_i form a Schauder basis, not a Hamel basis, so it is not true that “every dimension of the suitcase is expanding”.  For example, the point v does not lie in the span of the e_i.  (The span of the e_i consists of all sequences with only a finite number of non-zero terms.) The argument given earlier readily extends to show that, for all \lambda \neq 0, the point \lambda v does not lie in any of the suitcases A^{(n)}. So while the suitcases are getting bigger in the directions e_i, they always have zero width in the direction v.  (It is true that the distance from A^{(n)} to v goes to zero as n \rightarrow \infty, but that just means v is in the closure of A.)

  • \mu(A+v) = 0.

This is another phenomenon unique to the infinite-dimensional case. Recall that X_i \sim N(0,i^{-3}). The ith side of A^{(n)} + v is the interval from v_i - a_i^{(n)} to v_i + a_i^{(n)}. We know \frac{v_i}{a_i^{(n)}} \rightarrow \infty as i \rightarrow \infty (the sides of the suitcase decrease much faster than the v_i). And v_i normalised by the standard deviation of X_i also goes to infinity, i.e., \frac{v_i}{i^{-3/2}} = i^{3/4} \rightarrow \infty. So the probability that X_i lies inside the interval from v_i - a_i^{(n)} to v_i + a_i^{(n)} goes to zero. While the finite product of positive numbers is positive, the infinite product of positive numbers that decay to zero is zero: there is zero probability that a randomly chosen point will lie in A^{(n)} + v. Since A + v is the countable union of the sets A^{(n)} + v, each of measure zero, it follows that \mu(A+v) = 0.
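The decay of the per-side probabilities can be observed numerically. This is a sketch with hypothetical function names; it uses the complementary error function because the intervals sit far out in the Gaussian tail, where subtracting two values of erf would lose all precision.

```python
import math

def shifted_side_prob(n, i):
    """P(v_i - a_i < X_i < v_i + a_i) for X_i ~ N(0, i**-3), v_i = i**-0.75."""
    sigma = i ** -1.5
    v = i ** -0.75
    a = math.sqrt(2.0) * n / i
    lo = (v - a) / (sigma * math.sqrt(2.0))
    hi = (v + a) / (sigma * math.sqrt(2.0))
    # P = Phi(hi') - Phi(lo') rewritten with erfc to stay accurate in the tail.
    return 0.5 * (math.erfc(lo) - math.erfc(hi))

def shifted_rectangle_prob(n, terms):
    """Product of the first `terms` side probabilities of A^(n) + v."""
    p = 1.0
    for i in range(1, terms + 1):
        p *= shifted_side_prob(n, i)
    return p

if __name__ == "__main__":
    print([shifted_side_prob(1, i) for i in (1, 10, 100)])
    print([shifted_rectangle_prob(1, t) for t in (5, 20, 50)])
```

The individual factors collapse towards zero as i grows, so the partial products fall below any positive threshold after finitely many sides.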

  • The sets A and A +v are disjoint.

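Again, this is a consequence of the order of the limits. For fixed n and m, the ith edges of the suitcases A^{(n)} and A^{(m)} decay to zero faster than v_i. So for i sufficiently large, v_i > a_i^{(n)} + a_i^{(m)}, meaning the ith edges of A^{(n)} and A^{(m)}+v do not overlap, and therefore A^{(n)} and A^{(m)}+v are disjoint. As this holds for every pair n, m, the sets A and A+v are disjoint.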

Epilogue

While the above gave one specific example, the general theory goes as follows. Given a Gaussian measure N_{0,Q} on l^2 with mean zero and covariance Q, the shifted Gaussian measure N_{v,Q} is either equivalent to N_{0,Q} (meaning the two measures are absolutely continuous with respect to each other), or the two measures are singular. The Cameron-Martin space is the space of all v for which the two measures are equivalent. There is a simple expression for this space: it is Q^{1/2}(l^2), the image of l^2 under the operator Q^{1/2}.
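The example fits this description. For the diagonal covariance Q = \mathrm{diag}(\lambda_1,\lambda_2,\cdots) used here, a point v lies in Q^{1/2}(l^2) exactly when the sequence (v_i/\sqrt{\lambda_i}) is square-summable, and the following check, a worked instance using \lambda_i = i^{-3} and v_i = i^{-3/4} from above, shows that the chosen v fails this test:

```latex
\begin{align*}
v \in Q^{1/2}(l^2)
  &\iff \sum_{i=1}^\infty \frac{v_i^2}{\lambda_i} < \infty, \\
\sum_{i=1}^\infty \frac{v_i^2}{\lambda_i}
  &= \sum_{i=1}^\infty \frac{i^{-3/2}}{i^{-3}}
   = \sum_{i=1}^\infty i^{3/2} = \infty .
\end{align*}
```

Hence v is not in the Cameron-Martin space, so N_{v,Q} and N_{0,Q} are singular, which is consistent with \mu(A) = 1 and \mu(A+v) = 0.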
