
Archive for October, 2015

## The Cameron-Martin Space, and Some Differences Between Finite and Infinite Dimensional Spaces

Infinite-dimensional vector spaces are useful in part because concepts from finite dimensions often extend to infinite dimensions; this allows us to reason about infinite-dimensional problems using finite-dimensional intuition. (This is explained in detail in my primer on RKHS theory, for example.) Nevertheless, infinite-dimensional spaces exhibit behaviours not found in finite dimensions, and working effectively with infinite-dimensional spaces requires understanding how they differ from finite-dimensional spaces.

This article discusses several key differences between a Gaussian measure on Euclidean space $\mathbb{R}^n$ and on the separable Hilbert space $l^2$, which can be thought of as an infinite-dimensional generalisation of Euclidean space. (Recall $l^2$ consists of all square-summable sequences.)

## A Gaussian Measure on $l^2$

Let $e_i$ denote the canonical basis vectors of $l^2$, that is, $e_i$ is the sequence of all zeros except for the $i$th term which is unity.

Essentially, we wish to construct a Gaussian random variable $X$ on $l^2$ element-wise by first generating a collection of real-valued independent random variables $X_i \sim N(0,\lambda_i)$ and then setting $X = (X_1,X_2,\cdots)$, or in other words, $X = \sum X_i e_i$. We assume $\lambda_i > 0$.

It is essential for $\lambda_i$ to decay to zero sufficiently fast, for otherwise, the sequence $X$ will have a non-zero probability of not lying in $l^2$.  We require $\sum X_i^2 < \infty$ to hold with probability one.  Since the standard deviation $\sqrt{\lambda_i}$ gives the “typical” size of a random realisation of $N(0,\lambda_i)$, it is not surprising that the requirement be $\sum \lambda_i < \infty$.  (Throughout, curious readers are invited to consult an introductory book on infinite-dimensional analysis for more details. For example, in general, one starts with a linear operator $Q$ that will serve as the covariance matrix, and requires $Q$ to be of trace class.  Here, I am simply taking $Q$ to be the “diagonal matrix” $\mathrm{diag}(\lambda_1,\lambda_2,\cdots)$.)

It turns out that the above procedure can be made rigorous; provided $\sum \lambda_i < \infty$, there is indeed a Gaussian random variable $X$ on $l^2$ such that the $X_i = \langle X, e_i \rangle$ are independent Gaussian random variables $N(0,\lambda_i)$.
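This construction is easy to explore numerically by truncating the sequence. Below is a minimal sketch in Python (the truncation length and number of draws are arbitrary choices of mine), taking $\lambda_i = 1/i^3$ for concreteness, the choice used later in this article: since $Q$ is trace class, the average of $\sum X_i^2$ over many draws should settle near $\mathrm{tr}\,Q = \sum \lambda_i$.

```python
import math
import random

# A finite-dimensional sketch of the construction: truncate the sequence
# X = (X_1, X_2, ...) with independent X_i ~ N(0, lambda_i) after N terms.
# The choice lambda_i = 1/i^3 is trace class: sum(lambda_i) < infinity,
# so sum(X_i^2) is finite with probability one.
random.seed(0)
N, draws = 2000, 200

lambdas = [1.0 / i**3 for i in range(1, N + 1)]
trace = sum(lambdas)  # partial sum of tr(Q); converges to zeta(3) ~ 1.2021

def sq_norm():
    """One realisation of sum X_i^2 for the truncated sequence."""
    return sum(random.gauss(0.0, math.sqrt(lam)) ** 2 for lam in lambdas)

mean_sq = sum(sq_norm() for _ in range(draws)) / draws
print(f"tr(Q), truncated  : {trace:.4f}")
print(f"mean of sum X_i^2 : {mean_sq:.4f}")  # E[sum X_i^2] = tr(Q)
```

The two printed numbers should agree to within sampling error, since $\mathbb{E}\sum X_i^2 = \sum \lambda_i$.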

For any (Borel-measurable) subset $A \subset l^2$, we define the Gaussian measure $\mu(A)$ as the probability that $X$ lies in $A$.  (To a mathematician, this is putting the cart before the horse, but nevertheless, it is a convenient way to think when it comes to having to evaluate $\mu(A)$ in certain situations.)

Since we insisted all the $\lambda_i$ be positive, the measure $\mu$ is non-zero on any non-empty open subset of $l^2$. Note too that $\mu(l^2) = 1$.

## Some Possibly Surprising Facts

Given that the following do not hold in finite dimensions, they are initially counter-intuitive. The remainder of the article aims to give sufficient explanations to “improve” our intuition, thereby making the following facts unsurprising.

• Given a Gaussian measure $\mu$ on $l^2$, there exists a subset $A \subset l^2$ and an element $v \in l^2$ such that
1. $A$ and $A+v$ are disjoint;
2. $\mu(A) = 1$; and
3. $\mu(A+v) = 0$.

Why should these facts be surprising? By analogy with the finite-dimensional case, one would expect that unless $A$ is pretty much all of $l^2$, it would not be possible for $\mu(A) = 1$. Indeed, since $\mu$ is non-zero on any non-empty open subset of $l^2$, a set of full measure must be dense in $l^2$, for otherwise it would miss an open ball of positive measure. So how can the translated version $A+v$ be disjoint from $A$? And how can the “large” set $A$, having measure 1, go to the “small” set $A+v$, having measure 0, simply by a translation?

## Explanation

For concreteness, choose $\lambda_i = \frac1{i^3}$.

The subset $A$ will be taken to be the limit of a monotonically increasing sequence of subsets $A^{(n)}$. The subsets $A^{(n)}$ will be taken to be “rectangles” of the form $A^{(n)} = \{ (x_1,x_2,\cdots) \in l^2 \mid -a_i^{(n)} < x_i < a_i^{(n)} \text{ for all } i\}$ where the $a_i^{(n)}$ will be strictly positive.

Since the variance of the $X_i$ goes to zero, we hope to be able to choose the $a_i^{(n)}$ so that they decay to zero in $i$, for fixed $n$, while ensuring $A^{(n)}$ has measure close to unity. This gives us a chance of constructing an $A$ which is not the whole of $l^2$ yet has measure equal to unity. The rate of decay $i^{-3/2}$ is too fast because the probability that $X_i \sim N(0,i^{-3})$ lies between $-i^{-3/2}$ and $i^{-3/2}$ does not depend on $i$; if this probability is $c$ then $c^\infty = 0$ would be the measure of the rectangle. This motivates trying a slower rate of decay: $a_i^{(n)} = \sqrt{2}\frac{n}{i}$. (The numerator $n$ is there to grow the rectangles so the probability of $X$ lying in $A^{(n)}$ goes to unity as $n \rightarrow \infty$, and the $\sqrt{2}$ is for later convenience.)

The probability that $X_i \sim N(0,i^{-3})$ lies between $\pm \sqrt{2}\frac{n}{i}$ is conveniently expressed in terms of the error function as $\mathrm{erf}(n\sqrt{i})$. Hopefully, $\mu(A^{(n)}) = \prod_{i=1}^\infty \mathrm{erf}(n\sqrt{i}) > 0$ for all $n > 0$, and $\prod_{i=1}^\infty \mathrm{erf}(n\sqrt{i}) \rightarrow 1$ as $n \rightarrow \infty$, so that $\mu(A)=1$. This is indeed the case.

[Here is the tedious calculation to prove the claims. Let $c_i = 1-\mathrm{erf}(n \sqrt{i})$. A well-known bound on the complementary error function is $1-\mathrm{erf}(u) = \mathrm{erfc}(u) < \frac1{\sqrt{\pi}u} e^{-u^2}$. Therefore, $c_i < d_i$ where $d_i = \frac1{n\sqrt{\pi i}} e^{-in^2}$. Note $0 < c_i < 1$ and $0 < d_i < 1$ when $n \geq 1$ and $i \geq 1$. Provided $\sum \ln(1-c_i)$ is finite, $\mu(A^{(n)}) = \prod (1-c_i)$ will be strictly positive, as required. Now, $0 > \ln(1-c_i) > \ln(1-d_i) \geq \frac{-d_i}{1-d_i}$, hence it suffices to prove $\sum\frac{d_i}{1-d_i}$ is finite. The ratio test for absolute convergence involves the ratio $\frac{d_{i+1}}{1-d_{i+1}} \frac{1-d_i}{d_i} = \frac{d_{i+1}}{d_i} (1-d_i) \frac1{1-d_{i+1}}$. Since $d_i \rightarrow 0$, it suffices to show $\lim_{i \rightarrow \infty} \frac{d_{i+1}}{d_i} < 1$ in order to conclude that $\sum\frac{d_i}{1-d_i}$ is finite. Now, $\frac{d_{i+1}}{d_i} = \sqrt{\frac{i}{i+1}} \frac{e^{-(i+1)n^2}}{e^{-in^2}} \rightarrow e^{-n^2} < 1$ whenever $n > 0$, as required. To show $\mu(A) = 1$, we need to show $\lim_{n \rightarrow \infty} \sum_i \frac{d_i}{1-d_i} = 0$.  An earlier calculation shows $d_{i+1} < e^{-n^2} d_i$, so that $\sum_i \frac{d_i}{1-d_i} \leq \frac{d_1}{1-d_1}(1+\alpha+\alpha^2+\cdots) = \frac{d_1}{1-d_1} \frac1{1-\alpha}$ where $\alpha = e^{-n^2}$.  It is clear that this can be made arbitrarily close to zero by choosing $n$ large.]
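The infinite product is also easy to probe numerically. A short sketch in Python (the truncation at 10,000 terms is an arbitrary choice, and is accurate because the tail factors reach machine precision of unity very quickly):

```python
import math

# Numerical check of the two claims: mu(A^(n)) = prod_{i>=1} erf(n*sqrt(i))
# is strictly positive for every n >= 1, and tends to 1 as n grows.

def mu_An(n, terms=10_000):
    p = 1.0
    for i in range(1, terms + 1):
        p *= math.erf(n * math.sqrt(i))
    return p

for n in (1, 2, 3, 5):
    print(f"mu(A^({n})) ~ {mu_An(n):.6f}")
```

Each factor $\mathrm{erf}(n\sqrt{i})$ increases with $n$, so the computed products increase towards 1, and each is strictly positive, in line with the claims.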

• Had we been in finite dimensions, the sets $A^{(n)}$ would have been open. Here, they are not open.

Since the $a_i^{(n)} \rightarrow 0$ as $i \rightarrow \infty$, there is no open ball centred at the origin, with positive radius $\rho$, that is contained in $A^{(n)}$.  Indeed, for a given radius $\rho > 0$ and $n$, an $i$ can be found such that $a_i^{(n)} < \rho$; a point such as $\frac12(a_i^{(n)}+\rho)\,e_i$ then lies in the ball but not in $A^{(n)}$, showing the ball is not fully contained within $A^{(n)}$.

• Let $v_i = i^{-3/4}$. The point $v = (v_1,v_2,\cdots) \in l^2$ is not in $A$.

This may seem surprising: as $n$ gets bigger, all the sides of the rectangle $A^{(n)}$ get bigger, hence we may expect that it will grow to be the whole of $l^2$. However, as explained presently, in infinite dimensions the order in which limits are taken matters, and the above argument is invalid.  It will be seen that while the sides of the rectangle do grow, they grow too slowly.

On the one hand, it is true that if we fix an $i$, then we can find an $n$ sufficiently large so that the $i$th side of the rectangle $A^{(n)}$ contains $v_i$, the $i$th term of $v$.  Mathematically, this is true because $\lim_{n \rightarrow \infty} a_i^{(n)} = \infty$.  However, this is only really telling us that the “truncated” approximations $(v_1,0,\cdots), (v_1,v_2,0,\cdots), (v_1,v_2,v_3,0,\cdots),\cdots$ are all in $A$, but analytically, we know that if $A$ is not closed then the limit of these approximations need not lie in $A$. Figuratively speaking, even though an ordinary towel can be stuffed in a suitcase by sequentially pushing in the bits that are hanging out, this reasoning is not valid if the towel were infinitely big; after each push, there may still be an infinite portion of the towel hanging out. Instead, we must think directly of the $A^{(n)}$ as suitcases of increasing size, and ask if the towel fits entirely inside one of these suitcases.

The reason why $v$ does not lie in any of the $A^{(n)}$, and hence $v$ is not in $A$, is that the terms $v_i$ of $v$ decrease more slowly to zero than the sides $a_i^{(n)}$ of the rectangles.  For a fixed $n$, solving $i^{-3/4} > \sqrt{2}\,\frac{n}{i}$ shows that $v_i > a_i^{(n)}$ whenever $i > 4n^4$, thereby showing $v \not\in A^{(n)}$ for all $n$.
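This crossover can be exhibited numerically. A small sketch that finds, for a few values of $n$, the first coordinate at which $v$ pokes out of the suitcase $A^{(n)}$:

```python
import math

# The suitcase argument, numerically: for each n, find the first coordinate
# i at which v_i = i^(-3/4) exceeds the side a_i^(n) = sqrt(2)*n/i of the
# rectangle A^(n).  Solving i^(-3/4) > sqrt(2)*n/i gives i > 4*n^4, so such
# an index always exists, and hence v lies in none of the A^(n).

def first_escape_index(n):
    i = 1
    while i ** -0.75 <= math.sqrt(2) * n / i:
        i += 1
    return i

for n in (1, 2, 3):
    i = first_escape_index(n)
    print(f"n={n}: v escapes A^({n}) at i={i} "
          f"(v_i={i ** -0.75:.5f}, a_i={math.sqrt(2) * n / i:.5f})")
```

The escape index grows like $4n^4$, which is why no finite $n$ suffices to capture $v$.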

• But how can $v$ not be in $A$ if every dimension of the suitcase $A^{(n)}$ is expanding with $n$?

The sides of the suitcase are aligned with the canonical basis vectors $e_i$.  However, the $e_i$ form a Schauder basis, not a Hamel basis, so it is not true that “every dimension of the suitcase is expanding”.  For example, the point $v$ does not lie in the span of the $e_i$.  (The span of the $e_i$ consists of all sequences with only a finite number of non-zero terms.) The argument given earlier readily extends to show that, for all $\lambda \neq 0$, the point $\lambda v$ does not lie in any of the suitcases $A^{(n)}$. So while the suitcases are getting bigger in the directions $e_i$, they always have zero width in the direction $v$.  (It is true that the distance from $A^{(n)}$ to $v$ goes to zero as $n \rightarrow \infty$, but that just means $v$ is in the closure of $A$.)

• $\mu(A+v) = 0$.

This is another phenomenon unique to the infinite-dimensional case. Recall that $X_i \sim N(0,i^{-3})$. The $i$th side of $A^{(n)} + v$ is the interval from $v_i - a_i^{(n)}$ to $v_i + a_i^{(n)}$. We know $\frac{v_i}{a_i^{(n)}} \rightarrow \infty$ as $i \rightarrow \infty$ (the sides of the suitcase decrease much faster than the $v_i$), and $v_i$ normalised by the standard deviation of $X_i$ also goes to infinity: $\frac{v_i}{i^{-3/2}} = i^{3/4} \rightarrow \infty$. Together these show that the left endpoint $v_i - a_i^{(n)}$, measured in standard deviations, is $i^{3/4}\left(1 - \frac{a_i^{(n)}}{v_i}\right) \rightarrow \infty$, hence the probability that $X_i$ lies inside the interval from $v_i - a_i^{(n)}$ to $v_i + a_i^{(n)}$ goes to zero. While a finite product of positive numbers is positive, an infinite product of positive numbers that decay to zero is zero: there is zero probability that a randomly chosen point will lie in $A^{(n)} + v$.
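To see how fast this product collapses, here is a rough numerical bound in Python. The truncation at 50,000 terms is an arbitrary choice of mine; factors not covered by the bound are crudely replaced by 1, so the printed value is a valid upper bound on $\log_{10}$ of the truncated product, which in turn bounds $\log_{10}\mu(A^{(n)}+v)$ from above.

```python
import math

# Upper-bound the product mu(A^(n)+v) numerically.  Each factor is at most
# P(X_i > v_i - a_i^(n)) = erfc(u_i)/2, where
#   u_i = (v_i - a_i^(n)) / (sigma_i*sqrt(2)) = (i**0.75 - sqrt(2)*n*sqrt(i)) / sqrt(2),
# and erfc(u) < exp(-u**2)/(u*sqrt(pi)) for u > 0 (the bound quoted in the
# earlier calculation).  Factors with u_i <= 1 are bounded by 1.

def log10_upper_bound(n, terms=50_000):
    total = 0.0
    for i in range(1, terms + 1):
        u = (i ** 0.75 - math.sqrt(2) * n * math.sqrt(i)) / math.sqrt(2)
        if u <= 1.0:
            continue  # crude: bound these early factors by 1
        total += math.log10(0.5) + (-u * u - math.log(u * math.sqrt(math.pi))) / math.log(10)
    return total

for n in (1, 5):
    print(f"log10 mu(A^({n})+v) <= {log10_upper_bound(n):.3g}")
```

Even as a crude upper bound, the results are astronomically negative: the shifted rectangles carry essentially no mass at all.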

• The sets $A$ and $A +v$ are disjoint.

Again, this is a consequence of the order of the limits.  Since the $A^{(n)}$ increase with $n$, it suffices to show that $A^{(n)}$ and $A^{(n)}+v$ are disjoint for each fixed $n$.  For a fixed $n$, the $i$th edge of the suitcase $A^{(n)}$ decays to zero faster than $v_i$, so for $i$ sufficiently large, $v_i > 2a_i^{(n)}$ and the $i$th edges of $A^{(n)}$ and $A^{(n)}+v$ do not overlap.  Hence $A$ and $A+v$ are disjoint.

## Epilogue

While the above gave one specific example, the general theory goes as follows. Given a Gaussian measure $N_{0,Q}$ on $l^2$ with mean zero and covariance $Q$, the shifted Gaussian measure $N_{v,Q}$ is either equivalent to $N_{0,Q}$ (meaning the two measures are absolutely continuous with respect to each other), or the two measures are singular. The Cameron-Martin space is the space of all $v$ for which the two measures are equivalent. There is a simple expression for this space: it is $Q^{1/2}(l^2)$, the image of $l^2$ under the operator $Q^{1/2}$.
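As a sanity check, this criterion can be applied to the example above. Since $Q = \mathrm{diag}(\lambda_1,\lambda_2,\cdots)$ acts diagonally, $v$ lies in $Q^{1/2}(l^2)$ precisely when $u_i = v_i/\sqrt{\lambda_i}$ defines an element of $l^2$, that is, when $\sum v_i^2/\lambda_i < \infty$. With $\lambda_i = i^{-3}$ and $v_i = i^{-3/4}$,

$$\sum_{i=1}^\infty \frac{v_i^2}{\lambda_i} = \sum_{i=1}^\infty \frac{i^{-3/2}}{i^{-3}} = \sum_{i=1}^\infty i^{3/2} = \infty,$$

so $v$ lies outside the Cameron-Martin space and $N_{v,Q}$ is singular with respect to $N_{0,Q}$, consistent with $\mu(A) = 1$ and $\mu(A+v) = 0$.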