### Archive

Archive for October, 2013

## Countable Intersections and Unions of Sets: A Discussion on an Example in Williams’ “Probability with Martingales”

As homework for students, I recently set the task of understanding the Example/Warning in Section 1.11 of Williams’ book “Probability with Martingales”. I provide here my own ‘solution’ to this homework exercise. There is no need to refer to Williams’ book as the following is self-contained.

Let $V = \{v_1,v_2,\cdots\}$ be the set of rational numbers between 0 and 1 inclusive; because the rational numbers are countable, they can be enumerated as $v_1,v_2,\cdots$. Define the sets $I_{n,k} = (v_n - \frac1{k2^n},v_n+\frac1{k2^n}) \cap [0,1]$. Here, $(v_n-\frac1{k2^n},v_n+\frac1{k2^n})$ is the open interval of width $\frac1{k2^{n-1}}$ centred at the rational number $v_n$, and intersecting it with the closed interval $[0,1]$ simply eliminates any “end effects” by keeping all sets within $[0,1] \subset \mathbb{R}$. For a fixed $n$, note that $I_{n,k}$ monotonically decreases to $\{v_n\}$ as $k \rightarrow \infty$. In particular, $\cup_{n=1}^\infty \cap_{k=1}^\infty I_{n,k} = V$.
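To make the construction concrete, here is a small Python sketch using exact rational arithmetic. The enumeration `rationals_01` and the helper `in_I` are illustrative choices of my own, not part of the original argument; any enumeration of the rationals works equally well.

```python
from fractions import Fraction
from itertools import islice

def rationals_01():
    """One possible enumeration v_1, v_2, ... of the rationals in [0, 1]
    (duplicates removed); the argument works for any enumeration."""
    seen = set()
    for q in range(1, 10**9):           # effectively unbounded
        for p in range(0, q + 1):
            v = Fraction(p, q)
            if v not in seen:
                seen.add(v)
                yield v

def in_I(x, v_n, n, k):
    """Is x in I_{n,k} = (v_n - 1/(k 2^n), v_n + 1/(k 2^n)) ∩ [0,1]?"""
    return abs(x - v_n) < Fraction(1, k * 2**n) and 0 <= x <= 1

vs = list(islice(rationals_01(), 10))   # v_1, ..., v_10
# Each rational v_n lies in I_{n,k} for every k, so V ⊆ ∪_n ∩_k I_{n,k}:
assert all(in_I(vs[n-1], vs[n-1], n, k)
           for n in range(1, 11) for k in range(1, 6))
```

Using `Fraction` rather than floating point means membership in the open interval is decided exactly, with no rounding ambiguity at the endpoints.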

Of interest is what the set $H = \cap_{k=1}^\infty \cup_{n=1}^\infty I_{n,k}$ looks like. A straightforward calculation shows $H \supset V$, but do you think $H = V$, or $H = [0,1]$, or something else?  Is $\sqrt{2}-1 \in H$?

For any finite $N$ and $K$, $\cap_{k=1}^K \cup_{n=1}^N I_{n,k} = \cup_{n=1}^N \cap_{k=1}^K I_{n,k}$.  Believing this to continue to hold in the limit as $N,K \rightarrow \infty$ is a flawed argument for $H=V$.  (Note: normally, union and intersection cannot be interchanged, but here, the sets $I_{n,k}$ are decreasing in $k$. In particular, $\cap_{k=1}^K I_{n,k} = I_{n,K}$.) To see the difference between the finite and infinite cases, note that $x$ being an element of $\cap_{k=1}^K \cup_{n=1}^N I_{n,k}$ means precisely that $\forall k$, $\exists n_k$ such that $x \in I_{n_k,k}$.  Here, a subscript $k$ adorns $n$ to emphasise that the choice of $n$ is allowed to depend on $k$. That $x$ is an element of $\cup_{n=1}^N \cap_{k=1}^K I_{n,k}$ means precisely that $\exists n$, $\forall k$, $x \in I_{n,k}$.  The key difference is that here, a single $n$ must work for all $k$.  In the finite case, the choice $n_k = n_K$ works for all $k$ because the $I_{n,k}$ are decreasing in $k$. If $K$ is infinite though, there is no such thing as $n_\infty$ that could serve in place of $n_K$.
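The finite identity can be checked numerically. The sketch below (the particular rationals in `vs` and the helper names `in_I`, `lhs`, `rhs` are my own illustrative choices) verifies on a grid of sample points that the two finite expressions agree:

```python
from fractions import Fraction

# A few rationals in [0,1] standing in for v_1, ..., v_N (any choice works).
vs = [Fraction(0), Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(2, 3)]
N, K = len(vs), 4

def in_I(x, n, k):
    # x in I_{n,k} = (v_n - 1/(k 2^n), v_n + 1/(k 2^n)) ∩ [0,1]
    return abs(x - vs[n-1]) < Fraction(1, k * 2**n) and 0 <= x <= 1

def lhs(x):  # x in ∩_{k=1}^K ∪_{n=1}^N I_{n,k}
    return all(any(in_I(x, n, k) for n in range(1, N + 1))
               for k in range(1, K + 1))

def rhs(x):  # x in ∪_{n=1}^N ∩_{k=1}^K I_{n,k}
    return any(all(in_I(x, n, k) for k in range(1, K + 1))
               for n in range(1, N + 1))

# The two finite expressions agree at every sample point:
points = [Fraction(i, 240) for i in range(241)]
assert all(lhs(x) == rhs(x) for x in points)
```

Of course, agreement on a grid is only a sanity check, not a proof; the proof is the $n_k = n_K$ argument above.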

A flawed argument for $H = [0,1]$ can be based on the denseness of the rationals: every $x \in [0,1]$ is the limit of a sequence of rational numbers, and since $\cup_n I_{n,k}$ contains not only every rational number but also a neighbourhood of every rational number, it “must” therefore contain all numbers in $[0,1]$. It is insightful to understand where the flaw is.  Let $x \in [0,1]$ be an irrational number. Let $i_1 < i_2 < \cdots$ be an increasing sequence of indices such that $v_{i_1},v_{i_2},\cdots \rightarrow x$. Whether $x \in \cap_k \cup_n I_{i_n,k}$ depends on which is faster: the convergence of $v_{i_n} \rightarrow x$, or the convergence of $\frac1{k2^{i_n}} \rightarrow 0$. To illustrate, assume that $|v_{i_n} - x | = \frac1{2^n}$. (For pedants, interpret this equality as approximate equality; the difference between a rational and an irrational number cannot exactly equal a rational number.) Then even though $v_{i_n} \rightarrow x$, the convergence is sufficiently slow that $x \notin I_{i_n,k}$ for all $n$ and $k$. (Note $2^{i_n} \geq 2^n$ because $i_n \geq n$, hence the half-width $\frac1{k2^{i_n}}$ of $I_{i_n,k}$ never exceeds $\frac1{2^n} = |v_{i_n}-x|$.)

The pendulum swings back: perhaps then $H = V$ because no sequence can converge sufficiently quickly to an irrational number? To demonstrate that this is false, consider the following procedure for choosing $v_1,v_2,\cdots$ to ensure that $\sqrt{2}-1 \in H$. It suffices to construct a sequence $v_1,v_3,v_5,\cdots$ that converges quickly to $\sqrt{2}-1$ because the remaining rational numbers can be assigned (in arbitrary order) to $v_2,v_4,v_6,\cdots$.  Simply choose $v_n$, for $n$ odd, to be a rational number within $\frac1{n2^n}$ of $\sqrt{2}-1$.  (This can always be arranged by taking sufficiently many terms of the decimal expansion $0.414214\cdots$ of $\sqrt2-1$.) Then, no matter how large $k$ is chosen, $\sqrt2-1 \in I_{n,k}$ whenever $n$ is an odd number greater than $k$, hence $\sqrt2-1 \in H$.
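This construction can be simulated with exact rational arithmetic. The sketch below (function names are my own; the digit count in `v` is one sufficient choice, since $10^{n+\lceil\log_{10} n\rceil+1} > n\,2^n$) truncates the decimal expansion of $\sqrt2-1$ and checks the claimed memberships:

```python
from fractions import Fraction
from math import isqrt

def sqrt2_minus_1_approx(d):
    """Rational truncation of sqrt(2) - 1 with error less than 10**(-d)."""
    return Fraction(isqrt(2 * 10**(2 * d)), 10**d) - 1

def v(n):
    """Hypothetical choice of v_n for odd n: a rational within 1/(n 2^n)
    of sqrt(2) - 1, obtained by truncating the decimal expansion."""
    digits = n + len(str(n)) + 1    # ensures 10**digits > n * 2**n
    return sqrt2_minus_1_approx(digits)

# High-precision rational stand-in for the irrational sqrt(2) - 1:
x = sqrt2_minus_1_approx(80)

# For each k, every odd n > k has x inside I_{n,k}; so x lies in
# ∪_n I_{n,k} for every k, i.e. x ∈ H.
for n in (7, 9, 11, 13):
    for k in range(1, n):
        assert abs(x - v(n)) < Fraction(1, k * 2**n)
```

The 80-digit stand-in introduces an error of at most $10^{-80}$, far below the interval half-widths involved, so the comparisons are reliable.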

The above has already demonstrated that limits cannot be interchanged; a sequence $v_1,v_2,\cdots$ of rational numbers was constructed such that $\sqrt2-1$ is in $H$ and hence $H \neq V$. This is not the argument used by Williams, but being constructive in nature might mean it is a more transparent argument.

The higher-level argument in Williams’ book reveals more information about $H$.  Williams uses the Baire category theorem to deduce that $H$ must be uncountable, hence $H \supsetneq V$ regardless of how $v_1,v_2,\cdots$ are chosen. Furthermore, the Lebesgue measure of $H$ is easily shown to be zero, therefore, $H$ does not contain any open intervals, and in particular $H \subsetneq [0,1]$. (Proof: the Lebesgue measure of $I_{n,k}$ is at most $\frac1{k2^{n-1}}$.  Since $\sum_{n=1}^\infty \frac1{2^{n-1}} = 2$, the Lebesgue measure of $\cup_n I_{n,k}$ is at most $\frac2k$.  As $k \rightarrow \infty$, this goes to zero, hence the measure of $H$ must be zero.)
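The arithmetic behind the measure bound can be checked directly. This sketch (helper name `width_bound` is my own) confirms that the partial sums of $\sum_n \frac1{k2^{n-1}}$ stay below $\frac2k$ and approach it:

```python
from fractions import Fraction

def width_bound(n, k):
    # I_{n,k} has length at most 2/(k 2^n) = 1/(k 2^(n-1))
    return Fraction(1, k * 2**(n - 1))

# Partial sums of sum_n 1/(k 2^(n-1)) approach 2/k from below:
for k in (1, 2, 10):
    partial = sum(width_bound(n, k) for n in range(1, 60))
    assert partial < Fraction(2, k)
    assert Fraction(2, k) - partial < Fraction(1, 10**15)
```

Since the bound $\frac2k$ tends to zero as $k \rightarrow \infty$, the measure of $H \subset \cup_n I_{n,k}$ must be zero.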

For completeness, the fact that $H$ is uncountable will be derived. The trick is to consider the complement $H^c = [0,1] \setminus H$. If it can be shown that the complement $H^c$ is “too small” then $H$ must be uncountable.  De Morgan’s law implies that $H^c = \cup_{k=1}^\infty \left( \cup_{n=1}^\infty I_{n,k} \right)^c$.   (Exercise: prove from first principles that $\left( \cap_{k=1}^\infty G_k \right)^c = \cup_{k=1}^\infty G_k^c$.)  Let $F_k = \left( \cup_{n=1}^\infty I_{n,k} \right)^c$.  Since $\cup_{n=1}^\infty I_{n,k}$ contains all the rational numbers, $F_k$ can contain only irrational numbers.  In particular, $F_k$ does not contain any open interval. Each $I_{n,k}$ is open and hence $\cup_{n=1}^\infty I_{n,k}$ is open. Thus, $F_k$ is a closed set not containing any open intervals. Such sets are called nowhere dense. Since $H^c$ is the countable union of nowhere dense sets, it is a meagre set. If $H$ were countable then it too would be meagre (a countable set is a countable union of singletons, each of which is nowhere dense), and hence $H \cup H^c$ would be meagre. Yet $H \cup H^c = [0,1]$, and $[0,1]$ is a Baire space, meaning meagre sets are “small”.  In particular, the whole space $[0,1]$ cannot be meagre, a contradiction.  Therefore, $H$ must be uncountable.

The conclusion then is that $H$ depends on the actual ordering $v_1,v_2,\cdots$ of the rationals, yet regardless of how the rationals are ordered, $H$ will be an uncountable set of Lebesgue measure zero. In particular, $V \subsetneq H \subsetneq [0,1]$.

## How the Jacobian can Detect Cusps and Self-intersections

Let $M$ be a subset of the plane defined by the vanishing of a smooth (that is, derivatives of all orders exist) function $f\colon \mathbb{R}^2 \rightarrow \mathbb{R}$.  For simplicity, it is assumed throughout that $f$ is not identically zero.

For example, if $f(x,y) = x^2 + y^2 - 1$ then the set $M = \{ (x,y) \in \mathbb{R}^2 \mid f(x,y) = 0\}$ is the unit circle. This can be readily visualised using the online Sage Notebook. Sign in to Sage, create a new worksheet, and type: var('x,y'); implicit_plot(x^2+y^2-1==0, (x, -2, 2), (y, -2, 2));

Next, try: var('x,y'); implicit_plot(x^3+y^3-x*y==0, (x, -2, 2), (y, -2, 2));

Note that this second example has a self-intersection: if one starts at one end of the curve and walks to the other end, there is a point that is passed over twice, once upon entering the loop and once upon exiting it.

Thirdly, try: var('x,y'); implicit_plot(x^3-y^2==0, (x, -2, 2), (y, -2, 2));

This third example has a cusp; if one endeavours to drive a car from the bottom end of the curve to the top end, one will get stuck at the cusp with the car pointing to the left but the “road” continuing off to the right.

Roughly speaking, and as a good introduction to differential geometry, a (concrete) manifold is a subset of $\mathbb{R}^n$ that does not contain cusps (that is, it is smooth) and does not contain self-intersections (that is, it is locally Euclidean). Regardless, it is interesting to ask how difficult it is to determine if the set $M = \{ (x,y) \in \mathbb{R}^2 \mid f(x,y) = 0\}$  has any cusps or self-intersections.  The intuitive principle of driving a car along the curve $M$ to detect cusps or self-intersections may appear difficult to translate into a mathematical tool, especially for detecting self-intersections.

Note though that driving a car corresponds to a parametrisation of the curve: roughly speaking, constructing functions $x(t)$ and $y(t)$ such that $f(x(t),y(t)) = 0$ for all $t$. Finding a self-intersection is equivalent to finding a non-trivial ($t \neq t'$) solution to $x(t)=x(t')$ and $y(t)=y(t')$.  This is a difficult “global” problem. However, the curve has been specified implicitly by $f(x,y) =0$, and perhaps the problem is not global but only local with respect to this implicit formulation?

Detecting cusps and self-intersections really is only a local problem with respect to $f(x,y)$, meaning that knowledge of what $M = \{ (x,y) \in \mathbb{R}^2 \mid f(x,y) = 0\}$ looks like anywhere except in a small neighbourhood of a point $(x,y)$ is irrelevant to determining whether there is a singularity or self-intersection at $(x,y)$.  This suggests considering a Taylor series of $f$ about the point $(x,y)$ of interest. [Technically, we would also want $f$ to be analytic if we are relying on a Taylor series argument, but we are focusing here on gaining a feel for the problem in the first instance.] It is worth noting that this is also an example of “unhelpful intuition”: thinking of driving cars is a hindrance, not a help.

Choose a point $(x',y')$ lying on $M$, that is, $f(x',y') = 0$.  Then $f(x,y) = \alpha (x-x') + \beta (y-y') + \cdots$ where the dots denote higher-order terms.  If either $\alpha$ or $\beta$ is non-zero then $f$ contains a linear term, and the linear term dominates the behaviour of $f$ when $(x,y)$ is sufficiently close to $(x',y')$.  The linear equation $\alpha (x-x') + \beta (y-y') = 0$ describes a straight line; there can be no cusps or self-intersections.  This argument can be made rigorous using the implicit function theorem, even when $f$ is not analytic: if for every point $(x,y) \in M$, either $\frac{\partial f}{\partial x}$ or $\frac{\partial f}{\partial y}$ is non-zero at $(x,y)$, then $M = \{ (x,y) \in \mathbb{R}^2 \mid f(x,y) = 0\}$ contains no self-intersections or cusps.  This result, stated more generally in terms of the rank of the Jacobian matrix, appears near the beginning in most differential geometry textbooks.  Without the above considerations though, it may appear mysterious how the Jacobian can detect cusps and self-intersections. (The key point is not to think in terms of a parametrisation of the curve, but instead, break the space $\mathbb{R}^2$ into neighbourhoods $U_i$  which are sufficiently small that it is possible to determine exactly what $M \cap U_i$ looks like. To belabour the point, if a point in $U_i$ is not in $M \cap U_i$ then it is not in $M$.)
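The Jacobian test can be illustrated on the three curves above. In this sketch the gradients are computed by hand rather than symbolically (the function names are my own), and the test correctly flags the origin for the self-intersecting and cuspidal curves while passing a typical point of the circle:

```python
def f1(x, y):        # curve with a self-intersection at the origin
    return x**3 + y**3 - x * y

def grad_f1(x, y):   # (∂f/∂x, ∂f/∂y) computed by hand
    return (3 * x**2 - y, 3 * y**2 - x)

def f2(x, y):        # semicubical parabola, cusp at the origin
    return x**3 - y**2

def grad_f2(x, y):
    return (3 * x**2, -2 * y)

def grad_circle(x, y):   # f(x,y) = x^2 + y^2 - 1
    return (2 * x, 2 * y)

# Both singular curves pass through the origin with vanishing gradient,
# so the Jacobian test flags (0,0) as a potential trouble spot:
assert f1(0, 0) == 0 and grad_f1(0, 0) == (0, 0)
assert f2(0, 0) == 0 and grad_f2(0, 0) == (0, 0)

# At a typical point of the unit circle the gradient is nonzero,
# so the implicit function theorem applies there:
assert grad_circle(1, 0) != (0, 0)
```

Note the direction of the inference: a nonzero gradient certifies a locally nice curve, whereas a vanishing gradient merely flags a point for closer inspection, as the next paragraph explains.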

Textbooks generally do not mention explicitly though that if $M$ fails this “Jacobian test” then it may still be a manifold. The above makes this clear; if $\alpha=\beta=0$ then it is necessary to examine higher-order terms before making any claims. As a simple example, $f(x,y) = x^3$ fails the test at $(0,0)$ because both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are zero at the origin, yet $M = \{ (x,y) \in \mathbb{R}^2 \mid x^3 = 0\}$ is the straight line given equivalently by $x=0$.

## Long versus Short Proofs

Proofs are very similar to computer programs. And just like computer programs, there are often many different ways of writing a proof.  Sometimes the difference between proofs is great; one might be based on geometry while another on analysis. The purpose of this short note is to emphasise that different proofs are possible at the “simple” level too.  While this may appear mundane, it is actually an important part of learning mathematics: once you have proved a result, spend a few minutes trying to simplify the proof. It will make the next proof you do easier. The two key messages are as follows.

• There is no substitute for experience when it comes to deriving proofs.
• Practising on simple examples to find the “best” proof is training for being able to find any proof at all of a harder result.

For lack of a better on-the-spot example, consider the following problem. Let $f\colon\mathbb{C} \rightarrow \mathbb{C}$ be an analytic function having a double zero at the origin and whose first derivative has at most linear growth: $| f'(x) | \leq \alpha |x|$ for some $\alpha \in \mathbb{R}$.  What is the most general form of $f$?

## First Approach

A double zero at the origin means $f(x) = x^2 h(x)$ for some analytic function $h$. Therefore $f'(x) = 2xh(x) + x^2h'(x)$ and $\frac{f'(x)}{x} = 2h(x) + xh'(x)$.  Since $2h(x) + xh'(x)$ is both analytic and bounded (by the growth assumption, $|\frac{f'(x)}{x}| \leq \alpha$), it must be constant, by Liouville’s theorem.  (Here, all analytic functions are entire because they are defined on the whole of the complex plane.) Solving the differential equation $2h(x) + xh'(x) = c$ by substituting in the power series expansion $h(x) = \sum a_i x^i$ shows that $h(x) = \frac{c}2$. [The general solution of $2h(x) + xh'(x) = 0$ is $h(x) = \beta x^{-2}$ where $\beta$ is an arbitrary scalar. The only such solution that is analytic is $h(x)=0$.] The conclusion is that the most general form of $f$ is $f(x) = ax^2$ for some complex scalar $a$.
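The power-series step comes down to a coefficient identity: the $x^i$ coefficient of $2h(x) + xh'(x)$ is $(2+i)a_i$, so matching against the constant $c$ forces $a_i = 0$ for $i \geq 1$ and $a_0 = \frac{c}2$. A quick sketch with truncated coefficient lists (the helper name is my own) confirms this:

```python
def lhs_coeffs(a):
    """Coefficients of 2 h(x) + x h'(x) for h(x) = sum_i a[i] x^i.
    The x^i coefficient of x h'(x) is i*a[i], so the result is (2+i)*a[i]."""
    return [(2 + i) * c for i, c in enumerate(a)]

# With c = 6, only h(x) = 3 = c/2 gives a constant left-hand side:
a = [3, 0, 0, 0]                        # h(x) = 3
assert lhs_coeffs(a) == [6, 0, 0, 0]    # constant, as required

# Any nonzero higher coefficient survives (scaled by 2+i), breaking constancy:
b = [3, 0, 5, 0]                        # h(x) = 3 + 5 x^2
assert lhs_coeffs(b) == [6, 0, 20, 0]   # not constant
```

Since $2 + i$ never vanishes for $i \geq 0$, no cancellation is possible, which is exactly why the analytic solution is unique.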

## Second Approach

The first approach is unattractive and unnecessarily complicated. Instead of starting with the double zero, start instead with the first derivative having a linear bound. (A standard generalisation of Liouville’s theorem is that an entire function with a linear bound must itself be linear. We will pretend here we do not know this fact.) If $f(x)$ has a double zero at the origin then $f'(x)$ has a zero at the origin, therefore $f'(x) = x g(x)$ for some analytic $g(x)$. The linear bound together with Liouville’s theorem means $g(x)$ is constant, that is $f'(x) = 2 a x$ for some scalar $a$. Therefore $f(x)$ must equal $ax^2$ if it is to satisfy both $f'(x) = 2 a x$ and $f(0)=0$.

## What was the Difference?

The first approach expressed $f$ as $f(x) = x^2 h(x)$ while the second approach expressed $f'$ as $f'(x) = x g(x)$. Both approaches resulted in a differential equation, but the second approach resulted in the simpler differential equation $f'(x) = 2ax$. Underlying this example is that a “change of coordinates” can simplify a differential equation. Although both approaches could be made to work in this simple example, there are situations where some approaches are too difficult to follow through to completion.

One could argue that because the linear bound constraint is “harder” than the double-zero constraint, one should start with the linear bound constraint and not the double-zero constraint, and therefore be led to the simpler differential equation. Yet the real messages are as stated at the start:

• There is no substitute for experience when it comes to deriving proofs.
• Practising on simple examples to find the “best” proof is training for being able to find any proof at all of a harder result.