Author Archive

## Poles, ROCs and Z-Transforms

Why should all the poles be inside the unit circle for a discrete-time digital system to be stable? And why must we care about regions of convergence (ROC)? With only a small investment in time, it is possible to gain a very clear understanding of exactly what is going on — it is not complicated if learnt step at a time, and there are not many steps.

## Step 1: Formal Laurent Series

Despite the fancy name, Step 1 is straightforward. Digital systems operate on sequences of numbers: a sequence of numbers goes into a system and a sequence of numbers comes out. For example, if $x[n]$ denotes an input sequence then an example of a discrete-time system is $y[n] = x[n] + \frac12 x[n-1]$.

How can a specific input sequence be written down, and how can the output sequence be computed?

The direct approach is to specify each individual $x[n]$, for example, $x[n] = 0$ for $n < 0$, $x[0] = 1$, $x[1] = 2$, $x[2] = 3$ and $x[n] = 0$ for $n > 2$. Direct substitution then determines the output: $y[0] = x[0] + \frac12 x[-1] = 1 + \frac12 \cdot 0 = 1$, $y[1] = x[1] + \frac12 x[0] = 2 + \frac12 \cdot 1 = 2\frac12$ and so forth.

The system $y[n] = x[n] + \frac12 x[n-1]$ has a number of nice properties, two of which are linearity and time invariance. This is often expressed in textbooks by saying the system is LTI, where the L in LTI stands for Linear and the TI stands for Time Invariant. When first learning about discrete-time digital systems, usually attention is restricted to linear time-invariant systems, and indeed, it is for such systems that the Z-transform proves most useful.

Any LTI system has the property that the output is the convolution of the input with the impulse response. For example, the impulse response of $y[n] = x[n] + \frac12 x[n-1]$ is found by injecting the input $x[0] = 1$ and $x[n] = 0$ for $|n| \geq 1$. This impulse as input produces the output $y[0] = 1$, $y[1] = \frac12$ and $y[n] = 0$ for $n \notin \{0,1\}$. It is common to denote the impulse response using the letter $h$, that is, $h[0] = 1$, $h[1] = \frac12$ and $h[n] = 0$ for $n \notin \{0,1\}$.

By the definition of convolution, the convolution of a sequence $x[n]$ with a sequence $h[n]$ is $c[n] = \sum_{m=-\infty}^{\infty} h[m] x[n-m]$. If $h[0] = 1$, $h[1] = \frac12$ and $h[n] = 0$ for $n \notin \{0,1\}$ then $c[n] = x[n] + \frac12 x[n-1]$ and this is equal to $y[n]$; we have verified in this particular case that the output of the original system is indeed given by convolving the input with the impulse response.

Let’s be honest: writing $h[0] = 1$, $h[1] = \frac12$ and $h[n] = 0$ for $n \notin \{0,1\}$ is tedious! An alternative is to use a formal Laurent series to represent a sequence. Here, “formal” means that we do not try to evaluate this series! A Laurent series just means instead of limiting ourselves to polynomials in a dummy variable $t$, we allow negative powers of $t$, and we allow an infinite number of non-zero terms. (The variable $t$ has nothing to do with time; it will be replaced in Step 3 by $z^{-1}$ where it will be referred to as the Z-transform. To emphasise, $t$ is just a dummy variable, a placeholder.)

Precisely, the sequence $h[0] = 1$, $h[1] = \frac12$ and $h[n] = 0$ for $n \notin \{0,1\}$ can be encoded as $1 + \frac12 t$; isn’t that much more convenient to write? In general, an arbitrary sequence $h[n]$ can be encoded as the formal series $\sum_{n=-\infty}^{\infty} h[n] t^n$. The sole purpose of the $t^n$ term is as a placeholder for placing the coefficient $h[n]$. In other words, if we are given the formal series $2 t^{-5} + 3 + t^7$ then we immediately know that that is our secret code for the sequence that consists of all zeros except for the -5th term which is 2, the 0th term which is 3 and the 7th term which is 1.

Being able to associate letters with encoded sequences is convenient, hence people will write $x(t) = 2 t^{-5} + 3 + t^7$ to mean $x[-5] = 2$, $x[0] = 3$, $x[7] = 1$ and $x[n] = 0$ for $n \notin \{-5,0,7\}$. Do not try to evaluate $x(t)$ for some value of $t$ though – that would make no sense at this stage. (Mathematically, we are dealing with a commutative ring that involves an indeterminant $t$. If you are super-curious, see Formal Laurent series.)

Now, there is an added benefit! If the input to our system is $x(t) = 1 + 2t + 3t^2$ and the impulse response is $h(t) = 1 + \frac12 t$ then there is a “simple” way to determine the output: just multiply these two series together! The multiplication rule for formal Laurent series is equivalent to the convolution rule for two sequences. You can check that $y(t) = h(t)x(t) = (1+\frac12 t) (1 + 2t + 3t^2) = 1 + 2t + 3t^2 + \frac12 t + t^2 + \frac32 t^3 = 1 + 2\frac12 t + 4 t^2 + \frac32 t^3$ (corresponding to $y[0]=1, y[1]=2\frac12, y[2] = 4$ etc) really is the output of our system if the input were $x[n] = 0$ for $n < 0$, $x[0] = 1$, $x[1] = 2$, $x[2] = 3$ and $x[n] = 0$ for $n > 2$.

• A formal Laurent series provides a convenient encoding of a discrete-time sequence.
• Convolution of sequences corresponds to multiplication of formal Laurent series.
• Do not think of formal Laurent series as functions of a variable. (They are elements of a commutative ring.) Just treat the variable as a placeholder.
• In particular, there is no need (and it makes no sense) to talk about ROC (regions of convergence) if we restrict ourselves to formal Laurent series.

## Step 2: Analytic Functions

This is where the fun begins. We were told not to treat a formal Laurent series as a function, but it is very tempting to do so… so let’s do just that! 🙂 The price we pay though is that we do need to start worrying about ROC (regions of convergence) for otherwise we could end up with incorrect answers.

To motivate further our desire to break the rules, consider the LTI system $y[n] = \frac12 y[n-1] + x[n]$. Such a system has “memory” or an “internal state” because the current output $y[n]$ depends on a previous output $y[n-1]$ and hence the system must somehow remember previous values of the output. We can assume that in the distant past the system has been “reset to zero” and hence the output will be zero up until the first time the input is non-zero. The impulse response is determined by setting $x[0]$ to 1 and observing the output: $h[0] = 1$, $h[1] = \frac12$, $h[2] = \frac14$ and in general $h[n] = \frac1{2^n}$ for $n \geq 0$. Encoding this using a formal Laurent series gives the alternative representation $h(t) = \sum_{n=0}^\infty \frac1{2^n} t^n = 1 + \frac12 t + \frac14 t^2 + \cdots$.

There is a more “efficient” representation of $h(t) = \sum_{n=0}^\infty \frac1{2^n} t^n$ based on the Taylor series expansion $\frac1{1-\alpha t} = 1 + \alpha t + \alpha^2 t^2 + \alpha^3 t^3 + \cdots$. Indeed, if we were to break the rules and treat $h(t)$ as a function, we might be tempted to write $h(t) = \frac1{1 - \frac12 t}$ instead of $h(t) = \sum_{n=0}^\infty \frac1{2^n} t^n$. It turns out that, with some care, it is mathematically legitimate to do this. (Furthermore, Step 3 will explain why it is beneficial to go to this extra trouble.)

We know we can encode a sequence using a formal Laurent series, and we can reverse this operation to recover the sequence from the formal Laurent series. In Step 2 then, we just have to consider when we can encode a formal Laurent series as some type (what type?) of function, meaning in particular that it is possible to determine uniquely the formal Laurent series given the function.

Power series (and Laurent series) are studied in complex analysis: recall that every power series has a (possibly zero) radius of convergence, and within that radius of convergence, a power series can be evaluated. Furthermore, within the radius of convergence, a power series (with real or complex coefficients) defines a complex analytic function. The basic idea is that a formal Laurent series might (depending on how quickly the coefficients die out to zero) represent an analytic function on a part of the complex plane.

If the formal Laurent series only contains non-negative powers of $t$ then it is a power series $f(t) = \sum_{n=0}^\infty \alpha_n t^n$ and from complex analysis we know that there exists an $R$, the radius of convergence (which is possibly zero or possibly infinite), such that the sum converges if $|t| < R$ and the sum diverges if $|t| > R$. In particular, $f(t)$ is an analytic function on the open disk $\{ t \in \mathbb{C} \mid |t| < R\}$.

If the formal Laurent series only contains non-positive powers, that is, $f(t) = \sum_{n=-\infty}^0 \alpha_n t^n$, then we can consider it to be a power series in $t^{-1}$ and apply the above result. Since $f(t) = \sum_{n=0}^\infty \alpha_{-n} \left(\frac1t\right)^n$, there exists a $\tilde R$ (possibly zero or possibly infinite) such that $f(t)$ is an analytic function on $\{ t \in \mathbb{C} \mid |t| > \tilde R\}$. [From above, the condition would be $|t^{-1}| < R$ which is equivalent to $|t| > \tilde R^{-1}$ hence defining $\tilde R = R^{-1}$ verifies the claim.]

In the general case, a formal Laurent series decomposes as $\sum_{n=-\infty}^{-1} \alpha_n t^n + \sum_{n=0}^\infty \alpha_n t^n$. Both sums must converge if it is to define an analytic function, hence in the general case, a formal Laurent series defines an analytic function on a domain of the form $\{ t \in \mathbb{C} \mid \tilde R < |t| < R\}$.

The encoding of a sequence as an analytic function is therefore straightforward in principle: given a formal Laurent series $f(t) = \sum_{n=-\infty}^\infty \alpha_n t^n$, determine the constants $\tilde R$ and $R$, and provided the region $\{ t \in \mathbb{C} \mid \tilde R < |t| < R\}$ is non-empty, we can use the analytic function $f(t)$ to encode the sequence $\alpha_n$. We must specify the ROC $\{ t \in \mathbb{C} \mid \tilde R < |t| < R\}$ together with $f(t)$ whenever we wish to define a sequence in this way; without knowing the ROC, we do not know the domain of definition of $f(t)$, that is, we would not know for what values of $t$ does $f(t)$ describe the sequence we want. It is possible for a function $f(t)$ to describe different sequences depending on the ROC! An example of this is now given.

If $f(t) = \sum_{n=0}^\infty t^n$ then we have seen that $f(t) = \frac1{1-t}$. But wait! This is not entirely true. If $|t| < 1$ then certainly $\sum_{n=0}^\infty t^n = \frac1{1-t}$.Yet if $|t| > 1$ then $\sum_{n=0}^\infty t^n = \infty \neq \frac1{1-t}$. The function $t \mapsto \frac1{1-t}$ for $|t| > 1$ does not describe the series $\sum_{n=0}^\infty t^n$. It looks like a perfectly good function though, so what series does it describe?

The region $|t| > 1$ is unbounded. (In fact, it is an open disc centred at the point at infinity.) This motivates us to replace $t$ by $\tau = t^{-1}$ so that the domain becomes $|\tau| < 1$ and we can attempt looking for a power series in $\tau$. Precisely, $\frac1{1-t} = \frac1{1-\tau^{-1}} = \frac{-\tau}{1-\tau} = -\tau \sum_{n=0}^\infty \tau^n = -\sum_{n=1}^\infty \tau^n = \sum_{n=-\infty}^{-1} (-1) t^n$. Therefore, the single function $t \mapsto \frac1{1-t}$ actually encodes two different series depending on whether we treat it as a function on $|t|<1$ or a function on $|t|>1$.

Readers remembering their complex analysis will not find this bizarre because a holomorphic (i.e., complex differentiable) function generally requires more than one power series to represent it. A power series about a point is only valid up until a pole is encountered, after which another point must be chosen and a power series around that point used to continue the description of the function. When we change points, the coefficients of the power series will generally change. In the above example, the first series $\sum_{n=0}^\infty t^n$ is a power series around the origin while the second series $\sum_{n=-\infty}^{-1} (-1) t^n$ is a power series around the point at infinity and therefore naturally has different coefficients. (Thankfully there are only these two possibilities to worry about: a power series about a different point $c \in \mathbb{C}$ would look like $\sum_{n=0}^\infty \alpha_n (t-c)^n$ and is not of the form we described in Step 1. Only when $c=0$ or $c = \infty$ does the power series match up with the formal Laurent series in Step 1.)

While it might seem that introducing analytic functions is an unnecessary complication, it actually makes certain calculations simpler! Such a phenomenon happened in Step 1: we discovered convolution of sequences became multiplication of Laurent series (and usually multiplication is more convenient than convolution). In Step 3 we will see how the impulse response of $y[n] = \frac12 y[n-1] + x[n]$ can be written down immediately.

## Step 3: The Z-transform

Define $y(t) = \sum_{n=-\infty}^\infty y[n] t^n$ and $x(t) = \sum_{n=-\infty}^\infty x[n] t^n$. Clearly, the constraint $y[n] = \frac12 y[n-1] + x[n]$ implies a relationship between $x(t)$ and $y(t)$, but how can we determine this relationship?

Note that $y[n] = \frac12 y[n-1] + x[n]$ is actually an infinite number of constraints, one for each $n$. The first step is to think of these as a single constraint on the whole sequences $\{\cdots,x[-2],x[-1],x[0],x[1],x[2],\cdots\}$ and $\{\cdots,y[-2],y[-1],y[0],y[1],y[2],\cdots\}$. We can do this either intuitively or rigorously, arriving at the same answer either way.

Intuitively, think of a table (my lack of WordPress skills prevents me from illustrating this nicely). The top row looks like $\cdots \mid y[-1] \mid y[0] \mid y[1] \mid \cdots$. The next row looks like $\cdots \mid \frac12 y[-2] \mid \frac12 y[-1] \mid \frac12 y[0] \mid \cdots$. The last row looks like $\cdots \mid x[-1] \mid x[0] \mid x[1] \mid \cdots$. Stack these three rows on top of each other so they line up properly: the purpose of the table is to line everything up correctly! Then another way of expressing $y[n] = \frac12 y[n-1] + x[n]$ is by saying that the first row of the table is equal to the second row of the table plus the third row of the table, where rows are to be added elementwise.

Rigorously, what has just happened is that we are treating a sequence as a vector in an infinite-dimensional vector space: just like $( y[-1], y[0], y[1] )$ is a vector in $\mathbb{R}^3$, we can think of $( \cdots, y[-1], y[0], y[1], \cdots )$ as a vector in $\mathbb{R}^\infty$. Each of the three rows of the table described above is simply a vector.

To be able to write the table compactly in vector form, we need some way of going from the vector $( \cdots, y[-1], y[0], y[1], \cdots )$ to the shifted-one-place-to-the-right version of it, namely $( \cdots, y[-2], y[-1], y[0], \cdots )$. In linear algebra, we know that a matrix can be used to map one vector to another vector provided the operation is linear, and in fact, shifting a vector is indeed a linear operation. In abstract terms, there exists a linear operator $S \colon \mathbb{R}^\infty \rightarrow \mathbb{R}^\infty$ that shifts a sequence one place to the right: $S(\ ( \cdots, y[-1], y[0], y[1], \cdots ) \ ) = ( \cdots, y[-2], y[-1], y[0], \cdots )$.

Letting $\mathbf{x}$ and $\mathbf{y}$ denote the vectors $( \cdots, x[-1], x[0], x[1], \cdots )$ and $( \cdots, y[-1], y[0], y[1], \cdots )$ respectively, we can rigorously write our system as $\mathbf{y} = \frac12 S \mathbf{y} + \mathbf{x}$. This is precisely the same as what we did when we said the first row of the table is equal to the second row of the table plus the third row of the table.

Our natural instinct is to collect like terms: if $I$ denotes the identity operator (think of a matrix with ones on the diagonal), so that $I\mathbf{y} = \mathbf{y}$, then $\mathbf{y} = \frac12 S \mathbf{y} + \mathbf{x}$ is equivalent to $\left( I - \frac12 S \right) \mathbf{y} = \mathbf{x}$, that is, $\mathbf{y} = \left( I - \frac12 S \right)^{-1} \mathbf{x}$. This equation tells us the output $\mathbf{y}$ as a function of the input $\mathbf{x}$ — but how useful is it? (Does the inverse $\left( I - \frac12 S \right)^{-1}$ exist, and even if it does, how can we evaluate it?)

While in some cases the expression $\mathbf{y} = \left( I - \frac12 S \right)^{-1} \mathbf{x}$ might actually be useful, often it is easier to use series rather than vectors to represent sequences: we will see that the linear operator $S$ is equivalent to multiplication by $t$, which is very convenient to work with!

Precisely, since $y(t) = \sum_{n=-\infty}^\infty y[n] t^n$ and $\mathbf{y} = ( \cdots, y[-1], y[0], y[1], \cdots )$ represent exactly the same sequence, it should be clear that the equation $\mathbf{y} = \frac12 S \mathbf{y} + \mathbf{x}$ is equivalent to the equation $y(t) = \frac12 t\,y(t) + x(t)$. Indeed, returning to the table, $y(t)$ corresponds to the first row, $x(t)$ corresponds to the last row, and $\frac12 t\,y(t) = \frac12 t \sum_{n=-\infty}^\infty y[n] t^n = \sum_{n=-\infty}^\infty \frac12 y[n-1] t^n$ corresponds to the second row.

Now, from Step 2, we know that we can think of $x(t)$ and $y(t)$ as analytic functions (provided we are careful about the ROC). Therefore, we feel entitled to continue manipulating $y(t) = \frac12 t\,y(t) + x(t)$ to obtain $y(t) = \frac1{1-\frac12t} x(t)$. Since by definition the impulse response $h(t)$ satisfies $y(t) = h(t) x(t)$, we seem to have immediately found the impulse response $h(t) = \frac1{1-\frac12t}$ of the original system; at the very least, it agrees with the longer calculation performed at the start of Step 2.

Remark: Actually, the only way to justify rigorously that the above manipulation is valid is to check the answer we have found really is the correct answer. Indeed, to be allowed to perform the manipulations we must assume the existence of a domain on which both $x(t)$ and $y(t)$ are analytic. If we assume $y(t)$ is “nice” then, under that assumption, prove that $y(t)$ is “nice”, that does not mean $y(t)$ is nice. It is like saying “if it is raining then we can show it really is raining, therefore it must be raining right now!”. A rigorous justification would look something like the following. First we must make some assumption about the input sequence for otherwise we have no idea what ROC to use. If we are interested in inputs that are uniformly bounded and which are zero up until time 0 then we can safely assume that $x(t)$ is analytic on $|t| < 1$. Since $1-\frac12t$ is non-zero whenever $|t| < 1$, we know $h(t) = \frac1{1-\frac12t}$ is analytic on $|t| < 1$. Therefore $y(t) = h(t) x(t)$ will be analytic on $|t| < 1$. This means that $y(t)$ can be expanded as a power series in a neighbourhood of the origin, and the coefficients of that power series are what we believe the output of the system will be. To check this really is the output of the system, it suffices to show $y(t) = \frac12 t\,y(t) + x(t)$ for $|t| < 1$. This is straightforward: $\frac12 t\, y(t) + x(t) = \left(\frac12 t\, h(t) + 1\right) x(t) = \left( \frac{\frac12 t}{1-\frac12t} + \frac{1-\frac12 t}{1-\frac12 t}\right) x(t) = \frac1{1-\frac12t} x(t) = h(t)x(t) = y(t)$ as required, where every manipulation can be seen to be valid for $|t| < 1$ (there are no divisions by zero or other bad things occurring).

The remark above shows it is (straightforward but) tedious to verify rigorously that we are allowed to perform the manipulations we want. It is much simpler to go the other way and define a system directly in terms of $h(t)$ and its ROC. Then, provided the ROC of $x(t)$ overlaps with the ROC of $h(t)$, the output $y(t)$ is given by $h(t)x(t)$ on the overlap, which can be correctly expanded as a Laurent series with regard to the ROC, and the output sequence read off.

All that remains is to introduce the Z-transform and explain why engineers treat $z^{-1}$ as a “time shift operator”.

The Z-transform is simply doing the same as what we have been doing, but using $z^{-1}$ instead of $t$. Why $z^{-1}$ rather than $z$? Just convention (and perhaps a bit of convenience too, e.g., it leads to stable systems being those with all poles inside the unit circle, rather than outside). In fact, sometimes $z$ is used instead of $z^{-1}$, hence you should always check what conventions an author or lecturer has decided to adopt.

The rule that engineers are taught is that when you “take the Z-transform” of $y[n] = \frac12 y[n-1] + x[n]$ then you replace $y[n]$ by $Y(z)$, $x(n)$ by $X(z)$ and $y[n-1]$ by $z^{-1} Y(z)$. The reason this rule works was justified at great length above: recall that as an intermediate mental step we can think of the input and output as vectors, and this led to thinking of them instead as series, because multiplication by $t = z^{-1}$ will then shift the sequence one place to the right. Thus, $Y(z) = \frac12 z^{-1} Y(z) + X(z)$ is asserting that the sequence $y[n]$ is equal to the sequence $x[n]$ plus a half times the sequence $y[n]$ shifted to the right by one place, which is equivalent to the original description $y[n] = \frac12 y[n-1] + x[n]$.

## Step 4: Poles and Zeros

A straightforward but nonetheless rich class of LTI systems can be written in the form $y[n] = a_1 y[n-1] + a_2 y[n-2] + \cdots + a_p y[n-p] + b_0 x[n] + b_1 x[n-1] + \cdots + b_q x[n-q]$. Importantly, this class is “closed” in that if you connect two such systems together (the output of one connects to the input of the other) then the resulting system can again be written in the same way (but generally with larger values of $p,q$). Another important feature of this class of systems is that they can be readily implemented in hardware.

Applying the Z-transform to such a system shows that the impulse response in the Z-domain is a rational function. Note that the product of two rational functions is again a rational function, demonstrating that this class of systems is closed under composition, as stated above. Most of the time, it suffices to work with rational impulse responses.

If $H(z) = \frac{f(z)}{g(z)}$ is rational (and in reduced form — no common factors) then the zeros of $H(z)$ are the solutions of the polynomial equation $f(z)=0$ while the poles are the solutions of the polynomial equation $g(z)=0$. [More generally, if $H(z)$ is analytic, then the zeros are the solutions of $H(z)=0$ and the poles are the solutions of $\frac1{H(z)} = 0$.] Note that a pole of $H(z)$ is a zero of its inverse: if we invert a system, poles become zeros and zeros become poles.

Poles are important because poles are what determine the regions of convergence and hence they determine when our manipulations in Step 3 are valid. This manifests itself in poles having a physical meaning: as we will see, the closer a pole gets to the unit circle, the less stable a system becomes.

Real-world systems are causal: the output cannot depend on future input. The impulse response $h[n]$ of a causal system is zero for $n < 0$. Therefore, the formal Laurent series (Step 1) representation of $h[n]$ has no negative powers of $t$. Its region of convergence will therefore be of the form $|t| < R$ for some $R > 0$. (If $R=0$ then the system would blow up!) Since $z = t^{-1}$, the ROC of a causal transfer function $H(z)$ will therefore be of the form $|z| > R$ for some $R < \infty$. If $H(z)$ is rational then it has only a finite number of poles, and it suffices to choose $R$ to be the largest of the magnitudes of the poles.

Let’s look at an unstable system: $y[n] = 2 y[n-1] + x[n]$. This system is clearly unstable because its impulse response is readily seen to be $1, 2, 4, 8, 16, 32, \cdots$. We put in a bounded signal ($x[0]=1$ and $x[n] = 0$ for $|n| \geq 1$) yet obtain an unbounded output ($y[n] = 2^n$ for $n \geq 0$). Taking the Z-transform gives $Y(z) = 2 z^{-1} Y(z) + X(z)$, so that $H(z) = Y(z)/X(z) = \frac1{1-2z^{-1}}$. For $|z|>2$, this gives us the correct series representation $\frac1{1-2z^{-1}} = \sum_{n=0}^\infty 2^n z^{-n}$.

If we put a signal that starts at time zero (i.e., $x[n] = 0$ for $n < 0$) into a causal system we will get an output even if the system is unstable. This is because the input $X(z)$ will have a ROC of the form $|z| > R_X$ and $H(z)$ has a ROC of the form $|z| > R_H$, so $Y(z) = H(z)X(z)$ will have a ROC $|z| > \max\{R_X,R_H\}$.

If we put a signal that started at time $-\infty$ into our system, then there might be no solution! (Intuitively, it means the system will have to have blown up before reaching time 0.) For example, $X(z) = \frac1{1-z}$ with ROC $|z| < 1$ represents the signal $x[n] = 1$ for $n \leq 0$ and $x[n]=0$ for $n > 0$. We cannot form the product $Y(z) = H(z) X(z)$ because there is no value of $z$ for which both $H(z)$ and $X(z)$ are valid: one’s ROC is $|z|>2$ while the other’s is $|z| < 1$.

There are many different definitions of a sequence being bounded. Three examples are: 1) there exists an $M$ such that, for all $n$, $|x[n]| < M$; 2) $\sum_{n=-\infty}^\infty | x[n] | < \infty$; and 3) $\sum_{n=-\infty}^{\infty} | x[n] |^2 < \infty$. Out of these three, the easiest to detect whether $x[n]$ is bounded given only $X(z)$ is the second one: if the ROC of $X(z)$ includes $|z|=1$ then, by definition (see a complex analysis textbook), $\sum_{n=-\infty}^\infty x[n] z^{-n}$ is absolutely convergent for $|z|=1$, meaning $\sum_{n=-\infty}^\infty |x[n] z^{-n}| < \infty$, hence $\sum_{n=-\infty}^\infty |x[n]| < \infty$ because $|z|=1$ implies $|z^{-n}| = 1$ for all $n$. This type of boundedness is called “bounded in $l^1$“. For the reverse direction, recall that the radius of convergence $R$ is such that a power series in $t$ will converge for all $|t| < R$ and diverge for all $|t| > R$. Therefore, if the boundary of the largest possible ROC of $X(z)$ does not contain the unit circle then $x[n]$ must be unbounded in $l^1$. (The boundary case is inconclusive: the power series $x(t) = \sum_{n=1}^\infty \frac1{n^2} t^n$ has a ROC $|t| < 1$ yet $x[n] = \frac1{n^2}$ is bounded. On the other hand, $x(t) = \sum_{n=0}^\infty t^n$ has a ROC $|t| <1$ and $x[n] = 1$ is unbounded.)

A sequence $x[n]$ is bounded in $l^1$ if the largest possible ROC of $X(z)$ includes the unit circle $|z|=1$. A sequence $x[n]$ is unbounded in $l^1$ if the closure of the largest possible ROC does not include the unit circle.

If the ROC of the transfer function $H(z)$ includes the unit circle, then an $l^1$-bounded input will produce an $l^1$-bounded output. Indeed, the ROC of both $H(z)$ and $X(z)$ will include the unit circle, therefore, the ROC of $Y(z)$ will include the unit circle and $y[n]$ will be $l^1$-bounded.

If all the poles of a causal $H(z)$ are within the unit circle then the ROC will include the unit circle, hence a bounded input will produce a bounded output. Engineers therefore call a system stable if all the poles of $H(z)$ are inside the unit circle.

Note that it follows from earlier remarks that if $H(z)$ has a pole outside the unit circle then the impulse response will be unbounded. If $H(z)$ has a pole on the unit circle then the impulse response will resonant at the corresponding frequency indefinitely. For example, if $H(z) = \frac1{1-az^{-1}}$ where $|a|=1$ then $h[n] = a^n$ for $n \geq 0$. This does not die away because $|h[n]| = |a^n| = 1$ for all $n \geq 0$. By writing $a = e^{\jmath \omega}$ we get $h[n] = e^{\jmath \omega n}$ and therefore we think of this as a resonance at frequency $\omega$. Furthermore, if $a = r e^{\jmath \omega}$ where $|r| < 1$, then $h[n] = a^n = r^n e^{\jmath \omega n}$ and we see that there is a damped resonance at frequency $\omega$. As the pole gets closer to the unit circle (that is, $r \rightarrow 1$) the impulse response takes longer to die out. Engineers care about pole placement!

## Epilogue

• Although often engineers can get away with just following recipes, it is safer in the long run and easier to remember the recipes if you have spent a bit more time to learn the underlying rationale for when and why the rules work.
• Linear algebra and complex analysis are considered core subjects of an undergraduate mathematics degree because they have such broad applicability: here we saw them play integral roles in discrete-time signal processing.
• This is because the “mathematical behaviour” of Laurent series matches the “physical behaviour” of discrete-time linear time-invariant systems.
• There is a close relationship between the Z-transform and the Fourier transform: just make the substitution $z = e^{\jmath \omega}$. This is only allowed if the ROC contains the unit circle. The concept of a pole does not explicitly appear in the Fourier domain because, rather than permit $z$ to be anywhere in the complex plane, the substitution $z = e^{\jmath \omega}$ means we are only looking at the behaviour of $H(z)$ on the unit circle.
• When it comes to system design, understanding where the poles and zeros are can be more productive than trying to understand the whole frequency spectrum.

## A Snippet of Galois Theory

August 11, 2016 1 comment

The following hints at why the quintic equation cannot be solved using radicals. It follows the approach in the first part of Ian Stewart’s book “Galois Theory”. If time permits, a future post will summarise the approach in V. B. Alekseev’s book “Abel’s Theorem in Problems and Solutions”. Another candidate is Klein’s book “Lectures on the Icosahedron and the Solution of Equations of the Fifth Degree”.

To set the stage, consider the generic quadratic polynomial equation $x^2 + ax + b = 0$. Solving it is the same as factorising it: given the coefficients $a,b$ find the roots $t_1, t_2$ such that $x^2 + ax + b = (x-t_1)(x-t_2)$.

The first observation is the prominent role played by symmetric polynomials. Note $(x-t_1)(x-t_2) = x^2 - (t_1+t_2) x + t_1 t_2$. Here, $s_1 = t_1+t_2$ and $s_2 = t_1 t_2$ are called elementary symmetric polynomials. They are symmetric because swapping $t_1$ and $t_2$ does not change the product $(x-t_1)(x-t_2)$. This “annoying” ambiguity turns out to be a friend; the presence of symmetry forces certain relationships to be true, as will be seen presently.

Solving a quadratic equation is equivalent to the following. Given the values of the symmetric polynomials $s_1 = t_1 + t_2$ and $s_2 = t_1 t_2$ (up to sign, these are equivalent to the coefficients $a$ and $b$ in $x^2 + ax + b$), find the roots $t_1, t_2$.

Given $s_1,s_2$, we can readily evaluate any symmetric function of the roots that we wish; this is a property of elementary symmetric polynomials that is not too difficult to prove. For example, since $t_1^2 + t_2^2$ is symmetric, it must be capable of being written as a polynomial in $s_1,s_2$. Trial and error leads to finding that $s_1^2 - 2 s_2 = (t_1+t_2)^2 - 2 t_1 t_2 = t_1^2 + t_2^2$, verifying the claim in this instance.

The first insight is that this is not an all or nothing affair. Although $t_1 - t_2$ is not symmetric, it still behaves nicely when $t_1,t_2$ are swapped: $t_2 - t_1 = -(t_1 - t_2)$. [If we were trying to solve cubic or quartic equations, we would look for polynomial expressions of the roots that took on only two or three distinct values as the roots were permuted.] Moreover, since $(-1)^2 = 1$, this implies $(t_1-t_2)^2$ is symmetric, a fact that is also self-evident. Therefore, although we cannot express $t_1 - t_2$ as a polynomial function of $s_1,s_2$, we can do the next best thing: write $t_1-t_2$ as the square root of a polynomial in $s_1,s_2$.

Specifically, $(t_1-t_2)^2 = t_1^2 - 2 t_1 t_2 + t_2^2$, and from earlier, we saw that $t_1^2 + t_2^2 = s_1^2 - 2 s_2$, therefore $(t_1-t_2)^2 = s_1^2 - 4 s_2$.

At this point, the reader should pause to recognise that we have “rediscovered” the discriminant $b^2 - 4 a c$ of a quadratic $a x^2 + bx + c$. (In our case, $a = 1$, $b = -s_1$ and $c = s_2$, hence $b^2 - 4 a c = s_1^2 - 4 s_2$.)

It is now straightforward to derive the well-known formula for solving a general quadratic equation since we have reduced it to solving linear equations: we know that $t_1 + t_2 = s_1$ and $t_1 - t_2 = \sqrt{ s_1^2 - 4 s_2 }$.

Why is it impossible to derive a similar type of formula for solving a quintic equation?

It comes down to a property of the symmetry group of five elements: it behaves differently from the symmetry groups on two, three or four elements, which is why formulae exist for solving polynomials of degree less than 5 but not for those of degree 5. (Generic polynomials of higher degrees also have no closed-form solutions expressible in terms of radicals. Note that this does not mean a formula cannot be given whatsoever; for example, degree 5 polynomials can be solved using elliptic functions.)

The remainder of this note endeavours to shed some light on why there is this remarkable connection between symmetry groups and solutions of polynomial equations.

Solving the generic quintic equation amounts to inverting the map $(t_1,\cdots,t_5) \mapsto (s_1,\cdots,s_5)$ where $s_1 = t_1 + t_2 + t_3 + t_4 + t_5,\ \cdots\ , s_5 = t_1 t_2 t_3 t_4 t_5$ are the elementary symmetric polynomials. While it is possible to show that no formula exists for this inverse that uses only ordinary arithmetic operations (addition, subtraction, multiplication and division) and radicals (taking square roots, cube roots etc), we will prove a weaker but nonetheless insightful result. In his book, Stewart refers to this as showing no solution “by Ruffini radicals” exists. (Ruffini proved this result, without realising the stronger result does not follow from it.)

The type of formulae we will permit are of the same form as that used to solve the quadratic equation. We start off with known values for $t_1+\cdots+t_5,\ \cdots\ , t_1 \cdots t_5$; indeed, these are given by $s_1,\cdots,s_5$ which are the coefficients of the quintic equation under consideration. At each step, we wish to determine more information about the roots $t_i$ by finding a rational function of the $t_i$ whose nth power is a rational function of the already known quantities, for some integer $n \geq 2$. At the first step, this means finding a rational function of the $t_i$ whose nth power is equal to a rational function of $s_1,\cdots,s_5$. Taking an nth root then gives us a new quantity that can be used in subsequent steps, and so forth. (Ruffini’s oversight was not realising that a priori it might be useful to compute nth roots of other quantities; the above description only permits an nth root to be computed if the answer is expressible as a rational function of the roots. For example,  we are not allowed to take the nth root of $s_1$ because the nth root of $s_1$ cannot be written as a rational function of the roots $t_1,\cdots,t_5$. Abel was able to prove that computing nth roots of other quantities did not help, thereby deducing the stronger result from Ruffini’s result.)

Consider step 1. A rational function $g(t_1,\cdots,t_5)$ is sought such that, for some integer $n \geq 2$, the $n$th power of $g$ is a rational function of the elementary symmetric polynomials $s_1,\cdots,s_5$, that is, $g^n(t_1,\cdots,t_5) = h(s_1,\cdots,s_5)$ where $g,h$ are rational. Without loss of generality, we may assume $n$ is a prime number. (If we wanted a sixth root, we could first take a cube root then, in step 2, take a square root.)

To emphasise the dependence on the roots, we have $g^n(t_1,\cdots,t_5) = f(t_1+t_2+t_3+t_4+t_5,\cdots,t_1 t_2 t_3 t_4 t_5)$. Can $f,g$ be chosen relatively freely, or does this constraint impose severe restrictions on the possible choices of the rational functions $f,g$?

Here comes the crux of the matter. Despite the seemingly unwieldy equation $g^n(t_1,\cdots,t_5) = f(t_1+t_2+t_3+t_4+t_5,\cdots,t_1 t_2 t_3 t_4 t_5)$, the symmetry of the right-hand side (irrespective of what $f$ actually is) forces $g$ and $n$ to be essentially unique. Permuting the roots does not change the right-hand side, therefore permuting the roots cannot change the left-hand side either. It might change $g(t_1,\cdots,t_5)$, but it will not change $g^n(t_1,\cdots,t_5)$. [Actually, if it did not change $g(t_1,\cdots,t_5)$ then $g$ would be directly expressible as a rational function of the $s_1,\cdots,s_5$ and hence we can ignore this case, which would correspond to $n=1$.]

Recall momentarily the quadratic case, where $g(t_1,t_2) = t_1 - t_2$. Swapping $t_1,t_2$ caused the sign of $g$ to change, but that was it. Here we have, for example, that $g^n(t_1,t_2,t_3,t_4,t_5) = g^n(t_2,t_1,t_3,t_4,t_5)$, therefore, $\left(\frac{g(t_2,t_1,t_3,t_4,t_5)}{g(t_1,t_2,t_3,t_4,t_5)}\right)^n = 1$, or in other words, there exists an nth root of unity $\omega$ such that $g(t_2,t_1,t_3,t_4,t_5) = \omega\,g(t_1,t_2,t_3,t_4,t_5)$. [To be clear, $\omega$ is a complex number satisfying $\omega^n = 1$.]

Different permutations of the roots will lead to possibly different values of $\omega$. Sure, some $\omega$ might equal 1, but there will be at least one permutation such that the corresponding $\omega$ does not equal one. (If this were not true then $g$ would be symmetric and hence directly expressible as a rational function of the $s_1,\cdots,s_5$, hence we could ignore this step as no radical would need computing.)

The mapping from the permutation of the roots to the value of $\omega$ is a (nontrivial) group homomorphism. It cannot be arbitrary, and in particular, we will see that it forces $n$ to equal 2.

Denote the set of all permutations of $t_1,\cdots,t_5$ by $S_5$. This set naturally has a group structure: multiplication corresponds to applying one permutation after another, so to speak. Precisely, it is a symmetric group.

Denote by $W_n$ the set of all $n$th roots of unity; $\omega$ must lie in $W_n$. This set also has a standard group structure given by ordinary multiplication: if you multiply two $n$th roots of unity you get another $n$th root of unity. In fact, it is what is known as a cyclic group (and is isomorphic to $Z/nZ$, the abelian group of integers modulo $n$).

As mentioned above, the fact that $g^n(t_1,\cdots,t_5)$ is symmetric induces a function, call it $\phi$, from $S_5$ to $W_n$. Precisely, if $\sigma \in S_5$ is a permutation of the $t_1,\cdots,t_5$ then $\phi(\sigma) = \frac{g(\sigma(t_1,\cdots,t_5))}{g(t_1,\cdots,t_5)}$; recall from earlier that this ratio will be an element of $W_n$. Crucially, $\phi$ has structure: $\phi(\sigma_1 \sigma_2) = \phi(\sigma_1) \phi(\sigma_2)$ because $\frac{g(\sigma_1(\sigma_2(t_1,\cdots,t_5)))}{g(t_1,\cdots,t_5)} = \frac{g(\sigma_1(\sigma_2(t_1,\cdots,t_5)))}{g(\sigma_2(t_1,\cdots,t_5))} \frac{g(\sigma_2(t_1,\cdots,t_5))}{g(t_1,\cdots,t_5)}$. That is, $\phi$ is compatible with the group structure and hence is a group homomorphism. Furthermore, from earlier observations, we know $\phi$ is nontrivial: there exists at least one permutation $\sigma$ such that $\phi(\sigma) \neq 1$.

There is only a single possible nontrivial group homomorphism from $S_5$ to a cyclic group of prime order!

While $S_5$ is not a very small group (it has 120 elements), its group structure is what prevents more such group homomorphisms from existing. [Since $\phi \colon S_5 \rightarrow W_n$ is a nontrivial homomorphism then the kernel of $\phi$ must be a proper normal subgroup of $S_5$. This proper normal subgroup must also contain every commutator because $W_n$ is abelian. It can be shown the only such subgroup is what is known as the alternating subgroup $A_5$, implying $n=2$.]

At this juncture, the group structure of $S_5$ has forced $g$ to be (up to a constant) the square-root of the discriminant: $g(t_1,\cdots,t_5) = \prod_{i < j} (t_i - t_j)$. [I have omitted the proof of this as it is not the main focus; I mentioned it only because it helps keep the outline as concrete as possible, and indeed, it follows closely the situation encountered at the start when the quadratic polynomial was being considered.] Going into Step 2, we know not only the values of the elementary symmetric polynomials $t_1+t_2+t_3+t_4+t_5,\ \cdots\ , t_1t_2t_3t_4t_5$ but also the value of $\prod_{i < j} (t_i - t_j)$. It turns out though that we get blocked at Step 2 from going any further. The reasoning is similar to above, but with $S_5$ replaced by a smaller group.

Precisely, at Step 2, we want to find a rational function $g$ and a prime number $n \geq 2$ such that $g^n(t_1,\cdots,t_5)$ is equal to a rational function of not only the $s_1,\cdots,s_n$ but also the new quantity we found in Step 1, namely $\prod_{i < j} (t_i - t_j)$. This time it is no longer true that $g^n(t_1,\cdots,t_5)$ must be symmetric in $t_1,\cdots,t_5$ because the new quantity $\prod_{i < j} (t_i - t_j)$ is not symmetric. However, if we only look at what are known as even permutations then $\prod_{i < j} (t_i - t_j)$ will remain invariant. The alternating subgroup $A_5$ of $S_5$ consists precisely of the even permutations. In particular, we therefore know that if $\sigma$ is in $A_5$ then $g^n(\sigma(t_1,\cdots,t_5)) = g^n(t_1,\cdots,t_5)$. This is the same as in Step 1 except $S_5$ has been replaced by $A_5$. [We did not need to know that Step 1 produced $\prod_{i < j} (t_i - t_j)$. A simpler group-theoretic argument mentioned parenthetically above implies that the kernel of $\phi$ in Step 1 is $A_5$ and hence that whatever new function of the roots is determined in Step 1, it will be invariant with respect to permutations belonging to $A_5$.]

Exactly as in Step 1, the invariance of $g^n(t_1,\cdots,t_5)$ forces the existence of a nontrivial group homomorphism, this time from $A_5$ to a cyclic group of order $n$. And once again, the group structure of $A_5$ limits the possibilities.

There are no nontrivial group homomorphisms from $A_5$ to a cyclic group of prime order.

The above (which is a relatively straightforward result in group theory) immediately implies that there is no formula of the kind specified earlier for solving a quintic polynomial.

Galois theory goes well beyond the basic argument sketched here. It can be used to conclude that a generic quintic (or any higher-degree polynomial for that matter) has no solution in terms of radicals (rather than use Abel’s result to strengthen Ruffini’s result, as alluded to earlier, a cleaner proof is possible). It can also be used to conclude that there exist particular quintic polynomials whose roots cannot be expressed using radicals; this does not follow from the generic case because the generic case sought a single formula for all quintic polynomials, whereas a priori there might exist a different formula for each quintic equation. (Certainly, some quintic equations can be solved using radicals.)

## The Cameron-Martin Space, and Some Differences Between Finite and Infinite Dimensional Spaces

The usefulness of infinite-dimensional vector spaces is that often concepts in finite dimensions extend to infinite dimensions; this allows us to reason about infinite-dimensional problems using finite-dimensional intuition. (This is explained in detail in my primer on RKHS theory, for example.) Nevertheless, infinite-dimensional spaces exhibit behaviours not found in finite dimensions, and working effectively with infinite-dimensional spaces requires understanding how they differ from finite dimensional spaces.

This article discusses several key differences between a Gaussian measure on Euclidean space $\mathbb{R}^n$ and on the separable Hilbert space $l^2$, which can be thought of as an infinite-dimensional generalisation of Euclidean space. (Recall $l^2$ consists of all square-summable sequences.)

## A Gaussian Measure on $l^2$

Let $e_i$ denote the canonical basis vectors of $l^2$, that is, $e_i$ is the sequence of all zeros except for the $i$th term which is unity.

Essentially, we wish to construct a Gaussian random variable $X$ on $l^2$ element-wise by first generating a collection of real-valued independent random variables $X_i \sim N(0,\lambda_i)$ and then setting $X = (X_1,X_2,\cdots)$, or in other words, $X = \sum X_i e_i$. We assume $\lambda_i > 0$.

It is essential for $\lambda_i$ to decay to zero sufficiently fast, for otherwise, the sequence $X$ will have a non-zero probability of not lying in $l^2$.  We require $\sum X_i^2 < \infty$ to hold with probability one.  Since the standard deviation $\sqrt{\lambda_i}$ gives the “typical” size of a random realisation of $N(0,\lambda_i)$, it is not surprising that the requirement be $\sum \lambda_i < \infty$.  (Throughout, curious readers are invited to consult an introductory book on infinite-dimensional analysis for more details. For example, in general, one starts with a linear operator $Q$ that will serve as the covariance matrix, and requires $Q$ to be of trace class.  Here, I am simply taking $Q$ to be the “diagonal matrix” $\mathrm{diag}(\lambda_1,\lambda_2,\cdots)$.)

It turns out that the above procedure can be made rigorous; provided $\sum \lambda_i < \infty$, there is indeed a Gaussian random variable $X$ on $l^2$ such that the $X_i = \langle X, e_i \rangle$ are independent Gaussian random variables $N(0,\lambda_i)$.

To any (Borel-measurable) subset $A \subset l^2$, we define the Gaussian measure $\mu(A)$ as the probability that $X$ lies in $A$.  (To a mathematician, this is putting the cart before the horse, but nevertheless, it is a convenient way to think when it comes to having to evaluate $\mu(A)$ in certain situations.)

Since we insisted all the $\lambda_i$ be positive, the measure $\mu$ is non-zero on any open subset of $l^2$. Note too that $\mu(l^2) = 1$.

## Some Possibly Surprising Facts

Given that the following do not hold in finite dimensions, they are initially counter-intuitive. The remainder of the article aims to give sufficient explanations to “improve” our intuition, thereby making the following facts unsurprising.

• Given a Gaussian measure $\mu$ on $l^2$, there exists a subset $A \subset l^2$ and an element $v \in l^2$ such that
1. $A$ and $A+v$ are disjoint;
2. $\mu(A) = 1$; and
3. $\mu(A+v) = 0$.

Why should these facts be surprising? By analogy with the finite-dimensional case, one would expect that unless $A$ is pretty much all of $l^2$ then it would not be possible for $\mu(A) = 1$. Indeed, $\mu$ is non-zero on any open subset of $l^2$, hence must $A$ be dense in $l^2$? So how can the translated version $A+v$ be disjoint from $A$? And how can the “large” set $A$, having measure 1, go to the “small” set $A+v$ having measure 0, simply by a translation?

## Explanation

For concreteness, choose $\lambda_i = \frac1{i^3}$.

The subset $A$ will be taken to be the limit of a monotonically increasing sequence of subsets $A^{(n)}$. The subsets $A^{(n)}$ will be taken to be “rectangles” of the form $A^{(n)} = \{ (x_1,x_2,\cdots) \in l^2 \mid -a_i^{(n)} < x_i < a_i^{(n)}\}$ where the $a_i$ will be strictly positive.

Since the variance of the $X_i$ goes to zero, we hope to be able to choose the $a_i^{(n)}$ so that they decay to zero in $i$, for fixed $n$, while ensuring $A^{(n)}$ has measure close to unity. This gives us a chance of constructing an $A$ which is not the whole of $l^2$ yet has measure equal to unity. The rate of decay $i^{-3/2}$ is too fast because the probability that $X_i \sim N(0,i^{-3})$ lies between $-i^{-3/2}$ and $i^{-3/2}$ does not depend on $i$; if this probability is $c$ then $c^\infty = 0$ would be the measure of the rectangle. This motivates trying a slower rate of decay: $a_i^{(n)} = \sqrt{2}\frac{n}{i}$. (The numerator $n$ is there to grow the rectangles so the probability of $X$ lying in $A^{(n)}$ goes to unity as $n \rightarrow \infty$, and the $\sqrt{2}$ is for later convenience.)

The probability that $X_i \sim N(0,i^{-3})$ lies between $\pm \sqrt{2}\frac{n}{i}$ is conveniently expressed in terms of the error function as $\mathrm{erf}(n\sqrt{i})$. Hopefully, $\mu(A^{(n)}) = \prod_{i=1}^\infty \mathrm{erf}(n\sqrt{i}) > 0$ for all $n > 0$, and $\prod_{i=1}^\infty \mathrm{erf}(n\sqrt{i}) \rightarrow 1$ as $n \rightarrow \infty$, so that $\mu(A)=1$. This is indeed the case.

[Here is the tedious calculation to prove the claims. Let $c_i = 1-\mathrm{erf}(n \sqrt{i})$. A well-known bound on the complementary error function is $1-\mathrm{erf}(u) = \mathrm{erfc}(u) < \frac1{\sqrt{\pi}u} e^{-u^2}$. Therefore, $c_i < d_i$ where $d_i = \frac1{n\sqrt{\pi i}} e^{-in^2}$. Note $0 < c_i < 1$ and $0 < d_i < 1$ when $n \geq 1$ and $i \geq 1$. Provided $\sum \ln(1-c_i)$ is finite, $\mu(A^{(n)}) = \prod (1-c_i)$ will be strictly positive, as required. Now, $0 > \ln(1-c_i) > \ln(1-d_i) \geq \frac{-d_i}{1-d_i}$, hence it suffices to prove $\sum\frac{d_i}{1-d_i}$ is finite. The ratio test for absolute convergence involves the ratio $\frac{d_{i+1}}{1-d_{i+1}} \frac{1-d_i}{d_i} = \frac{d_{i+1}}{d_i} (1-d_i) \frac1{1-d_{i+1}}$. Since $d_i \rightarrow 0$, it suffices to show $\lim_{i \rightarrow \infty} \frac{d_{i+1}}{d_i} < 1$ in order to conclude that $\sum\frac{d_i}{1-d_i}$ is finite. Now, $\frac{d_{i+1}}{d_i} = \sqrt{\frac{i}{i+1}} \frac{e^{-(i+1)n^2}}{e^{-in^2}} \rightarrow e^{-n^2} < 1$ whenever $n > 0$, as required. To show $\mu(A) = 1$, we need to show $\lim_{n \rightarrow \infty} \sum_i \frac{d_i}{1-d_i} = 0$.  An earlier calculation shows $d_{i+1} < e^{-n^2} d_i$, so that $\sum_i \frac{d_i}{1-d_i} \leq \frac{d_1}{1-d_1}(1+\alpha+\alpha^2+\cdots) = \frac{d_1}{1-d_1} \frac1{1-\alpha}$ where $\alpha = e^{-n^2}$.  It is clear that this can be made arbitrarily close to zero by choosing $n$ large.]

• Had we been in finite dimensions, the sets $A^{(n)}$ would have been open. Here, they are not open.

Since the $a_i^{(n)} \rightarrow 0$ as $i \rightarrow \infty$, there is no open ball centred at the origin, with positive radius $\rho$, that is contained in $A^{(n)}$.  Indeed, for a given radius $\rho > 0$ and $n$, an $i$ can be found such that $a_i^{(n)} < \rho$, showing $A^{(n)}$ is not fully contained within the ball.

• Let $v_i = i^{-3/4}$. The point $v = (v_1,v_2,\cdots) \in l^2$ is not in $A$.

We may be led to believe this is surprising: as $n$ gets bigger, all the sides of the rectangle $A^{(n)}$ get bigger, hence we may expect that it will grow to be the whole of $l^2$. However, as explained presently, in infinite dimensions, the order in which limits are taken matters, and the above argument is invalid.  It will be seen that while the sides of the rectangle do grow, they grow too slowly.

On the one hand, it is true that if we fix an $i$, then we can find an $n$ sufficiently large so that the $i$th side of the rectangle $A^{(n)}$ contains $v_i$, the $i$th term of $v$.  Mathematically, this is true because $\lim_{n \rightarrow \infty} a_i^{(n)} = \infty$.  However, this is only really telling us that the “truncated” approximations $(v_1,0,\cdots), (v_1,v_2,0,\cdots), (v_1,v_2,v_3,0,\cdots),\cdots$ are all in $A$, but analytically, we know that if $A$ is not closed then the limit of these approximations need not lie in $A$. Figuratively speaking, even though an ordinary towel can be stuffed in a suitcase by sequentially pushing in the bits that are hanging out, this reasoning is not valid if the towel were infinitely big; after each push, there may still be an infinite portion of the towel hanging out. Instead, we must think directly of the $A^{(n)}$ as suitcases of increasing size, and ask if the towel fits entirely inside one of these suitcases.

The reason why $v$ does not lie in any of the $A^{(n)}$, and hence $v$ is not in $A$, is that the terms $v_i$ of $v$ decrease more slowly to zero than the sides $a_i^{(n)}$ of the rectangles.  For a fixed $n$, there will be a sufficiently large $i$ so that $v_i > a_i^{(n)}$, thereby showing $v \not\in A^{(n)}$ for all $n$.

• But how can $v$ not be in $A$ if every dimension of the suitcase $A^{(n)}$ is expanding with $n$?

The sides of the suitcase are aligned with the canonical basis vectors $e_i$.  However, the $e_i$ form a Schauder basis, not a Hamel basis, so it is not true that “every dimension of the suitcase is expanding”.  For example, the point $v$ does not lie in the span of the $e_i$.  (The span of the $e_i$ consists of all sequences with only a finite number of non-zero terms.) The argument given earlier readily extends to show that, for all $\lambda \neq 0$, the point $\lambda v$ does not lie in any of the suitcases $A^{(n)}$. So while the suitcases are getting bigger in the directions $e_i$, they always have zero width in the direction $v$.  (It is true that the distance from $A^{(n)}$ to $v$ goes to zero as $n \rightarrow \infty$, but that just means $v$ is in the closure of $A$.)

• $\mu(A+v) = 0$.

This is another phenomenon unique to the infinite-dimensional case. Recall that $X_i \sim N(0,i^{-3})$. The $i$th side of $A^{(n)} + v$ is the interval from $v_i - a_i^{(n)}$ to $v_i + a_i^{(n)}$. We know $\frac{v_i}{a_i^{(n)}} \rightarrow \infty$ as $i \rightarrow \infty$ (the sides of the suitcase decrease much faster than the $v_i$). And $v_i$ normalised by the standard deviation of $X_i$ also goes to infinity, i.e., $\frac{v_i}{i^{-3/2}} = i^{3/4} \rightarrow \infty$. So the probability that $X_i$ lies inside the interval from $v_i - a_i^{(n)}$ to $v_i + a_i^{(n)}$ goes to zero. While the finite product of positive numbers is positive, the infinite product of positive numbers that decay to zero is zero: there is zero probability that a randomly chosen point will lie in $A^{(n)} + v$.

• The sets $A$ and $A +v$ are disjoint.

Again, this is a consequence of the order of the limits.  For a fixed $n$, the $i$th edge of the suitcase $A^{(n)}$ decays to zero faster than $v_i$.  So for $i$ sufficiently large, the $i$th edges of $A^{(n)}$ and $A^{(n)}+v$ do not overlap, hence $A$ and $A +v$ are disjoint.

## Epilogue

While the above gave one specific example, the general theory goes as follows. Given a Gaussian measure $N_{0,Q}$ on $l^2$ with mean zero and covariance $Q$, the shifted Gaussian measure $N_{v,Q}$ is either equivalent to $N_{0,Q}$ (meaning the two measures are absolutely continuous with respect to each other), or the two measures are singular. The Cameron-Martin space is the space of all $v$ for which the two measures are equivalent. There is a simple expression for this space: it is $Q^{1/2}(l^2)$, the image of $l^2$ under the operator $Q^{1/2}$.

## An Introduction to Normal and Non-normal Subgroups

We currently are running several reading groups for engineering students and one is on Lie Groups and Representation Theory.  When elementary group theory was introduced yesterday, not surprisingly, there were many questions concerning normal subgroups.  Like many areas of mathematics, the traditional learning paradigm is to “just memorise the definitions and later things will make sense”.  While it is true that later things will make sense — once a critical mass of ideas and definitions has been accumulated, the connections between them can be seen, and the structure becomes self-supporting — we should still strive to make things as intuitive as possible right from the start.  Below is one such attempt.  It tries to introduce group theory based around several examples, one example at a time. Particular emphasis is placed on explaining normal and non-normal subgroups. Key points will be written in bold. (Caveat: The below captures what first came to mind and is likely to be unduly verbose.)

## Free Group with Two Generators

Group theory came from the study of “transformations” of a space, and often, it is useful to think of each element of a group as something which “acts on” or “transforms” something. Later, we will consider permutation groups, which precisely fit this description. (In the reading group, we consider rotation groups, which also fit this description precisely.) Here, we consider a different group first, giving two equivalent definitions of it.

The algebraic definition is that our group $G$ consists of all “words” that can be written using the letters $a$, $b$, $a^{-1}$ and $b^{-1}$, with the cancellation rule that whenever $a$ and $a^{-1}$ appear next to each other, they can be omitted from the word, and the same for $b$ and $b^{-1}$. Besides the “empty word”, denoted $e$, the group $G$ contains elements such as $aaaba$, $ababa$, $ab^{-1}b^{-1}aba^{-1}$ and so forth. The group operation (often written as multiplication) is just concatenation, so that $ab$ times $aabb$ is $abaabb$.

Since $abb^{-1}a$ is the same as $aa$ by definition of our cancellation rule (later, a physical interpretation will be given), note that $ab$ times $b^{-1}a$ equals $aa$.

The axioms of a group require the group operation to be associative, and it is clear that concatenation is associative, i.e., the order of doing the multiplications does not matter: ( (word one) times (word two)) times (word three) gives the same as (word one) times ((word two) times (word three)).  For example, $((ab) \cdot (ab)) \cdot (ab) = ababab = (ab) \cdot ((ab) \cdot (ab))$.

There must also be an identity element. In this case, this is the empty word $e$, and it is clear that adding a “blank” to the start or the end of a word leaves the word unchanged, hence the empty word really is the identity element. (Time precludes me from writing down the group axioms and verifying them in detail.)

Lastly, to verify $G$ is a group, it must be shown that every element has an inverse. This is intuitively clear; for example, the inverse of $ab^{-1}ab$ is $b^{-1}a^{-1}ba^{-1}$ because $ab^{-1}ab \cdot b^{-1}a^{-1}ba^{-1} = e$. This example shows the inverse can be constructed one letter at a time, going from right to left, writing down the inverse of each letter. (Note that the blank word also results if the inverse had been multiplied on the left rather than the right: $b^{-1} a^{-1} b a^{-1} \cdot a b^{-1} a b = e$. If this were not true, then $G$ would not be a group, i.e., the axioms require that each element have a left inverse which is also a right inverse.)

While the above algebraic definition shows that this group, known as the free group with two generators, is easy to work with algebraically, it lacks a physical interpretation. This is now remedied by thinking of the free group with two generators as the fundamental group of the plane with two holes removed; don’t worry if that does not make any sense as I will explain the concept intuitively.

Imagine the floor of a room has an X marked on it, and to the left of the X is a vertical pole sticking out of the ground, and to the right there is another pole.  A spider starts at X and can be commanded to do four things.  Command “a” will make the spider walk from X clockwise around the pole on the left and return to X, all the while leaving a thin web behind him. (If you prefer, think of a robot laying down a rope.) Command “a inverse” will make the spider leave X, walk anti-clockwise around the pole on the left then return to X.  Notice that if the spider does “a” then ”a inverse” the web can be shrunk or contracted back to X, so “a” followed by “a inverse” is the equivalent of “do nothing”, because we want to treat two web configurations as the same if one can be deformed into the other without breaking the web. (Technically, the web is allowed to pass through itself so we do not need to worry about knots forming; we simply care about what happens as the web shrinks to as short a length as possible.)  Note that “a” followed by “a” gives a configuration (denoted $aa$) that is different from just instructing the spider to do “a” (denoted $a$); the former has the web encircling the pole on the left twice while the latter has the web encircling the pole only once, and no amount of jiggling of the web will transform one configuration into the other.

Similarly, commands “b” and “b inverse” can be defined to mean walk clockwise or anti-clockwise around the pole on the right, respectively. The reader should pause to ensure the algebraic definition and this spider-web definition agree intuitively by doing several thought experiments and visualising the resulting web configuration.

A subgroup can be thought of as what results if the spider is no longer allowed to carry out arbitrary commands but can only carry out certain sequences of commands and their inverses.  For example, perhaps the spider can only carry out the command $ab$ and its inverse $b^{-1}a^{-1}$.  Then the resulting subgroup consists of $\cdots, (b^{-1}a^{-1})^3,(b^{-1}a^{-1})^2,b^{-1}a^{-1},e,ab,(ab)^2,(ab)^3,\cdots$. (This subgroup is isomorphic to the group of integers.)  Another subgroup is obtained by simply allowing the spider to carry out command “a” or its inverse.  (This subgroup is also isomorphic to the group of integers.)

Let $H$ denote a subgroup of $G$. Recall the definition of normality: $H$ is a normal subgroup of $G$ if, for every $g \in G$ and $h \in H$, the so-called conjugate $g^{-1} h g$ is in $H$.  This definition will have been motived several times before the end of this article, however, for now, we start with a high level if not somewhat vague explanation: a normal subgroup is one whose elements produce actions that do not get “tangled up” very much with all the other actions, as now discussed.

If $H$ is the subgroup generated by $a$, meaning the elements of $H$ are precisely $\cdots,a^{-3},a^{-2},a^{-1},e,a,a^2,a^3,\cdots$ then $H$ is not normal. For example, $b^{-1} a b$ cannot be written as an element of $H$ because there is no way for the actions of $a$ and $b$ to be exchanged: if first the spider is given the command to wrap a web around the right pole clockwise, then apply a command from $H$, then wrap a web around the right pole anti-clockwise, there is no way for the web around the right pole to “cancel out” and leave a web that is only wrapped around the left pole; the subgroup $H$ is not normal.

Why is normality interesting? Before giving the standard explanation (i.e., the left cosets of a subgroup form a quotient group if and only if the subgroup is normal, and essentially equivalently, a subgroup is the kernel of a group homomorphism if and only if it is normal), a high level pragmatic explanation is proffered: when we are writing down sequences of group operations, we want to be able to simplify things, so we need some way of being allowed to interchange operations. In the simplest of cases, there are so-called Abelian groups which, by definition, are commutative: $gh = hg$ for all $g$ and $h$ in the group. The simplest examples of Abelian groups are vector spaces with the usual vector addition being treated as the group operation (and scalar multiplication is forgotten about, except for multiplication by $-1$ which is the inverse group operator). In an Abelian group, $g^{-1} h g = g^{-1} g h = h$, hence every subgroup is normal. This is the ultimate in “interchangeability”.

A normal subgroup has a property similar to, but a bit weaker than, full commutativity, in the following sense. If $H$ is normal, and $h \in H$, then although $gh$ may not equal $hg$, I can achieve the next best thing: I can find a $\tilde h \in H$ such that $g h = \tilde h g$.  That is, I have been able to interchange the action $g$ with an action from $H$ provided I do not mind using a possibly different action from $H$. (Note that this follows immediately from the definition of normality: $gh = ghg^{-1}g = \tilde h g$ where $\tilde h = g h g^{-1}$ is guaranteed to be in $H$ by normality. Note that $g$ can be replaced by $g^{-1}$ in the definition of normality without changing anything.)

At this point, the reader should be comfortable in believing that being normal is a special property not shared by every subgroup, and if a subgroup is normal, then it is a useful thing, because it allows one to manipulate group operations to achieve simplifications.

## The Symmetric Group on Three Letters

Consider three distinct balls arranged in a row.  A robot can be asked to change their ordering: one action might be to swap the left-most ball with the right-most ball. The set of all distinct actions forms a group $S_3$, called the symmetric group on three letters, with group multiplication $gh$ taken to mean “first perform action $h$ then perform action $g$”, which again is a form of concatenation. Googling will bring up more than enough information about symmetric groups, so here I just wish to emphasise thinking in terms of each element corresponding to asking a robot to perform an action.

Let $H$ denote the subgroup of $S_3$ formed from the action “interchange the left-most and middle balls”.  Denote this action by $(12)$, and note that it is its own inverse. (I am using cycle notation.) This is another example of a non-normal subgroup: consider $(13)^{-1} (12) (13)$, with the notation meaning, first swap the left and right balls, then swap the left and middle balls, then swap the left and right balls.  If the balls (red, green, blue) start off in the order RGB, then applying (13) gives BGR, applying (12) gives GBR and applying $(13)^{-1} = (13)$ gives RBG.  So overall, it transformed RGB to RBG.  Note that the right ball has been changed, yet no command from $H$ can change the right ball, hence $H$ is not normal.  This again is an illustration of the vague intuition that non-normal subgroups perform actions that tangle other actions up.

Besides the empty group and the whole group, it can be shown that the only other normal subgroup of $S_3$ is the so-called alternating group $A_3$ formed from all the even permutations.

Gaining further insight requires looking more closely at applications of normality. Perhaps the most germane to look at is whether a certain way of partitioning a group into equivalence classes leads to a compatible group structure on the equivalence classes. Recall $H$ is the subgroup of $S_3$ generated by (12). It turns out that any subgroup leads to a partition of the original group: $G = \cup_{g \in G} gH$ where $gH = \{gh \mid h \in H\}$ is a left coset. Crucially, for two distinct $g,g' \in G$, either $gH = g'H$, or $gH \cap g'H = \emptyset$. (Intuitively why this is so will be explained later. For now, an algebraic proof is given: Assume there exist $h,h' \in H$ such that $gh = g'h'$. For any $f \in H$, define $f' = (g')^{-1}gf$, so that $gf = g'f'$. The result follows if we can show $f'$ belongs to $H$. Note $f' = (g')^{-1}ghh^{-1}f = h'h^{-1}f$ which is in $H$ because $h'$, $h^{-1}$ and $f$ are all in $H$.)

If we treat a vector space as a group, and take for $H$ a linear subspace, then $gH$ would correspond to the affine subspaces obtained by translations of $H$; perhaps this is the simplest way of visualising left cosets.  (Vector spaces form Abelian groups, hence all their subgroups are normal, and simply considering vector spaces as groups does not lead to insight into non-normal subgroups.)

It can be deduced fairly readily that the elements of $S_3$ are (), (12), (13), (23), (123), (132).  (There are six elements because the left ball can be placed in one of three positions, the middle ball can be placed in one of two positions, and having done so, there is only one spot remaining for the right ball, i.e., there are $1 \times 2 \times 3 = 6$ possible re-arrangements.)

The cosets $()H$ and $(12)H$ are both just $H$ because the elements of $H$ are () and (12).

The coset $(13)H$ consists of the elements (13) and (13)(12). The latter transforms RGB into BRG and hence is equivalent to (123).  So the cosets $(13)H$ and $(123)H$ are the same; they contain the elements (13) and (123) of $S_3$.

The coset $(23)H$ consists of the elements (23) and (23)(12) = (132). Hence the cosets $(23)H$ and $(132)H$ are the same; they contain the elements (23) and (132) of $S_3$.

This $H$ is therefore seen to partition $S_3$ into three sets called equivalence classes: $S_3 = \{ (), (12) \} \cup \{ (13), (123) \} \cup \{ (23), (132) \}$.

Consider multiplying two elements from different equivalence classes of the partition. For example, (13) from the second equivalence class times (23) from the third equivalence class gives (13)(23)=(132), which lies in the third equivalence class. However, (123) from the second equivalence class times (132) from the third equivalence class gives (), which lies in the first equivalence class.  Therefore, there is no consistent way of defining what it means to multiply the second equivalence class with the third equivalence class, since choosing different elements to represent each equivalence class can lead to different answers.  (This should be compared with modulo arithmetic, where things work nicely.) Since $H$ is not normal, it tangles things up too much.

What is required for the equivalence classes to have a group structure?  If $f,f',g,g' \in G$ are such that $fH = f'H$ and $gH = g'H$, that is, $g$ and $g'$ represent the same equivalence class, and the same for $f$ and $f'$, then we want $fgH$ to be the same as $f'g'H$. This must be true in the special case when $f = ()$. If $f = ()$ then $f'$ must be an element of $H$.  (If $H = f'H$ then we can find $h,h' \in H$ such that $h = f'h'$, so that $f' = h(h')^{-1} \in H$.) So a necessary condition is for $gH = f'g'H$ when $f' \in H$. Recall from earlier that if $H$ were normal then we could interchange the order in which $f'$ and $g'$ are multiplied, albeit by having to change $f'$ to a different element of $H$. Thus, if $H$ is normal then $f'g' = g'f''$ for some $f'' \in H$, hence $f'g'H = g'f''H = g'H$, as required. With a little more care, it can be shown that a necessary and sufficient condition for left cosets to have a group structure, is for the subgroup to be normal. One way to see why normality suffices is by writing $(gH)(fH) = (gff^{-1}H)(fH) = (gf)(f^{-1}Hf)H = gfH$ because, by normality, $f^{-1}Hf$ will equal $H$. (This notation requires some justification which I do not elaborate on; treat it as a mnemonic, if you prefer.)

## Why do Cosets form a Partition?

For Abelian groups, such as vector spaces, it is relatively clear why left cosets form equivalence classes. Why does this hold for arbitrary groups, when multiplication can tangle things up?

Let $H$ be a subgroup of $G$.  First note two things: if $h \in H$ then $hH = H$, and if $g \notin H$ then $gH \cap H = \emptyset$. Proving the former is left as an easy exercise. The latter holds because, if $gh$ is in $H$, and if $h$ is in $H$, then $ghh^{-1} = g$ must also be in $H$.

We can now appeal to symmetry. Groups were invented to study symmetry, and in the first instance, this means often we understand what is going on by shifting things back to the “origin”, meaning the identity element of the group. If we want to understand what $gH$ and $g'H$ look like relative to each other, then we can premultiply them by $g^{-1}$. The key point is that this “transformation” is one-to-one and preserves structure, so we are allowed to do this without altering anything important. Thus, comparing $gH$ with $g'H$ is essentially the same as comparing $H$ with $g^{-1}g'H$. But we know from earlier that either $g^{-1}g'$ is an element of $H$, in which case $H = g^{-1}g'H$ and hence $gH = g'H$, or else $g^{-1}g'$ is not an element of $H$, in which case $H$ and $g^{-1}g'H$ are disjoint, and hence $gH$ and $g'H$ are also disjoint.

## Reproducing Kernel Hilbert Spaces

Given that reproducing kernel Hilbert spaces (RKHS) behave more like Euclidean space than more general Hilbert spaces, it is somewhat surprising they are not better known. I have uploaded to arXiv a primer on reproducing kernel Hilbert spaces that differs from other introductory material in a number of ways.

It is perhaps the first to start by considering finite-dimensional spaces rather than jump straight to infinite-dimensional spaces. Surprisingly much can be gained by considering the finite-dimensional case first. It also does not assume any prior knowledge about Hilbert spaces and goes to great lengths to explain clearly about closed and non-closed subspaces and completions of spaces. (e.g., It is explained that a space which is not complete can be thought to have a ‘scratched surface’.)

Whereas other material refers to a RKHS theory as providing a “coordinate-free approach”, I argue that the presence of a Hilbert space already provides a coordinate-free approach, with the added benefit of a reproducing kernel Hilbert space being a canonical coordinate system associated to each point in the Hilbert space.

The primer explains that RKHS theory studies the configuration of an inner product space sitting inside a function space; it describes an extrinsic geometry because how the space is oriented inside the larger function space matters. This way of thinking leads to being able to determine when RKHS theory might be relevant to a problem at hand; if the solution of the problem changes continuously as the configuration changes continuously then it is possible the solution might be expressible in terms of the kernel in a nice way.

How RKHS theory can be used to solve linear equations is discussed along with some novel descriptions and extensions, as is the role of RKHS theory in signal detection.

My colleague contributed substantially by describing some of the recent work in the statistical learning theory literature, including explaining how RKHS theory can be used for testing for independence, for nonlinear filtering, and so forth.

## Tensor Products Introduced as Most General Multiplication (and as Lazy Evaluation)

July 27, 2014 2 comments

This note is a fresh attempt at introducing tensor products in a simple and intuitive way. The setting is a vector space $V$ on which we wish to define a multiplication. Importantly, we do not wish to limit ourselves to a multiplication producing an element of $V$. For example, we would like to think of $u^T v$ and $u v^T$ as two examples of multiplying elements $u,v$ of $\mathbb{R}^n$. The first multiplication, $u,v \mapsto u^T v$, is a map from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$. The second multiplication, $u,v \mapsto uv^T$, is a map from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}^{n \times n}$. In fact, since $uv^T$ is well-defined even if $u$ and $v$ have different dimensions, the most general setting is the following.

Let $U$ and $V$ be fixed vector spaces agreed upon at the start. Assume someone has defined a multiplication rule $m\colon U \times V \rightarrow W$ but has not told us what it is.  Here, $m$ is a function and $W$ is a vector space, both of which are unknown to us. For example, if $U = \mathbb{R}^n$ and $V = \mathbb{R}^p$, a possible choice might be $m(u,v) = uv^T$ where $W = \mathbb{R}^{n \times p}$.

What does it mean for $m$ to be a rule for multiplication? We wish it to have certain properties characteristic of multiplication. In a field, multiplication is involved in a number of axioms, including the distributive law $x(y+z) = xy+xz$. Since $U, V$ and $W$ are vector spaces, potentially distinct from each other, not all the axioms for multiplication in a field make sense in this more general setting. (For instance, commutativity makes no sense if $U \neq V$.) The outcome is that $m$ is declared to be a rule for multiplication if it is bilinear: $m(x,y+z) = m(x,y)+m(x,z)$, $m(x+y,z) = m(x,z)+m(y,z)$ and $m(ax,y) = m(x,ay) = a\,m(x,y)$.

Not knowing $m$ does not prevent us from working with it; we simply write it as $m$ and are free to manipulate it according to the aforementioned rules. We do the equivalent all the time when we prove general results about a metric; we say “let $d(\cdot,\cdot)$ be a metric” and then make use of only the axioms of a metric, thereby ensuring our derivation is valid regardless of which metric is actually being represented by $d$.

While the tensor product $\otimes$ is more than this, in the first instance, it suffices to treat it merely as an unspecified rule for multiplication. (It is specified, but no harm comes from forgetting about this in the first instance.) Rather than write $m(u,v)$ it is more convenient to write $u \otimes v$, and thinking of $\otimes$ as multiplication reminds us that $x \otimes (y+z) = x \otimes y + x \otimes z$.

The first point is that we can treat $\otimes$ as an unspecified multiplication, simplify expressions by manipulating it in accordance with the rules for multiplication, and at the end of the day, if we are eventually told the rule $m$, we can evaluate its simplified expression. In computer science parlance, this can be thought of as lazy evaluation.

Whereas there are many metrics “incompatible” with each other, the rules for multiplication are sufficiently rigid that there exists a “most general multiplication”, and all other multiplications are special cases of it, as explained presently. The tensor product $\otimes$ represents this most general multiplication possible.

To consider a specific case first, take $U$ and $V$ to be $\mathbb{R}^2$. We already know two possible multiplications are $u^T v$ and $uv^T$. The claim is that the latter represents the most general multiplication possible. This means, among other things, that it must be possible to write $u^T v$ in terms of $uv^T$, and indeed it is: $u^T v = \mathrm{trace}(uv^T)$. Note that trace is a linear operator. The precise claim is the following. No matter what $m\colon \mathbb{R}^2 \times \mathbb{R}^2 \rightarrow W$ is, we can pretend it is $m(u,v) = uv^T$, work with this definition, then at the very end, if we are told what $m$ actually is, we can obtain the true answer by applying a particular linear transform. At the risk of belabouring the point, if we wish to simplify $m(u+v,w+x) - m(v,x) - m(u,w)$ then we can first pretend $m(u,v)$ is $uv^T$ to obtain $(u+v)(w+x)^T - vx^T - uw^T = ux^T + vw^T$. Later when we are told $m(u,v) = 2u^Tv$ we can deduce that the linear map required to convert from $uv^T$ to $2u^Tv$ is $2u^T v = 2\mathrm{trace}(uv^T)$. Applying this linear map to $ux^T + vw^T$ yields $2(x^T u + w^Tv)$ and this is indeed the correct answer because it equals $m(u+v,w+x) - m(v,x) - m(u,w)$.

Readers wishing to think further about the above example are encouraged to consider that the most general multiplication returning a real-valued number is of the form $u,v \mapsto v^T Q u$ for some matrix $Q$. How can $v^T Q u$ be obtained from $uv^T$? What about for multiplication rules that return a vector or a matrix?

The general situation works as follows. Given vector spaces $U$ and $V$, two things can be constructed, both of which use the same symbol $\otimes$ for brevity. We need a space $W$ for our most general multiplication, and we denote this by $W = U \otimes V$. We also need a rule for multiplication and we denote this by $u \otimes v$. By definition, $u \otimes v$ is an element of $U \otimes V$. (Just like not all matrices can be written as $uv^T$, there are elements in $U \otimes V$ that cannot be written as $u \otimes v$. They can, however, be written as linear combinations of such terms.)

At this point, I leave the reader to look up elsewhere the definition of a tensor product and relate it with the above.

## Simple Derivation of Neyman-Pearson Lemma for Hypothesis Testing

July 24, 2014 1 comment

This short note presents a very simple and intuitive derivation that explains why the likelihood ratio is used for hypothesis testing.

Recall the standard setup: observed is a realisation $x$ of a random variable (or vector or process) $X$ generated either with probability density $p_0(X)$ or $p_1(X)$.

The observation space is to be partitioned into two regions, one region labelled zero and the other unity. If $x$ lies in the region labelled zero then $X$ is assumed to have been generated under $p_0$. The rough aim is to choose the partition so that if this “game” is repeated many times — choose $i$ to be 0 or 1; repeatedly generate samples from $p_i$ and count how many times they fall in the two regions — then the ratio of “correct outcomes” to “incorrect outcomes” is maximised, regardless of whether $i=0$ or $i=1$ was used. On average, we wish to be correct many more times than we are wrong.

Since there is a trade-off between being correct when $i=0$ and when $i=1$, Neyman-Pearson’s underlying criterion is to fix the probability of being wrong when $i=0$ and maximise the probability of being correct when $i=1$. (This part is assumed to be familiar to the reader.)

Mathematically, the region $A$ of the observation space must be found such that 1) the probability $P_0(A)$ of the observation lying in $A$ under $p_0$ is some fixed (and small) value, while 2) the probability $P_1(A)$ of the observation lying in $A$ under $p_1$ is as large as possible. (This region $A$ will be the region labelled unity.) This is a constrained optimisation problem involving a region of space. Pleasantly then, there is a simple way of solving it.

The trick is to imaging the observation space as broken up into many small tiles fitting together to cover the observation space. (If the observation space is drawn as a square then imagine dividing it into hundreds of small squares.) With this quantisation, finding the region $A$ (up to quantisation error) means finding which of the tiles we want to be part of the region $A$.

Think of starting with $A$ being the empty set and growing it by adding tiles one at a time. The first tile we add will do two things: it will give us a $P_0(A)$ which we want to keep as small as possible and it will give us a $P_1(A)$ which we want to maximise. (At this point, some readers may wish to re-tile the region with tiles of possibly differing sizes, so that the probability under $P_0$ of landing on any given tile is the same fixed constant regardless of which tile was chosen.) It is intuitively clear (and even clearer after re-tiling) that the very first tile we should pick is the one which will make $\frac{P_1(A)}{P_0(A)}$ as large as possible. If $P_0(A)$ has not exceeded the limit then we are allowed to pick another tile, and we pick the next best tile, that is, out of the tiles $T_k$ remaining, we want the tile that maximises the ratio $\frac{p_1(T_k)}{p_0(T_k)}$. Each time, we greedily pick the best remaining tile that will help us make $P_1(A)$ as large as possible while not letting $P_0(A)$ grow too quickly.

As the size of the tiles shrinks to zero, the “limit” is the likelihood ratio $\frac{p_1(x)}{p_0(x)}$ for the “infinitesimal” tile $T_x$ centred at $x$. Choosing the region $A = \{x \mid \frac{p_1(x)}{p_0(x)} > c\}$ for some constant $c$ corresponds precisely to choosing the tiles $T_x$ with the largest ratios; every tile with ratio better than $c$ is included.

The above is a hand-waving argument and is not the best on which to base a rigorous proof. Often this is the case though: an intuitive derivation is subsequently verified by a less intuitive but more elementary mathematical argument. An elementary proof can be found in the Wikipedia.