### Archive

Archive for October, 2011

## When is Independence not Independence?

October 25, 2011

This brief article uses statistical independence as an example of when a mathematical definition is intentionally chosen to differ from the original motivating definition. (Another example comes from topology: the motivating, naive definition of a topological space would involve limits, but open sets are used instead to define a topology.) This illustrates the following points:

• There is a difference between mathematical manipulations and intuition (and both must be learnt side-by-side); see also the earlier article on The Importance of Intuition.
• Understanding a definition mainly means understanding the usefulness of the definition and how it can be applied in practice.
• This has implications for how to teach and how to learn mathematical concepts.

Two random events, $A$ and $B$, are statistically independent if $\mathcal{P}[A \cap B] = \mathcal{P}[A] \mathcal{P}[B]$. Here is the (small) conundrum. If one were to stare at this definition, it may not make much sense: what is it really telling us about the two events? On the other hand, if one were to learn that $\mathcal{P}[A \cap B] = \mathcal{P}[A] \mathcal{P}[B]$ must hold whenever $A$ and $B$ are “unrelated” events that have “nothing to do with each other”, then one might wrongly believe one has understood the definition. Indeed, if $A$ and $B$ are related to each other, and $B$ and $C$ are related to each other, then surely $A$ and $C$ are related to each other? Likewise, if event $A$ is defined in terms of event $B$, then surely $A$ is related to $B$? Both these statements are false if ‘related’ is replaced by ‘statistically dependent’.
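To see concretely that both statements fail, here is a small Python sketch. It assumes a toy sample space of my own choosing (two tosses of a fair coin, so four equally likely outcomes) and checks the product rule $\mathcal{P}[A \cap B] = \mathcal{P}[A]\mathcal{P}[B]$ with exact fractions. The helper names (`prob`, `independent`, `first_heads` and so on) are purely illustrative.

```python
from fractions import Fraction
from itertools import product

# Toy sample space (an illustrative assumption): two fair coin tosses,
# giving four equally likely outcomes ("H", "H"), ("H", "T"), ("T", "H"), ("T", "T").
OMEGA = list(product("HT", repeat=2))


def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def independent(e1, e2):
    """Test the defining identity P[E1 and E2] = P[E1] P[E2]."""
    return prob(lambda w: e1(w) and e2(w)) == prob(e1) * prob(e2)


def first_heads(w):
    return w[0] == "H"


def second_heads(w):
    return w[1] == "H"


def both_heads(w):
    return first_heads(w) and second_heads(w)


def coins_agree(w):
    # Defined in terms of first_heads (and second_heads).
    return first_heads(w) == second_heads(w)


print(independent(first_heads, both_heads))    # False
print(independent(both_heads, second_heads))   # False
print(independent(first_heads, second_heads))  # True
print(independent(coins_agree, first_heads))   # True
```

So “first coin heads” and “both heads” are dependent, as are “both heads” and “second coin heads”, yet “first coin heads” and “second coin heads” are independent; and “the coins agree”, although defined in terms of the first coin, is independent of “first coin heads”.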

The true way of understanding statistical independence is to i) acknowledge that while it is motivated from real life by the intuitive notion of unrelated events, it is a different concept that has nevertheless proved to be very useful; and ii) be able to list a number of useful applications. Therefore, upon reading a definition that does not immediately feel comfortable, it may be better to flick through the remainder of the book to see the various uses of the definition than to stare blankly at the definition hoping for divine intuition.

For completeness, two naturally occurring examples of how statistical independence differs from “functional independence” are given. The first comes from the theory of continuous-time Markov chains but can be stated simply. Let $\lambda_1$ and $\lambda_2$ be two positive real numbers representing departure rates. Let $T_1$ and $T_2$ be independent and exponentially distributed random variables with parameters $\lambda_1$ and $\lambda_2$ respectively.  (That is, $\mathcal{P}[ T_i > t ] = e^{-\lambda_i t}$ for $t \geq 0$ and $i = 1,2$.) The rule for deciding where to move to next (in the context of Markov chains) is to see which departure time, $T_1$ or $T_2$, is smaller.  (If $T_1$ is smaller than $T_2$ we move to destination 1, otherwise we move to destination 2.) Let $p$ be the probability that $T_1$ is smaller: $p = \mathcal{P}[T_1 < T_2]$. It can be shown that $p = \lambda_1 / (\lambda_1 + \lambda_2)$, and moreover, that the event $T_1 < T_2$ is statistically independent of the departure time $\min\{T_1,T_2\}$. This may seem strange if one thinks in terms of related events, so it is important to treat statistical independence as a mathematical concept that merely means $\mathcal{P}[A \cap B] = \mathcal{P}[A] \mathcal{P}[B]$ regardless of whether or not $A$ and $B$ are, in any sense of the word, “related” to each other.
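The independence claim can be checked numerically. The following Python sketch is a Monte Carlo experiment, not a proof; the rates $\lambda_1 = 1$ and $\lambda_2 = 2$, the threshold $t = 0.3$, the sample size and the seed are arbitrary illustrative choices. It compares the estimate of $\mathcal{P}[T_1 < T_2 \text{ and } \min\{T_1,T_2\} > t]$ with the product of the two individual estimates.

```python
import math
import random


def race(lam1, lam2, t, n=100_000, seed=0):
    """Monte Carlo estimates of P[T1 < T2], P[min > t] and P[T1 < T2 and min > t]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n_first = n_late = n_both = 0
    for _ in range(n):
        t1 = rng.expovariate(lam1)  # expovariate takes the rate, so the mean is 1/lam1
        t2 = rng.expovariate(lam2)
        first = t1 < t2             # event: move to destination 1
        late = min(t1, t2) > t      # event: the departure occurs after time t
        n_first += first
        n_late += late
        n_both += first and late
    return n_first / n, n_late / n, n_both / n


lam1, lam2, t = 1.0, 2.0, 0.3  # illustrative rates and threshold
p_first, p_late, p_both = race(lam1, lam2, t)
print(p_first, lam1 / (lam1 + lam2))          # estimate vs. lambda1 / (lambda1 + lambda2)
print(p_late, math.exp(-(lam1 + lam2) * t))   # min{T1, T2} is exponential with rate lambda1 + lambda2
print(p_both, p_first * p_late)               # product rule, up to sampling error
```

Each printed pair should agree to roughly two or three decimal places. This only checks one event of the form $\{\min\{T_1,T_2\} > t\}$, whereas the statement in the text holds for every $t \geq 0$.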

The second example is that an event $A$ can be statistically independent of itself! In fact, this turns out to be useful: to prove that $A$ is an “extreme” event, by which I merely mean that either $\mathcal{P}[A] = 0$ or $\mathcal{P}[A]=1$, it suffices to prove that $A$ is independent of itself, since then $\mathcal{P}[A] = \mathcal{P}[A \cap A] = \mathcal{P}[A]^2$, which forces $\mathcal{P}[A]$ to be zero or one. Sometimes self-independence is easier to prove than extremeness directly. (Furthermore, knowing in advance that $\mathcal{P}[A]$ can only be zero or one can make it easier to then prove that it equals one, for instance.)

In closing, it is remarked that one can always challenge a definition by asking why this particular definition was chosen. Perhaps a different definition of statistical independence might be better? The response will always be: try to find a better definition! Sometimes you might be successful; this is how definitions are refined and generalised over time. Just keep in mind that a “good” definition is one that is useful, and not necessarily one that perfectly mimics our intuition from the real world.

## Tensors and Matrices

This is a sequel to The Tensor Product in response to a comment posted there. It endeavours to explain the difference between a tensor and a matrix.  It also explains why ‘tensors’ were not mentioned in The Tensor Product.

A matrix is a two-dimensional array of numbers (belonging to a field such as $\mathbb{R}$ or $\mathbb{C}$) which can be used freely for any purpose, including for organising data collected from an experiment. That said, there are a number of commonly defined operations involving scalars, vectors and matrices, such as matrix addition, matrix multiplication, matrix-by-vector multiplication and scalar multiplication. These operations allow a matrix to be used to represent a linear map from one vector space to another, and it is this aspect of matrices that is the most relevant here.

Although standard usage means that an $n \times m$ matrix over $\mathbb{R}$ defines a linear map from the vector space $\mathbb{R}^m$ to the vector space $\mathbb{R}^n$, this is an artefact of $\mathbb{R}^n$ and $\mathbb{R}^m$ implicitly being endowed with a canonical choice of basis vectors. It is perhaps cleaner to start from scratch and observe that a matrix on its own does not define a linear map between two vector spaces: given two two-dimensional vector spaces $V, W$ and the matrix $A = [1,2;3,4]$ (by which I mean the two-by-two matrix whose elements are, from left-to-right top-to-bottom, $1,2,3,4$), what linear map from $V$ to $W$ does $A$ represent? By convention, a linear map is represented by a matrix in the following way, with respect to a particular choice of basis vectors for $V$ and for $W$. Let $v_1, v_2 \in V$ be a basis for $V$, and $w_1, w_2 \in W$ a basis for $W$. If $f: V \rightarrow W$ is a linear map then it is fully determined once the values of $f(v_1)$ and $f(v_2)$ are revealed. Furthermore, since $f(v_1)$ is an element of $W$, it can be written as $f(v_1) = \alpha_{11} w_1 + \alpha_{21} w_2$ for a unique choice of scalars $\alpha_{11}$ and $\alpha_{21}$. Similarly, $f(v_2) = \alpha_{12} w_1 + \alpha_{22} w_2$. Knowing the scalars $\alpha_{11}, \alpha_{12}, \alpha_{21}$ and $\alpha_{22}$, together with knowing the choice of basis vectors $v_1, v_2, w_1$ and $w_2$, allows one to determine what the linear map $f$ is. (An alternative but essentially equivalent approach would have been to agree that a matrix defines a linear map from $\mathbb{R}^m$ to $\mathbb{R}^n$, and therefore, to represent a linear map $f: V \rightarrow W$ by a matrix, it is first necessary to choose an isomorphism from $V$ to $\mathbb{R}^m$ and an isomorphism from $W$ to $\mathbb{R}^n$.)
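As a small illustration that the matrix depends on the choice of bases, here is a Python sketch assuming $V = W = \mathbb{R}^2$ and a hypothetical second basis $\{(1,1), (1,-1)\}$, used for both $V$ and $W$. The function `matrix_of` simply computes the scalars $\alpha_{ij}$ defined by $f(v_j) = \sum_i \alpha_{ij} w_i$.

```python
from fractions import Fraction


def coords(basis, x):
    """Coordinates (alpha, beta) with x = alpha*b1 + beta*b2 for a basis (b1, b2) of R^2.

    Solves the 2x2 linear system by Cramer's rule, using exact fractions.
    """
    (a, c), (b, d) = basis          # b1 = (a, c) and b2 = (b, d), viewed as columns
    det = Fraction(a * d - b * c)   # nonzero because b1, b2 form a basis
    return ((x[0] * d - b * x[1]) / det, (a * x[1] - c * x[0]) / det)


def matrix_of(f, v_basis, w_basis):
    """The array [alpha_ij] with f(v_j) = sum_i alpha_ij w_i."""
    columns = [coords(w_basis, f(v)) for v in v_basis]   # column j holds f(v_j)
    return [[col[i] for col in columns] for i in range(2)]


def f(v):
    """A fixed linear map on R^2, written here in standard coordinates."""
    return (v[0] + 2 * v[1], 3 * v[0] + 4 * v[1])


standard = ((1, 0), (0, 1))
other = ((1, 1), (1, -1))   # a hypothetical second basis

M1 = matrix_of(f, standard, standard)   # equals [[1, 2], [3, 4]]
M2 = matrix_of(f, other, other)         # equals [[5, -1], [-2, 0]]
print(M1 == [[1, 2], [3, 4]], M2 == [[5, -1], [-2, 0]])   # True True
```

The same map $f$ is represented by $[1,2;3,4]$ with respect to the standard bases and by $[5,-1;-2,0]$ with respect to the other basis. Quantities that do not depend on the basis, such as the trace (5) and the determinant (-2), agree for both arrays.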

Unlike a matrix, which can only represent a linear map between vector spaces once a choice of basis vectors has been made, a tensor is a linear (or multi-linear) map. The generalisation from linear to multi-linear maps is a distraction which may lead one to believe the difference between a tensor and a matrix is that a tensor is a generalisation of a matrix to higher dimensions, but this is missing the key point: the machinery of changes of coordinates, which is external to the definition of a matrix as an array of numbers, is internal to the definition of a tensor.

Examples of tensors are linear maps $f: V \rightarrow \mathbb{R}$ and $g: V \rightarrow V$, and bilinear maps $h: V \times V \rightarrow \mathbb{R}$. They are tensors because they are multi-linear maps between vector spaces. Choices of basis vectors do not enter the picture until one wishes to describe a particular map $f$ to a friend; unless it is possible to define $f$ in terms of known linear maps such as $\mathrm{trace}$, it becomes necessary to write down a set of basis vectors and specify the tensor as an array of numbers with respect to this choice of basis vectors. This leads to the traditional definition of a tensor as an array of numbers that transforms according to prescribed rules under a change of basis, a definition still commonly used in physics and engineering.
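To illustrate describing a tensor by an array of numbers, the following Python sketch takes a hypothetical bilinear map $h$ on $\mathbb{R}^2$, records its components $h_{ij} = h(b_i, b_j)$ in two different bases, and checks that either array, combined with the coordinates of the inputs in the matching basis, reproduces the same value $h(x,y)$. The basis $\{(1,1),(1,-1)\}$ and the vectors $x, y$ are arbitrary choices.

```python
from fractions import Fraction


def h(x, y):
    """A hypothetical bilinear map h: R^2 x R^2 -> R."""
    return x[0] * y[0] + 2 * x[0] * y[1] - x[1] * y[1]


def components(form, basis):
    """The array of numbers h_ij = h(b_i, b_j) describing the form in a given basis."""
    return [[form(bi, bj) for bj in basis] for bi in basis]


def evaluate(comps, xc, yc):
    """Rebuild h(x, y) = sum_ij h_ij x_i y_j from components and coordinates."""
    n = len(comps)
    return sum(comps[i][j] * xc[i] * yc[j] for i in range(n) for j in range(n))


standard = ((1, 0), (0, 1))
other = ((1, 1), (1, -1))   # a hypothetical second basis


def coords_other(x):
    """Coordinates of x in the basis `other`, i.e. x = a*(1, 1) + b*(1, -1)."""
    return (Fraction(x[0] + x[1], 2), Fraction(x[0] - x[1], 2))


H_std = components(h, standard)   # [[1, 2], [0, -1]]
H_other = components(h, other)    # [[2, 0], [4, -2]]

x, y = (3, 5), (-2, 7)
# Different arrays, same bilinear map: all three numbers agree.
print(h(x, y), evaluate(H_std, x, y), evaluate(H_other, coords_other(x), coords_other(y)))
```

The two arrays differ; writing $P$ for the matrix whose columns are $(1,1)$ and $(1,-1)$, they are related by $H_{\text{other}} = P^{\mathsf{T}} H_{\text{std}} P$. Rules of this kind, describing how the array changes with the basis, are what the traditional definition keeps track of.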

For convenience and consistency of notation, usually tensors are re-written as multi-linear maps into $\mathbb{R}$ (or whatever the ground field is). Both $f$ and $h$ above are already of this form, but $g$ is not. This is easily rectified; there is a natural equivalence between linear maps $g: V \rightarrow V$ and bilinear maps $\tilde g: V \times V^\ast \rightarrow \mathbb{R}$ where $V^\ast$ is the dual space of $V$; recall that elements of $V^\ast$ are simply linear functionals on $V$, that is, if $\sigma \in V^\ast$ then $\sigma$ is a linear function $\sigma: V \rightarrow \mathbb{R}$. This equivalence becomes apparent by observing that if, for a fixed $v \in V$, the values of $(\sigma \circ g)(v) = \sigma(g(v))$ are known for every $\sigma \in V^\ast$ then the value of $g(v)$ is readily deduced, and furthermore, the map taking a $v \in V$ and a $\sigma \in V^\ast$ to $(\sigma \circ g)(v) \in \mathbb{R}$ is bilinear; precisely, the correspondence between $g$ and $\tilde g$ is given by $\tilde g(v,\sigma) = (\sigma \circ g)(v)$. (If that’s not immediately clear, an intermediate step is the realisation that if the value of $\sigma(w)$ is known for every $\sigma \in V^\ast$ then the value of $w \in V$ is readily determined. Therefore, if we are unhappy about the range of $g$ being $V$, we can simply use elements of $V^\ast$ to probe the value of $w=g(v)$.)
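The correspondence between $g$ and $\tilde g$ can be sketched in Python, assuming $V = \mathbb{R}^2$ and representing elements of $V^\ast$ as Python functions. The map `g` below is a hypothetical example; `recover` carries out the “probing” described above, evaluating $\tilde g(v, \sigma)$ on the dual basis $\sigma_1(w) = w_1$, $\sigma_2(w) = w_2$ of the standard basis to read off the coordinates of $g(v)$.

```python
def g(v):
    """A hypothetical linear map g: V -> V, with V = R^2."""
    return (2 * v[0] - v[1], v[0] + 3 * v[1])


def covector(a, b):
    """The linear functional sigma(w) = a*w[0] + b*w[1], an element of V*."""
    return lambda w: a * w[0] + b * w[1]


def g_tilde(v, sigma):
    """The bilinear map V x V* -> R corresponding to g: (v, sigma) -> sigma(g(v))."""
    return sigma(g(v))


def recover(bilinear, v):
    """Recover g(v) from g_tilde by probing with the dual basis of the standard basis."""
    dual_basis = (covector(1, 0), covector(0, 1))
    return tuple(bilinear(v, sigma) for sigma in dual_basis)


v = (4, -1)
print(g(v), recover(g_tilde, v))   # (9, 1) (9, 1)
```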

An equivalent definition of a tensor is therefore a multi-linear map of the form $T: V \times \cdots \times V \times V^\ast \times \cdots \times V^\ast \rightarrow \mathbb{R}$; see Wikipedia for details. (Multi-linear maps involving several different vector spaces are a slightly more general concept and are not considered here for simplicity.)

It remains to introduce the tensor product (and to give another definition of tensors, this time in terms of tensor products).

First observe that $\mathcal{T}^p_q$, the set of all multi-linear maps $T: V^\ast \times \cdots \times V^\ast \times V \times \cdots \times V \rightarrow \mathbb{R}$ where there are $p$ copies of $V^\ast$ and $q$ copies of $V$, can be made into a vector space in an obvious way; just use pointwise addition and scalar multiplication of the multi-linear maps. Next, observe that $\mathcal{T}^0_1$ and $V^\ast$ are isomorphic. Furthermore, since $V^{\ast\ast}$, the dual of the dual of $V$, is naturally isomorphic to $V$, it is readily seen that $\mathcal{T}^1_0$ is isomorphic to $V$. Can $\mathcal{T}^p_q$ be constructed from multiple copies of $\mathcal{T}^0_1$ and $\mathcal{T}^1_0$?

It turns out that $\mathcal{T}^p_q$ is isomorphic to $\mathcal{T}^1_0 \otimes \cdots \otimes \mathcal{T}^1_0 \otimes \mathcal{T}^0_1 \otimes \cdots \otimes \mathcal{T}^0_1$ where there are $p$ copies of $\mathcal{T}^1_0$, $q$ copies of $\mathcal{T}^0_1$ and $\otimes$ is the tensor product defined in The Tensor Product. Alternatively, one could have invented the tensor product by examining how $\mathcal{T}^2_0$ can be constructed from two copies of $\mathcal{T}^1_0$, then observing that the same construction can be repeated and applied to $\mathcal{T}^0_1$, thereby ‘deriving’ a useful operation denoted by $\otimes$.

Another equivalent definition of a tensor is therefore an element of a vector space of the form $V \otimes \cdots \otimes V \otimes V^\ast \otimes \cdots \otimes V^\ast$; this too is explained on Wikipedia.

To summarise the discussion so far (and restricting attention to the scalar field $\mathbb{R}$ for simplicity):

• Matrices do not, on their own, define linear maps between vector spaces (although they do define linear maps between Euclidean spaces $\mathbb{R}^n$).
• A tensor is a multi-linear map whose domain and range involve zero or more copies of a vector space $V$ and its dual $V^\ast$.
• Any such map can be re-arranged to be of the form $T: V^\ast \times \cdots \times V^\ast \times V \times \cdots \times V \rightarrow \mathbb{R}$.
• The space of such maps (for a fixed number $q$ of copies of $V$ and $p$ copies of $V^\ast$) forms a vector space denoted $\mathcal{T}^p_q$.
• It turns out that there exists a single operation $\otimes$ which takes two vector spaces and returns a third, such that, for any $p,q,r,s$, $\mathcal{T}^p_q \otimes \mathcal{T}^r_s$ is isomorphic to $\mathcal{T}^{p+r}_{q+s}$.  This operation is called the tensor product.
• Since $\mathcal{T}^1_0$ and $\mathcal{T}^0_1$ are isomorphic to $V$ and $V^\ast$ respectively, an equivalent definition of a tensor is an element of the vector space $V \otimes \cdots \otimes V \otimes V^\ast \otimes \cdots \otimes V^\ast$.

Some loose ends are now tidied up. Without additional reading though, some of the remaining parts of this article are unlikely to be self-explanatory; their main purpose is to alert the reader to what to look out for when learning from textbooks. First, it will be explained how an element of $V \otimes V^\ast$ represents a linear map $h: V \rightarrow V$. Then a second use of the tensor product symbol will be given: the tensor product of two multi-linear maps results in a new multi-linear map. This additional aspect of tensor products was essentially ignored in The Tensor Product. Lastly, an explanation is given of why I omitted any mention of tensors in The Tensor Product.

Let $x \in V \otimes V^\ast$. The naive way to proceed is as follows. Introduce a basis $\{v_i\}$ for $V$ and $\{\sigma_j\}$ for $V^\ast$; different choices will ultimately lead to the same result. Then $x$ can be written as a linear combination $x = \sum_{i,j} \alpha_{ij} v_i \otimes \sigma_j$ where the $\alpha_{ij}$ are scalars. Recall from The Tensor Product that the $v_i \otimes \sigma_j$ are just formal symbols used to distinguish one basis vector of $V \otimes V^\ast$ from another. Here’s the trick; we now associate to each $v_i \otimes \sigma_j$ the linear map $h_{ij}: V \rightarrow V$ that sends $v \in V$ to $\sigma_j(v) v_i$; clearly $v \mapsto \sigma_j(v) v_i$ is a linear map from $V$ to $V$. (This is relatively easy to remember, for how else could we combine $v_i$ with $\sigma_j$ to obtain a linear map from $V$ to $V$?) Then, we associate to $x = \sum_{i,j} \alpha_{ij} v_i \otimes \sigma_j$ the linear map $h = \sum_{i,j} \alpha_{ij} h_{ij}$. It can be verified that this mapping is an isomorphism from the vector space $V \otimes V^\ast$ to the vector space of linear maps from $V$ to $V$, and moreover, the same mapping results regardless of the original choice of basis vectors for $V$ and $V^\ast$. While this is useful for actual computations, it does not explain how we knew to use the above trick of sending $v \in V$ to $\sigma_j(v) v_i$.
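The recipe can be tried out in Python. The sketch below assumes $V = \mathbb{R}^2$, stores the coefficients $\alpha_{ij}$ as a nested list, and represents elements of $V^\ast$ as functions. As a small check of basis-independence it uses the element $\sum_i v_i \otimes \sigma_i$, where $\{\sigma_i\}$ is the dual basis of $\{v_i\}$ (a convenient choice for this illustration; the text allows any basis of $V^\ast$). This is the same element of $V \otimes V^\ast$ whichever basis is used, and with both the standard basis and the hypothetical basis $\{(1,1),(1,-1)\}$ it is sent to the identity map, as it should be.

```python
from fractions import Fraction


def as_linear_map(alpha, vs, sigmas):
    """Linear map V -> V associated with x = sum_ij alpha[i][j] v_i (x) sigma_j.

    Each basis element v_i (x) sigma_j is sent to h_ij: v -> sigma_j(v) v_i,
    and x is sent to the corresponding linear combination of the h_ij.
    """
    def h(v):
        out = [Fraction(0)] * len(v)
        for i, v_i in enumerate(vs):
            for j, sigma_j in enumerate(sigmas):
                coeff = alpha[i][j] * sigma_j(v)
                out = [o + coeff * c for o, c in zip(out, v_i)]
        return tuple(out)
    return h


# Standard basis of V = R^2 and its dual basis (the coordinate functionals).
std = ((1, 0), (0, 1))
std_dual = (lambda w: w[0], lambda w: w[1])

# A hypothetical second basis and its dual basis: sigma_j(v_i) = 1 if i == j else 0.
other = ((1, 1), (1, -1))
other_dual = (lambda w: Fraction(w[0] + w[1], 2), lambda w: Fraction(w[0] - w[1], 2))

h = as_linear_map([[1, 2], [3, 4]], std, std_dual)
print(h((1, 1)))   # equals (3, 7): in these bases alpha acts as the usual matrix

identity = [[1, 0], [0, 1]]
print(as_linear_map(identity, std, std_dual)((3, 5)))
print(as_linear_map(identity, other, other_dual)((3, 5)))
```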

A more sophisticated way to proceed uses the universal property characterisation of the tensor product and makes it clear why the above construction works. (The universal property characterisation is described on Wikipedia, among other places.) Essentially, under this characterisation, every bilinear map from $V \times V^\ast$ to $\mathbb{R}$ induces a unique linear map from $V \otimes V^\ast$ to $\mathbb{R}$, and conversely, every linear map from $V \otimes V^\ast$ to $\mathbb{R}$ induces a unique bilinear map from $V \times V^\ast$ to $\mathbb{R}$. Now, we already know from earlier that linear maps from $V$ to $V$ are equivalent to bilinear maps from $V \times V^\ast$ to $\mathbb{R}$, and hence to linear maps from $V \otimes V^\ast$ to $\mathbb{R}$. It remains to show that choosing a linear map from $V \otimes V^\ast$ to $\mathbb{R}$ is equivalent to choosing an element of $V \otimes V^\ast$. Indeed, by definition of the dual, linear maps from $V \otimes V^\ast$ to $\mathbb{R}$ are precisely the elements of $(V \otimes V^\ast)^\ast \cong V^\ast \otimes V^{\ast\ast} \cong V^\ast \otimes V \cong V \otimes V^\ast$, as required.

So far, we have only introduced the tensor product of two vector spaces. However, there is a companion operation which takes elements $v \in V$ and $w \in W$ of vector spaces and returns an element $v \otimes w$ of the vector space $V \otimes W$. (Recall that we have only introduced the formal symbol $v_i \otimes w_j$ to denote a basis vector in the case where $\{v_i\}$ and $\{w_j\}$ are chosen bases for $V$ and $W$; no meaning has been given yet to $v \otimes w$.) For calculations, it suffices to think of $v \otimes w$ as the element obtained by applying formal algebraic laws such as $(u+v) \otimes w = (u \otimes w) + (v \otimes w)$. Precisely, if $v = \sum_i \alpha_i v_i$ and $w = \sum_j \beta_j w_j$ where $\{v_i\}$ and $\{w_j\}$ are chosen bases for $V$ and $W$, then $v \otimes w$ is defined to be $\sum_{i,j} \alpha_i \beta_j (v_i \otimes w_j)$, as suggested by the formal manipulations $v \otimes w = (\sum_i \alpha_i v_i) \otimes (\sum_j \beta_j w_j) = \sum_i \alpha_i (v_i \otimes \sum_j \beta_j w_j) = \sum_i \alpha_i \sum_j \beta_j (v_i \otimes w_j)$. I mention this only because it leads to the following simple rule for computing the tensor product of two multi-linear maps.
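The defining formula $v \otimes w = \sum_{i,j} \alpha_i \beta_j (v_i \otimes w_j)$ is easy to compute with. The Python sketch below stores the coefficients of $v \otimes w$ as a nested list (an outer product of the coordinate vectors) and checks two of the formal laws on hypothetical coordinates with $\dim V = 2$ and $\dim W = 3$.

```python
def tensor(alpha, beta):
    """Coefficients of v (x) w in the basis {v_i (x) w_j}: entry [i][j] is alpha_i * beta_j."""
    return [[a * b for b in beta] for a in alpha]


def add_vec(x, y):
    return [a + b for a, b in zip(x, y)]


def add_tensor(x, y):
    return [add_vec(rx, ry) for rx, ry in zip(x, y)]


def scale_tensor(c, x):
    return [[c * a for a in row] for row in x]


# Hypothetical coordinates: u, v in V with dim V = 2, and w in W with dim W = 3.
u, v, w = [1, 2], [3, -1], [4, 0, 5]

# (u + v) (x) w = (u (x) w) + (v (x) w)
print(tensor(add_vec(u, v), w) == add_tensor(tensor(u, w), tensor(v, w)))   # True

# (c v) (x) w = c (v (x) w) = v (x) (c w)
c = 3
print(tensor([c * a for a in v], w) == scale_tensor(c, tensor(v, w)) == tensor(v, [c * b for b in w]))   # True
```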

Let $S$ and $T$ be multi-linear maps; in fact, for ease of presentation, only the special case $S,T: V \rightarrow \mathbb{R}$ will be considered. Then a bilinear map $V \times V \rightarrow \mathbb{R}$ can be formed from $S$ and $T$, namely $(u,v) \mapsto S(u) T(v)$. This bilinear map is denoted $S \otimes T$ and is called the tensor product of $S$ and $T$, in agreement with the discussion in the previous paragraph.

It is easier to motivate the tensor product $S \otimes T$ of two tensors than it is to motivate the tensor product of two tensor spaces $\mathcal{T}^p_q \otimes \mathcal{T}^r_s$. Here is an example of such motivation.

What useful multi-linear maps can be formed from the linear maps $S,T: V \rightarrow \mathbb{R}$? Playing around, one finds that $v \mapsto S(v)+T(v)$ is a linear map; let’s call it $S+T$. Multiplication does not work because $v \mapsto S(v)T(v)$ is not linear, so $ST$ is not a tensor. The ordinary cross-product would lead to a map $S \times T: V \times V \rightarrow \mathbb{R} \times \mathbb{R}$ which is not of the form of a tensor as stated earlier. What we can do though is form the map $(v,w) \mapsto S(v)T(w)$. We denote this bilinear map by $S \otimes T: V \times V \rightarrow \mathbb{R}$. Experience has shown the construction $S \otimes T$ to be useful (which, at the end of the day, is the main justification for introducing new definitions and symbols), although for the moment the choice of the symbol $\otimes$ has not been justified, save that it should differ from more common symbols such as $S+T$, $ST$ and $S \times T$, which mean different things, as observed immediately above. Furthermore, the definition of $\otimes$ as stated above readily extends to general tensors…
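The contrast between $ST$ and $S \otimes T$ can be checked numerically. In the Python sketch below, $V = \mathbb{R}^2$ and the functionals `S` and `T` are hypothetical examples. The sketch tests additivity on particular vectors, which is enough to show that $ST$ is not linear but of course does not by itself prove that $S \otimes T$ is bilinear.

```python
def S(v):
    """A hypothetical linear functional on V = R^2."""
    return v[0] + 2 * v[1]


def T(v):
    """Another hypothetical linear functional on V = R^2."""
    return 3 * v[0] - v[1]


def tensor_product(s, t):
    """S (x) T : V x V -> R, sending (v, w) to S(v) T(w)."""
    return lambda v, w: s(v) * t(w)


def pointwise_product(s, t):
    """ST : V -> R, sending v to S(v) T(v)."""
    return lambda v: s(v) * t(v)


s_times_t = pointwise_product(S, T)
s_tensor_t = tensor_product(S, T)

e1, e2, u = (1, 0), (0, 1), (2, 1)
total = (1, 1)   # total = e1 + e2

# ST is not additive, so it is not linear:
print(s_times_t(total), s_times_t(e1) + s_times_t(e2))              # 6 1
# S (x) T is additive in each argument separately:
print(s_tensor_t(total, u), s_tensor_t(e1, u) + s_tensor_t(e2, u))  # 15 15
print(s_tensor_t(u, total), s_tensor_t(u, e1) + s_tensor_t(u, e2))  # 8 8
```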

One could perhaps use the above paragraph as the start of an introduction to tensor products. In The Tensor Product, I chose instead to ignore tensors completely because, although there is nothing ‘difficult’ about them, the above indicates that there are a multitude of small issues that need to be explained, and at the end of the day, it is not even clear at the outset whether there is any use in studying tensor products of tensor spaces! What does the fancy symbol $\otimes$ buy us that we could not have obtained for free just by working directly with multi-linear maps and arrays of numbers? (There are benefits, but they appear in more advanced areas of mathematics and hence are hard to motivate concretely at an elementary level. Of course, one could say the tensor product reduces the study of multi-linear maps to linear maps, that is, back to more familiar territory, but multi-linear maps are not that difficult to work with directly in the first place.)

The Tensor Product motivated the tensor product by the wish to construct $\mathbb{R}[x,y]$ from $\mathbb{R}[x]$ and $\mathbb{R}[y]$. It was stated there that this allows properties of $\mathbb{R}[x,y]$ to be deduced from properties of $\mathbb{R}[x]$. A concrete example of this is localisation of rings: if we have managed to show that the localisation $(\mathbb{R}[x])[1/x]$ is isomorphic to the ring of Laurent polynomials in one variable, then properties of the tensor product allow us to conclude that the localisation $(\mathbb{R}[x,y])[1/xy]$ is isomorphic to the ring of Laurent polynomials in two variables. Even without such examples though, I am more comfortable taking for granted that it is useful to be able to construct $\mathbb{R}[x,y]$ from $\mathbb{R}[x]$ and $\mathbb{R}[y]$ than that it is useful to be able to construct $\mathcal{T}^{p+q}_{r+s}$ from $\mathcal{T}^p_r$ and $\mathcal{T}^q_s$.

## The Tensor Product

Various discussions on the internet indicate that the concept of tensor product is not always intuitive to grasp on a first reading. Perhaps the reason is that it is harder to motivate the concept of tensor product on the vector space $\mathbb{R}^n$ than on a polynomial ring. The following endeavours to give an easily understood pre-introduction to the tensor product. That is to say, the aim is to motivate the standard introductions that are online and in textbooks. Familiarity is assumed with only linear algebra and basic polynomial manipulations (i.e., adding and multiplying polynomials), hence the first step is to introduce just enough formalism to talk about polynomial rings.

By $\mathbb{R}[x]$ is meant the set of all polynomials in the indeterminate $x$ with real-valued coefficients. With the usual definitions of addition and scalar multiplication, $\mathbb{R}[x]$ becomes a vector space over the field $\mathbb{R}$ of real numbers. For example, $x^2 + 2x + 3$ and $x^3 - 2x$ are elements of $\mathbb{R}[x]$ and can be added to form $x^3 + x^2 + 3$. An example of scalar multiplication is that $3 \in \mathbb{R}$ times $x^2 + 2x + 3 \in \mathbb{R}[x]$ is $3x^2 + 6x + 9 \in \mathbb{R}[x]$. These definitions of addition and scalar multiplication together satisfy the axioms of a vector space, thereby making $\mathbb{R}[x]$ into a vector space over $\mathbb{R}$. ($\mathbb{R}[x]$ is called a polynomial ring rather than a polynomial vector space because it has additional structure that makes it a special kind of vector space: a multiplication operator which is compatible with the addition and scalar multiplication operators. This is not important for us though.)
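Storing a polynomial as its list of coefficients makes the vector space structure explicit: the list is just the coordinate vector with respect to the basis $\{1, x, x^2, \cdots\}$. The following Python sketch (an illustrative encoding of my own, not a standard library) reproduces the two examples above. Trailing zero coefficients are not trimmed, so the encoding is not unique, which is harmless here.

```python
def poly_add(p, q):
    """Add two polynomials stored as coefficient lists [c0, c1, c2, ...], where c_k multiplies x^k."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad the shorter list with zero coefficients
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_scale(c, p):
    """Multiply every coefficient by the real scalar c."""
    return [c * a for a in p]


p = [3, 2, 1]        # x^2 + 2x + 3
q = [0, -2, 0, 1]    # x^3 - 2x
print(poly_add(p, q))      # [3, 0, 1, 1], i.e. x^3 + x^2 + 3
print(poly_scale(3, p))    # [9, 6, 3], i.e. 3x^2 + 6x + 9
```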

Polynomials in two indeterminates, say $x$ and $y$, also form a vector space (and indeed, a ring) with respect to the usual operations of polynomial addition and scalar multiplication.  This vector space is denoted $\mathbb{R}[x,y]$.

The cross-product $\times$ of two vector spaces (more commonly called the Cartesian product or direct product) is relatively easy to motivate and understand, so much so that the reader is assumed to be familiar with it. Recall that if $U$ and $V$ are vector spaces then the elements of $U \times V$ are merely the pairs $(u,v)$ where $u \in U$ and $v \in V$. Recall too that $\mathbb{R} \times \mathbb{R}$ is isomorphic to $\mathbb{R}^2$. In particular, observe that the cross-product is a means of building a new vector space from other vector spaces.

The tensor-product is just like the cross-product in that it too allows one to build a new vector space from other vector spaces. It might be illuminating to think of the other vector spaces as building blocks, and the new vector space as something more complicated built from these simpler building blocks, although of course this need not always be the case. Regardless, what makes such constructions useful is that properties of the new vector space can be deduced from properties of the building block vector spaces. (The simplest example of such a property is the dimension of a vector space; the dimension of $U \times V$ is the sum of the dimensions of $U$ and $V$.)

Can $\mathbb{R}[x,y]$ be obtained from $\mathbb{R}[x]$ and $\mathbb{R}[y]$? Let’s try the cross-product. The vector space $\mathbb{R}[x] \times \mathbb{R}[y]$ consists of pairs $(p(x),q(y))$ where $p(x)$ represents a polynomial in $\mathbb{R}[x]$ and $q(y)$ a polynomial in $\mathbb{R}[y]$. This does not appear to work, for how should the polynomial $xy^2 + yx^3 + 1$ be represented in the form $(p(x),q(y))$? (Even if each pair $(p(x),q(y))$ were interpreted as the product $p(x)q(y)$, only polynomials that factor in this way would be obtained, a proper subset of $\mathbb{R}[x,y]$. For this introduction though, all that is important is that $\mathbb{R}[x,y]$ is not the cross-product of $\mathbb{R}[x]$ and $\mathbb{R}[y]$.)

With the belief that $\mathbb{R}[x,y]$ should be constructible from $\mathbb{R}[x]$ and $\mathbb{R}[y]$, let’s try to figure out what is required to get the job done. Favouring simplicity over sophistication, let’s investigate the problem in terms of basis vectors. The obvious choices of basis vectors are $\{1,x,x^2,\cdots\}$ for $\mathbb{R}[x]$; $\{1,y,y^2,\cdots\}$ for $\mathbb{R}[y]$; and $\{1,x,y,x^2,xy,y^2,\cdots\}$ for $\mathbb{R}[x,y]$. We are in luck as the pattern is easy to spot: the basis vectors of $\mathbb{R}[x,y]$ are just all pairwise products (in the sense of polynomial multiplication) of the basis vectors of $\mathbb{R}[x]$ with the basis vectors of $\mathbb{R}[y]$!

We are therefore motivated to define a construction that takes two vector spaces $U$ and $V$ and creates a new vector space, denoted $U \otimes V$, where the new vector space is defined in terms of its basis vectors; roughly speaking, the basis vectors of $U \otimes V$ comprise all formal pairwise products of basis vectors in a particular basis of $U$ with basis vectors in a particular basis of $V$. How to do this precisely may not be readily apparent but the fact that $\mathbb{R}[x,y]$ is a well-defined vector space that, intuitively at least, is constructible from $\mathbb{R}[x]$ and $\mathbb{R}[y]$ gives us hope that such a construction can be made to work. Needless to say, such a construction is possible and is precisely the tensor product.

There are two equivalent ways of defining the tensor product. The first follows immediately from the above description. If $\{u_1,\cdots,u_m\}$ and $\{v_1,\cdots,v_n\}$ are bases for $U$ and $V$ then $U \otimes V$ is defined to be the vector space formed from all formal linear combinations of the basis vectors $u_1 \otimes v_1, u_1 \otimes v_2, \cdots, u_1 \otimes v_n, u_2 \otimes v_1, \cdots, u_m \otimes v_n$. It is emphasised that $u_1 \otimes v_1$ is just a symbol, a means for identifying a particular basis vector. (At the risk of belabouring the point, I can form a vector space from the basis vectors Fred and Charlie; typical elements would be 2 Fred + 3 Charlie and 5 Fred – 1 Charlie, and the vector space addition of these two elements would be 7 Fred + 2 Charlie.) Of course, choosing a different set of basis vectors for $U$ or $V$ would result in a different vector space $U \otimes V$; however, the resulting vector spaces will always be isomorphic to each other. Therefore, $U \otimes V$ should be thought of as representing any vector space that can be obtained using the above method. (This finer point is not dwelt on here but is a common occurrence in mathematics; a construction can yield a new space which is only unique up to isomorphism. Once tensor products have been understood, the reader is invited to think further about this lack of uniqueness, how it is handled and why it is not an issue in practice.)
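The basis-vector construction can be mimicked directly in Python by storing a formal linear combination as a dictionary from basis symbols to coefficients. The sketch below reproduces the Fred and Charlie example, then builds the basis symbols $u_i \otimes v_j$ as ordered pairs. Since a program can only list finitely many basis vectors, it assumes truncated spaces (polynomials in $x$ of degree at most 2 and in $y$ of degree at most 1); reading each pair as a monomial recovers the pattern noticed above for $\mathbb{R}[x,y]$.

```python
def add(x, y):
    """Add formal linear combinations stored as {basis_symbol: coefficient}."""
    out = dict(x)
    for symbol, coeff in y.items():
        out[symbol] = out.get(symbol, 0) + coeff
    return {s: c for s, c in out.items() if c != 0}   # drop zero terms


# The Fred and Charlie example from the text.
print(add({"Fred": 2, "Charlie": 3}, {"Fred": 5, "Charlie": -1}))   # {'Fred': 7, 'Charlie': 2}


def tensor_basis(u_basis, v_basis):
    """Basis symbols u_i (x) v_j of U (x) V, represented as ordered pairs."""
    return [(u, v) for u in u_basis for v in v_basis]


# Hypothetical truncation: polynomials in x of degree <= 2 and in y of degree <= 1.
U = ["1", "x", "x^2"]
V = ["1", "y"]
basis = tensor_basis(U, V)
print(len(basis))   # 6 = 3 * 2


def as_monomial(pair):
    """Read the formal symbol x^i (x) y^j as the monomial x^i y^j."""
    factors = [s for s in pair if s != "1"]
    return "*".join(factors) if factors else "1"


print([as_monomial(b) for b in basis])   # ['1', 'y', 'x', 'x*y', 'x^2', 'x^2*y']
```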

The more sophisticated way of defining the tensor product is to consider bilinear maps. Like most things, it is important to understand both methods. The elementary method above provides a basic level of intuition that is easy to grasp but the method is clumsy to work with and harder to generalise. The more sophisticated method is more convenient to work with and adds an extra layer of intuition, but masks the basic level of intuition and therefore actually offers less in the way of intuition to the neophyte.

The motivated reader may now like to:

1. Read an introduction to the tensor product on vector spaces that uses basis vectors.
2. Read an alternative introduction that uses bilinear maps.
3. Work out why the two methods are equivalent.

A hint is offered as to why there is a relationship between “products” and “bilinear maps”, and why the tensor product can be thought of as “the most general product”. Assume we want to define a product on a vector space. That is, given a vector space $V$, we want to define a function $f: V \times V \rightarrow V$ that we think of as multiplication; given $u,v \in V$, their product is defined to be $f(u,v)$. What do we mean by multiplication? Presumably, we mean that the associative, distributive and scalar multiplication laws hold, such as $f(u,f(v,w)) = f(f(u,v),w)$ and $f(u+v,w) = f(u,w) + f(v,w)$. Merely requiring the scalar multiplication and distributive laws to hold is equivalent to requiring that $f$ be bilinear; that is the connection. In general, one can seek to define a multiplication rule between two different vector spaces, which is the level of generality at which the cross-product works. As textbooks will hasten to point out, any bilinear function $f: U \times V \rightarrow W$ can be represented by a unique linear map from $U \otimes V$ to $W$. Personally, I prefer to think of this in the first instance as a bonus result we get for free from the tensor product rather than as the initial motivation for introducing the tensor product; only with the benefit of hindsight are we motivated to define the tensor product using bilinear maps.
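The statement about bilinear maps can be illustrated with the polynomial example. The Python sketch below assumes truncated spaces $U$ and $V$ of polynomials of degree at most 1 in $x$ and in $y$ respectively, takes the bilinear map $f(p,q) = p(x)q(y)$, builds the linear map $L$ on $U \otimes V$ determined on basis elements by $L(u_i \otimes v_j) = f(u_i, v_j)$, and checks that $f(p,q) = L(p \otimes q)$. The function names are illustrative only.

```python
def tensor(a, b):
    """Coefficients c[i][j] = a_i * b_j of u (x) v, for u = sum a_i u_i and v = sum b_j v_j."""
    return [[ai * bj for bj in b] for ai in a]


def multiply(a, b):
    """Bilinear map f(p, q) = p(x) q(y), with p, q as coefficient lists and the result
    stored as {(i, j): coefficient of x^i y^j}."""
    return {(i, j): ai * bj for i, ai in enumerate(a) for j, bj in enumerate(b) if ai * bj != 0}


def induced_linear_map(bilinear, m, n):
    """Linear map L on U (x) V determined by L(u_i (x) v_j) = bilinear(u_i, v_j)."""
    def unit(k, size):
        return [1 if t == k else 0 for t in range(size)]

    images = {(i, j): bilinear(unit(i, m), unit(j, n)) for i in range(m) for j in range(n)}

    def L(c):
        out = {}
        for (i, j), image in images.items():
            for key, val in image.items():
                out[key] = out.get(key, 0) + c[i][j] * val
        return {k: v for k, v in out.items() if v != 0}

    return L


p = [1, 2]    # 1 + 2x
q = [3, -1]   # 3 - y
L = induced_linear_map(multiply, 2, 2)
print(multiply(p, q) == L(tensor(p, q)))   # True: f(p, q) = L(p (x) q)
print(L([[1, 0], [0, 1]]))                 # {(0, 0): 1, (1, 1): 1}, i.e. 1 + xy
```

Note that $L$ is defined on all of $U \otimes V$, not just on elements of the form $p \otimes q$: the element $1 \otimes 1 + x \otimes y$ is sent to $1 + xy$, which cannot be factored as $p(x)q(y)$.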

Tensor products turn out to be very convenient in other areas too. As just one example, in homological algebra they are a convenient way of changing rings. A special case is converting a vector space over the real numbers into a vector space over the complex numbers; an engineer would not think twice about doing this — anywhere a real number is allowed, allow now a complex number, and carry on as normal! — but how can it be formalised? Quite simply in fact; if $V$ is a vector space over the real field then $\mathbb{C} \otimes V$ is the complexification of $V$; although technically $\mathbb{C} \otimes V$  is a vector space over the real field according to our earlier definition of tensor product, there is a natural way to treat it as a vector space over the complex field. The reader is invited to contemplate this at leisure, recalling that $\mathbb{C}$ is a two-dimensional vector space over the real field, with basis $\{1,\sqrt{-1}\}$.
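A concrete sketch of the complexification, assuming $V = \mathbb{R}^n$ with its standard basis $\{e_k\}$: an element of $\mathbb{C} \otimes V$ has the real basis $\{1 \otimes e_k, \sqrt{-1} \otimes e_k\}$, so it can be stored as a pair of real vectors. Letting a complex scalar act by $c \cdot (z \otimes v) = (cz) \otimes v$ gives the formulas in `complex_scale` below; the encoding and names are my own choices for illustration. The check confirms that this agrees with ordinary complex scalar multiplication on $\mathbb{C}^n$.

```python
def complex_scale(c, x):
    """Multiply x in C (x) R^n by a complex scalar c.

    x is stored as a pair (re, im) of real vectors, meaning
    x = sum_k re[k] (1 (x) e_k) + im[k] (i (x) e_k).
    The rule c * (z (x) v) = (c z) (x) v gives the formulas below.
    """
    a, b = c.real, c.imag
    re, im = x
    new_re = [a * r - b * s for r, s in zip(re, im)]
    new_im = [a * s + b * r for r, s in zip(re, im)]
    return new_re, new_im


def to_complex_vector(x):
    """Identify C (x) R^n with C^n."""
    re, im = x
    return [complex(r, s) for r, s in zip(re, im)]


x = ([1.0, 2.0], [0.0, -1.0])   # corresponds to the complex vector (1, 2 - i)
c = 2 + 3j
print(to_complex_vector(complex_scale(c, x)))   # [(2+3j), (7+4j)]
print([c * z for z in to_complex_vector(x)])    # [(2+3j), (7+4j)]
```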

In conclusion, the tensor product can be motivated by the desire to construct the vector space of polynomials in two indeterminates out of two copies of a vector space of polynomials in only a single indeterminate. Once the construction is achieved, it is found to have a number of interesting properties, including properties relating to bilinear maps. With the benefit of hindsight, it is cleaner to redefine the tensor product in terms of bilinear maps; this broadens its applicability and tidies up the maths, albeit at the expense of possibly making less visible some of the basic intuition about the tensor product.

## On Learning and Teaching Mathematics: Nothing is Elementary, and the Importance of Intuition

Learning is ineffective if attempted in a linear fashion. Fine details are best learnt, appreciated and remembered by those who can spontaneously describe and answer questions about the coarser details of the subject at hand. Therefore, it can be valuable for more advanced books to revisit “elementary” concepts, because rarely is anything sufficiently elementary that nothing more remains to be known. The following quote is apt; the emphasis is my own. (All quotes below are taken from the preface of C. Lanczos (1966) Discourse on Fourier Series.)

> By the nature of things it was necessary to develop the subject from its early beginnings and this explains the fact that even so-called “elementary” concepts, such as the idea of a function, the meaning of limit, uniform convergence and similar “well-known” subjects of analysis were included in the discussion. Far from being bored, the students found this procedure highly appropriate, because very often exactly the apparently “elementary” ideas of mathematics — which are in fact “elementary” only because they are relegated to the undergraduate level of instruction, although their true significance cannot be properly grasped on that level — cause great difficulties in proceeding to the more advanced subjects.

There are three interwoven aspects of mathematical knowledge:

• Intuition — the “pictures” one forms (consciously or subconsciously) in one’s head when reasoning about a problem or endeavouring to generalise a concept.
• Rigour — the formalisation and verification of definitions and proofs.
• Communication — the transfer of mathematical knowledge from one person to another.

Pictures, formulae and discussions are generally how mathematical knowledge is communicated from one person to another. It is important, though, to distinguish this from the actual understanding of mathematics itself. The formula $f(x)=\sin x$ is not in itself what is “understood” by someone reading it. Rather, seeing the formula $f(x)=\sin x$ conjures up a wealth of images in one’s subconscious mind which can then be refined further and reasoned with; seeing the formula primes relevant areas of the cortex, facilitating subsequent thought. One “understands” $f(x)=\sin x$ because one can reason with it and answer questions about it; for instance, one can graph it, differentiate it, find its zeros, write down its Taylor series expansion, draw a relevant right-angled triangle and so forth. [Understanding is therefore relative to the questions one has asked oneself or otherwise encountered to date.] Memorising a result does not immediately lead to understanding. Understanding occurs only after one’s mind has formed associations that link the result with other stored knowledge. The degree of understanding is related to the scope and complexity of such associations.

To distinguish rigour from intuition, consider reading the proof of a theorem. It is possible to check that a proof is correct without having any sense of actually “understanding” the proof, or even of “understanding” why the theorem should be true. In fact, it is possible to come up with a proof without “understanding” it! That is to say, by trial and error and (subconscious) pattern recognition (e.g., making substitutions and transformations one has seen before without quite being sure one is heading in the right direction), one can write down an algebraic proof of a theorem in convex analysis, say, without being able to offer any geometric picture or other explanation of how the proof was found. It is often worth the extra effort to develop a sense of intuition about theorems and their proofs. Intuition and rigour together provide a sense of understanding and increase one’s fluency in mathematics.

In some cases, intuition and rigour go hand in hand; one can translate one’s intuition directly into a proof. That this is not always the case is perhaps the only reason why teaching and learning mathematics is non-trivial: it is easy to convey rigour (at least, no harder than programming a computer), and all too easy for an author to convey only rigour and leave it to the readers’ mathematical maturity and ingenuity to deduce the intuition for themselves. Conveying intuition is not necessarily more difficult, but for an author there are apparent drawbacks.

The most obvious drawback is verbosity. Stating and proving a theorem in its full level of generality (à la Bourbaki) takes considerably less space than proving a basic theorem using one technique, then motivating the generalisation of the theorem, then pointing out why the proof of the basic theorem does not generalise, then motivating a new proof technique and finally proving the general theorem. Another drawback is imprecision and inaccuracy; intuition need be neither precise nor even accurate for it to be valuable, yet some authors may be uncomfortable committing to paper anything even remotely inaccurate.

Tracing the historical development of a subject can provide a wealth of intuition. Here is what Lanczos has to say on the matter.

> … a close tie with the historical development seemed appropriate, although the author is well aware that this exposes him to the charge of datedness. We have to be “modern” and there are those who believe that before the advent of our own blessed era the pursuers of mathematics lived in a kind of no-man’s-land, bumping against each other in the gloomy haze that pervaded everything (“Euclid must go!”). But there are others (and the author belongs to the latter group), who believe that the great masters of the eighteenth and nineteenth centuries, Lagrange, Euler, Gauss, Cauchy, Riemann, Fourier, Dirichlet, and many others, were not necessarily lacking mathematical intelligence and some of them might even be comparable to the geniuses of today.

One wonders whether intuition is occasionally withheld on purpose, the false reasoning being that the merit of an idea is judged by how complicated it is to understand. Merit should be judged by originality, usefulness and simplicity rather than complexity. A thing is understood when it appears to be simple.

> To display formal fireworks, which are so much in the centre of many mathematical treatises — perhaps as a status-symbol by which one gains admission to the august guild of mathematicians — was not the primary aim of the book.

In conclusion, when teaching or learning mathematics, keep in mind that intuition and rigour are distinct aspects whose synergy forms mathematical knowledge. Intuition and rigour should be learnt together because rigour without intuition is like owning a car without the key; you can admire the car for its beauty but you cannot get very far with it.

Categories: Education, Quotes