This brief article uses statistical independence as an example of when a mathematical definition is intentionally chosen to be different from the original motivating definition. (Another example comes from topology; the motivating/naive definition of a topological space would involve limits but instead open sets are used to define a topology.) This exemplifies the following messages:
- There is a difference between mathematical manipulations and intuition (and both must be learnt side-by-side); see also the earlier article on The Importance of Intuition.
- Understanding a definition mainly means understanding the usefulness of the definition and how it can be applied in practice.
- This has implications for how to teach and how to learn mathematical concepts.
Two random events, $A$ and $B$, are statistically independent if $P(A \cap B) = P(A)\,P(B)$. Here is the (small) conundrum. If one were to stare at this definition, it may not make much sense. What is it really telling us about the two events? On the other hand, if one were to learn that if $A$ and $B$ are “unrelated” events that have “nothing to do with each other” then $P(A \cap B) = P(A)\,P(B)$ must hold, then one might falsely believe one has understood the definition. Indeed, if $A$ and $B$ are related to each other, and $B$ and $C$ are related to each other, then surely $A$ and $C$ are related to each other? Conversely, if event $A$ is defined in terms of event $B$ then surely $A$ is related to $B$? Both these statements are false if ‘related’ is replaced by ‘statistically dependent’.
The true way of understanding statistical independence is to i) acknowledge that while it is motivated from real life by the intuitive notion of unrelated events, it is a different concept that has nevertheless proved to be very useful; and ii) be able to list a number of useful applications. Therefore, upon reading a definition that does not immediately feel comfortable, it may be better to flick through the remainder of the book to see the various uses of the definition than to stare blankly at the definition hoping for divine intuition.
For completeness, two naturally occurring examples of how statistical independence differs from “functional independence” are given. The first comes from the theory of continuous-time Markov chains but can be stated simply. Let $\lambda_1$ and $\lambda_2$ be two positive real numbers representing departure rates. Let $T_1$ and $T_2$ be independent and exponentially distributed random variables with parameters $\lambda_1$ and $\lambda_2$ respectively. (That is, $P(T_i > t) = e^{-\lambda_i t}$ for $t \geq 0$ and $i = 1, 2$.) The rule for deciding where to move to next (in the context of Markov chains) is to see which departure time, $T_1$ or $T_2$, is smaller. (If $T_1$ is smaller than $T_2$ we move to destination 1, otherwise we move to destination 2.) Let $p$ be the probability that $T_1$ is smaller: $p = P(T_1 < T_2)$. It can be shown that $p = \lambda_1 / (\lambda_1 + \lambda_2)$, and moreover, that the event $T_1 < T_2$ is statistically independent of the departure time $\min\{T_1, T_2\}$. This may seem strange if one thinks in terms of related events, so it is important to treat statistical independence as a mathematical concept that merely means $P(A \cap B) = P(A)\,P(B)$ regardless of whether or not $A$ and $B$ are, in any sense of the word, “related” to each other.
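The claim about competing exponential departure times can be checked numerically. The following is only a rough Monte Carlo sketch: the rates 2.0 and 3.0, the names `simulate_race`, `p_win`, `p_early` and `p_both`, and the choice of "the departure happens before its mean time" as the test event are all assumptions made for illustration.

```python
import random

def simulate_race(rate1, rate2, trials=200_000, seed=0):
    """Simulate two independent exponential clocks and record which rings first."""
    rng = random.Random(seed)
    cutoff = 1.0 / (rate1 + rate2)   # mean of the departure time min(T1, T2)
    wins = 0         # trials in which clock 1 rings first
    early = 0        # trials in which the departure time is below the cutoff
    early_wins = 0   # trials in which both of the above happen
    for _ in range(trials):
        t1 = rng.expovariate(rate1)
        t2 = rng.expovariate(rate2)
        first_wins = t1 < t2
        is_early = min(t1, t2) < cutoff
        wins += first_wins
        early += is_early
        early_wins += first_wins and is_early
    return wins / trials, early / trials, early_wins / trials

p_win, p_early, p_both = simulate_race(2.0, 3.0)
print("estimated P(T1 < T2):", p_win, " predicted:", 2.0 / (2.0 + 3.0))
print("estimated P(both):", p_both, " product of marginals:", p_win * p_early)
```

If the two events are independent, the last two printed numbers should agree up to sampling noise, even though which clock wins is defined directly in terms of the same two clocks as the departure time.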
The second example is that an event can be statistically independent of itself! In fact, this turns out to be useful: to prove that $A$ is an “extreme” event, by which I merely mean that either $P(A) = 0$ or $P(A) = 1$, it suffices to prove that $A$ is independent of itself, and sometimes the latter is easier to prove than the former. (Furthermore, having first proved that $P(A)$ can only be zero or one can then make it easier to prove that it equals one, for instance.)
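The reason self-independence forces an extreme probability is a one-line calculation. Writing $A$ for the event in question (the symbol is chosen here for illustration) and applying the product rule for independence to the pair $(A, A)$:

```latex
\[
P(A) = P(A \cap A) = P(A)\,P(A)
\quad\Longrightarrow\quad
P(A)\bigl(1 - P(A)\bigr) = 0
\quad\Longrightarrow\quad
P(A) \in \{0, 1\}.
\]
```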
In closing, it is remarked that one can always challenge a definition by asking why this particular definition. Perhaps a different definition of statistical independence might be better? The response will always be: try to find a better definition! Sometimes you might be successful; this is how definitions are refined and generalised over time. Just keep in mind that a “good” definition is one that is useful and not necessarily one that mimics perfectly our intuition from the real world.
This is a sequel to The Tensor Product in response to a comment posted there. It endeavours to explain the difference between a tensor and a matrix. It also explains why ‘tensors’ were not mentioned in The Tensor Product.
A matrix is a two-dimensional array of numbers (belonging to a field such as $\mathbb{R}$ or $\mathbb{C}$) which can be used freely for any purpose, including for organising data collected from an experiment. Nevertheless, there are a number of commonly defined operations involving scalars, vectors and matrices, such as matrix addition, matrix multiplication, matrix-by-vector multiplication and scalar multiplication. These operations allow a matrix to be used to represent a linear map from one vector space to another, and it is this aspect of matrices that is the most relevant here.
Although standard usage means that an $m \times n$ matrix over $\mathbb{R}$ defines a linear map from the vector space $\mathbb{R}^n$ to the vector space $\mathbb{R}^m$, this is an artefact of $\mathbb{R}^n$ and $\mathbb{R}^m$ implicitly being endowed with a canonical choice of basis vectors. It is perhaps cleaner to start from scratch and observe that a matrix on its own does not define a linear map between two vector spaces: given two two-dimensional vector spaces $V$ and $W$ and the matrix $A = [a_{11}, a_{12}; a_{21}, a_{22}]$ (by which I mean the two-by-two matrix whose elements are, from left-to-right top-to-bottom, $a_{11}, a_{12}, a_{21}, a_{22}$), what linear map from $V$ to $W$ does $A$ represent? By convention, a linear map is represented by a matrix in the following way, with respect to a particular choice of basis vectors for $V$ and for $W$. Let $\{v_1, v_2\}$ be a basis for $V$, and $\{w_1, w_2\}$ a basis for $W$. If $T\colon V \to W$ is a linear map then it is fully determined once the values of $T(v_1)$ and $T(v_2)$ are revealed. Furthermore, since $T(v_1)$ is an element of $W$, it can be written as $T(v_1) = a_{11} w_1 + a_{21} w_2$ for a unique choice of scalars $a_{11}$ and $a_{21}$. Similarly, $T(v_2) = a_{12} w_1 + a_{22} w_2$. Knowing the scalars $a_{11}, a_{12}, a_{21}$ and $a_{22}$, together with knowing the choice of basis vectors $\{v_1, v_2\}$ and $\{w_1, w_2\}$, allows one to determine what the linear map $T$ is. (An alternative but essentially equivalent approach would have been to agree that a matrix defines a linear map from $\mathbb{R}^2$ to $\mathbb{R}^2$, and therefore, to represent a linear map by a matrix, it is first necessary to choose an isomorphism from $V$ to $\mathbb{R}^2$ and an isomorphism from $W$ to $\mathbb{R}^2$.)
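To see concretely that the array of numbers depends on the basis while the map does not, here is a small sketch. It assumes, purely for illustration, that the domain and codomain are the same space $\mathbb{R}^2$ with the same basis used on both sides; the particular matrix, the new basis and the names `standard`, `basis`, `in_new_basis`, `matmul` and `inverse` are all made up for the example.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Matrix of a linear map with respect to the standard basis of R^2.
standard = [[F(1), F(2)], [F(3), F(4)]]

# A second basis, written as the columns (1, 0) and (1, 1).
basis = [[F(1), F(1)], [F(0), F(1)]]

# Matrix of the very same linear map with respect to the second basis.
in_new_basis = matmul(inverse(basis), matmul(standard, basis))
print(in_new_basis)
```

The two arrays `standard` and `in_new_basis` contain different numbers, yet both describe one and the same linear map; only the bookkeeping has changed.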
Unlike a matrix, which can only represent a linear map between vector spaces once a choice of basis vectors has been made, a tensor is a linear (or multi-linear) map. The generalisation from linear to multi-linear maps is a distraction which may lead one to believe the difference between a tensor and a matrix is that a tensor is a generalisation of a matrix to higher dimensions, but this is missing the key point: the machinery of changes of coordinates, which is external to the definition of a matrix as an array of numbers, is internal to the definition of a tensor.
Examples of tensors are linear maps $f\colon V \to \mathbb{R}$ and $g\colon V \to V$, and bilinear maps $h\colon V \times V \to \mathbb{R}$. They are tensors because they are multi-linear maps between vector spaces. Choices of basis vectors do not enter the picture until one wishes to describe a particular map to a friend; unless it is possible to define the map in terms of already known linear maps, it becomes necessary to write down a set of basis vectors and specify the tensor as an array of numbers with respect to this choice of basis vectors. This leads to the traditional definition of tensors, which is still commonly used in physics and engineering.
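For convenience and consistency of notation, usually tensors are re-written as multi-linear maps into $\mathbb{R}$ (or whatever the ground field is). Of the examples above, the linear map $f\colon V \to \mathbb{R}$ and the bilinear map $h\colon V \times V \to \mathbb{R}$ are already of this form, but the linear map $g\colon V \to V$ is not. This is easily rectified; there is a natural equivalence between linear maps $g\colon V \to V$ and bilinear maps $\tilde{g}\colon V^* \times V \to \mathbb{R}$ where $V^*$ is the dual space of $V$; recall that elements of $V^*$ are simply linear functionals on $V$, that is, if $\omega \in V^*$ then $\omega$ is a linear function $\omega\colon V \to \mathbb{R}$. This equivalence becomes apparent by observing that if, for a fixed $v \in V$, the values of $\omega(g(v))$ are known for every $\omega \in V^*$ then the value of $g(v)$ is readily deduced, and furthermore, the map taking an $\omega \in V^*$ and a $v \in V$ to $\omega(g(v))$ is bilinear; precisely, the correspondence between $g$ and $\tilde{g}$ is given by $\tilde{g}(\omega, v) = \omega(g(v))$. (If that’s not immediately clear, an intermediate step is the realisation that if the value of $\omega(w)$ is known for every $\omega \in V^*$ then the value of $w \in V$ is readily determined. Therefore, if we are unhappy about the range of $g$ being $V$, we can simply use elements of $V^*$ to probe the value of $g$.)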
An equivalent definition of a tensor is therefore a multi-linear map of the form $V^* \times \cdots \times V^* \times V \times \cdots \times V \to \mathbb{R}$; see the wikipedia for details. (Linear maps between different vector spaces are a slightly more general concept and are not considered here for simplicity.)
It remains to introduce the tensor product (and to give another definition of tensors, this time in terms of tensor products).
First observe that $T^p_q$, the set of all multi-linear maps $V^* \times \cdots \times V^* \times V \times \cdots \times V \to \mathbb{R}$ where there are $p$ copies of $V^*$ and $q$ copies of $V$, can be made into a vector space in an obvious way; just use pointwise addition and scalar multiplication of the multi-linear maps. Next, observe that $T^0_1$ and $V^*$ are isomorphic. Furthermore, since $V^{**}$, the dual of the dual of $V$, is naturally isomorphic to $V$, it is readily seen that $T^1_0$ is isomorphic to $V$. Can $T^p_q$ be constructed from multiple copies of $V$ and $V^*$?
It turns out that $T^p_q$ is isomorphic to $V \otimes \cdots \otimes V \otimes V^* \otimes \cdots \otimes V^*$ where there are $p$ copies of $V$, $q$ copies of $V^*$ and $\otimes$ is the tensor product defined in The Tensor Product. Alternatively, one could have invented the tensor product by examining how $T^2_0$ can be constructed from two copies of $T^1_0$, then observing that the same construction can be repeated and applied to $T^p_q$, thereby ‘deriving’ a useful operation denoted by $\otimes$.
Another equivalent definition of a tensor is therefore an element of a vector space of the form $V \otimes \cdots \otimes V \otimes V^* \otimes \cdots \otimes V^*$, and this too is explained in the wikipedia.
To summarise the discussion so far (and restricting attention to the scalar field $\mathbb{R}$ for simplicity):
- Matrices do not, on their own, define linear maps between vector spaces (although they do define linear maps between Euclidean spaces $\mathbb{R}^n$).
- A tensor is a multi-linear map whose domain and range involve zero or more copies of a vector space $V$ and its dual $V^*$.
- Any such map can be re-arranged to be of the form $V^* \times \cdots \times V^* \times V \times \cdots \times V \to \mathbb{R}$.
- The space of such maps (for a fixed number $p$ of copies of $V^*$ and $q$ copies of $V$) forms a vector space denoted $T^p_q$.
- It turns out that there exists a single operation $\otimes$ which takes two vector spaces and returns a third, such that, for any $p$ and $q$, $T^p_q$ is isomorphic to $T^1_0 \otimes \cdots \otimes T^1_0 \otimes T^0_1 \otimes \cdots \otimes T^0_1$ (with $p$ copies of $T^1_0$ and $q$ copies of $T^0_1$). This operation is called the tensor product.
- Since $T^1_0$ and $T^0_1$ are isomorphic to $V$ and $V^*$ respectively, an equivalent definition of a tensor is an element of the vector space $V \otimes \cdots \otimes V \otimes V^* \otimes \cdots \otimes V^*$.
Some loose ends are now tidied up. Without additional reading though, certain remaining parts of this article are unlikely to be self-explanatory; the main purpose is to alert the reader what to look out for when learning from textbooks. First, it will be explained how an element of $V \otimes V^*$ represents a linear map $V \to V$. Then an additional usage will be given of the tensor product symbol: the tensor product of two multi-linear maps results in a new multi-linear map. This additional aspect of tensor products was essentially ignored in The Tensor Product. Lastly, an explanation is given of why I omitted any mention of tensors in The Tensor Product.
Let $t \in V \otimes V^*$. The naive way to proceed is as follows. Introduce a basis $\{v_i\}$ for $V$ and $\{\omega_j\}$ for $V^*$; different choices will ultimately lead to the same result. Then $t$ can be written as a linear combination $t = \sum_{i,j} c_{ij}\, v_i \otimes \omega_j$ where the $c_{ij}$ are scalars. Recall from The Tensor Product that the $v_i \otimes \omega_j$ are just formal symbols used to distinguish one basis vector of $V \otimes V^*$ from another. Here’s the trick; we now associate to each $v_i \otimes \omega_j$ the linear map that sends $u \in V$ to $\omega_j(u)\, v_i$; clearly this is a linear map from $V$ to $V$. (This is relatively easy to remember, for how else could we combine $v_i$ with $\omega_j$ to obtain a linear map from $V$ to $V$?) Then, we associate to $t$ the linear map $u \mapsto \sum_{i,j} c_{ij}\, \omega_j(u)\, v_i$. It can be verified that this mapping is an isomorphism from the vector space $V \otimes V^*$ to the vector space of linear maps from $V$ to $V$, and moreover, the same mapping results regardless of the original choice of basis vectors for $V$ and $V^*$. While this is useful for actual computations, it does not explain how we knew to use the above trick of sending $u$ to $\omega_j(u)\, v_i$.
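The basis-level recipe can be tried out numerically. This sketch assumes the space is $\mathbb{R}^2$ with its standard basis and the corresponding dual basis, and stores each linear functional as a list of coefficients; the names `apply_simple`, `apply_tensor`, `coeffs` and `image`, and the particular numbers, are illustrative only.

```python
def apply_simple(v, omega, u):
    """Linear map attached to the formal symbol v (x) omega: u -> omega(u) * v."""
    omega_of_u = sum(w_k * u_k for w_k, u_k in zip(omega, u))
    return [omega_of_u * v_k for v_k in v]

def apply_tensor(coeffs, basis_v, basis_dual, u):
    """Linear map attached to sum_ij coeffs[i][j] * (v_i (x) omega_j), applied to u."""
    result = [0] * len(u)
    for i, v_i in enumerate(basis_v):
        for j, omega_j in enumerate(basis_dual):
            term = apply_simple(v_i, omega_j, u)
            result = [r + coeffs[i][j] * t for r, t in zip(result, term)]
    return result

e = [[1, 0], [0, 1]]        # standard basis of R^2
e_dual = [[1, 0], [0, 1]]   # dual basis; functional j reads off coordinate j
coeffs = [[1, 2], [3, 4]]   # coefficients of the element of V (x) V*

image = apply_tensor(coeffs, e, e_dual, [5, 6])
print(image)
```

With these standard bases the result coincides with multiplying the coefficient array, viewed as a matrix, by the vector, which is one way to see why an element of this tensor product behaves like a linear map from the space to itself.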
A more sophisticated way to proceed uses the universal property characterisation of tensor product and makes it clear why the above construction works. (The universal property characterisation is defined in the wikipedia among other places.) Essentially, under this characterisation, every bilinear map from $V^* \times V$ to $\mathbb{R}$ induces a unique linear map from $V^* \otimes V$ to $\mathbb{R}$, and conversely, every linear map from $V^* \otimes V$ to $\mathbb{R}$ induces a unique bilinear map from $V^* \times V$ to $\mathbb{R}$. Now, we already know from earlier that linear maps from $V$ to $V$ are equivalent to bilinear maps from $V^* \times V$ to $\mathbb{R}$. As now shown, choosing an element of $V \otimes V^*$ is equivalent to choosing a linear map from $V^* \otimes V$ to $\mathbb{R}$. Indeed, by definition of dual, linear maps from $V^* \otimes V$ to $\mathbb{R}$ are precisely the elements of $(V^* \otimes V)^* \cong V \otimes V^*$, as required.
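So far, we have only introduced the tensor product of two vector spaces. However, there is a companion operation which takes elements $v \in V$ and $w \in W$ of vector spaces and returns an element $v \otimes w$ of the vector space $V \otimes W$. (Recall that we have only introduced the formal symbol $v_i \otimes w_j$ to denote a basis vector in the case where $\{v_i\}$ and $\{w_j\}$ are chosen bases for $V$ and $W$; no meaning has been given yet to $v \otimes w$.) For calculations, it suffices to think of $v \otimes w$ as the element obtained by applying formal algebraic laws such as $(v + v') \otimes w = v \otimes w + v' \otimes w$. Precisely, if $v = \sum_i a_i v_i$ and $w = \sum_j b_j w_j$ where $\{v_i\}$ and $\{w_j\}$ are chosen bases for $V$ and $W$ then $v \otimes w$ is defined to be $\sum_{i,j} a_i b_j\, v_i \otimes w_j$, as suggested by the formal manipulations $\left(\sum_i a_i v_i\right) \otimes \left(\sum_j b_j w_j\right) = \sum_{i,j} a_i b_j\, v_i \otimes w_j$. I mention this only because I want to point out that this leads to the following simple rule for computing the tensor product of two multi-linear maps.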
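The coefficient rule for the tensor product of two elements is short enough to write out. In this sketch a vector is stored as a dictionary from basis labels to coefficients; the labels `"v1"`, `"w1"` and so on, the helper names `tensor` and `add`, and the sample coefficients are all hypothetical.

```python
def tensor(v, w):
    """Expand v (x) w over the formal basis symbols (v_i, w_j)."""
    return {(i, j): a * b for i, a in v.items() for j, b in w.items()}

def add(s, t):
    """Add two formal linear combinations stored as dictionaries."""
    out = dict(s)
    for key, coeff in t.items():
        out[key] = out.get(key, 0) + coeff
    return out

a = {"v1": 1, "v2": 2}    # a = 1*v1 + 2*v2
b = {"v1": 4, "v2": -1}   # b = 4*v1 - 1*v2
w = {"w1": 3, "w2": 5}    # w = 3*w1 + 5*w2

left = tensor(add(a, b), w)                # (a + b) (x) w
right = add(tensor(a, w), tensor(b, w))    # a (x) w + b (x) w
print(left == right)
```

The comparison checks one instance of the distributive law that the formal manipulations rely on; it is a spot check for these particular coefficients, not a proof.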
Let $S$ and $T$ be multi-linear maps; in fact, for ease of presentation, only the special case $S, T\colon V \to \mathbb{R}$ will be considered. Then a multi-linear map $V \times V \to \mathbb{R}$ can be formed from $S$ and $T$, namely, $(u, v) \mapsto S(u)\,T(v)$. This multi-linear map is denoted $S \otimes T$ and is called the tensor product of $S$ and $T$, in agreement with the discussion in the previous paragraph.
It is easier to motivate the tensor product of two tensors than it is to motivate the tensor product of two tensor spaces. Here is an example of such motivation.
What useful multi-linear maps can be formed from the linear maps $S, T\colon V \to \mathbb{R}$? Playing around, it seems $u \mapsto S(u) + T(u)$ is a linear map; let’s call it $S + T$. Multiplication does not work because $u \mapsto S(u)\,T(u)$ is not linear, so $S \cdot T$ is not a tensor. The ordinary cross-product would lead to a map $u \mapsto (S(u), T(u))$ from $V$ to $\mathbb{R} \times \mathbb{R}$ which is not of the form of a tensor as stated earlier. What we can do though is form the map $(u, v) \mapsto S(u)\,T(v)$. We denote this bilinear map by $S \otimes T$. Experience has shown the construction to be useful (which, at the end of the day, is the main justification for introducing new definitions and symbols), although for the moment, the choice of symbol $\otimes$ has not been justified save that it should be different from more common symbols such as $+$, $\cdot$ and $\times$ which mean different things, as observed immediately above. Furthermore, the definition of $\otimes$ as stated above readily extends to general tensors…
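The contrast between the pointwise product (not linear) and the tensor product (bilinear) of two linear functionals can be seen on a small example. The functionals on $\mathbb{R}^2$ below, their coefficients, and the helper names `functional`, `tensor` and `add` are assumptions made for the sketch.

```python
def functional(coeffs):
    """Linear functional on R^n given by its list of coefficients."""
    return lambda u: sum(c * x for c, x in zip(coeffs, u))

def tensor(s, t):
    """Tensor product of two functionals: the bilinear map (u, v) -> s(u) * t(v)."""
    return lambda u, v: s(u) * t(v)

def add(u, v):
    """Vector addition in R^n."""
    return [a + b for a, b in zip(u, v)]

S = functional([1, 2])
T = functional([3, -1])
S_tensor_T = tensor(S, T)

u, u2, v = [1, 0], [0, 1], [2, 5]

# Additivity in the first argument holds for the tensor product ...
linear_in_first = S_tensor_T(add(u, u2), v) == S_tensor_T(u, v) + S_tensor_T(u2, v)

# ... but the pointwise product w -> S(w) * T(w) is not additive.
pointwise = lambda w: S(w) * T(w)
pointwise_additive = pointwise(add(u, u2)) == pointwise(u) + pointwise(u2)

print(linear_in_first, pointwise_additive)
```

Only additivity in the first slot is checked here; the second slot and scalar multiplication behave in the same way.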
One could perhaps use the above paragraph as the start of an introduction to tensor products. In The Tensor Product, I chose instead to ignore tensors completely because, although there is nothing ‘difficult’ about them, the above indicates that there are a multitude of small issues that need to be explained, and at the end of the day, it is not even clear at the outset if there is any use in studying tensor products of tensor spaces! What does the fancy symbol $\otimes$ buy us that we could not have obtained for free just by working directly with multi-linear maps and arrays of numbers? (There are benefits, but they appear in more advanced areas of mathematics and hence are hard to motivate concretely at an elementary level. Of course, one could say the tensor product reduces the study of multi-linear maps to linear maps, that is, back to more familiar territory, but multi-linear maps are not that difficult to work with directly in the first place.)
The Tensor Product motivated the tensor product by wishing to construct $\mathbb{R}[x,y]$ from $\mathbb{R}[x]$ and $\mathbb{R}[y]$. It was stated there that this allowed properties of $\mathbb{R}[x,y]$ to be deduced from properties of $\mathbb{R}[x]$. A solid example of this would be localisation of rings: if we have managed to show that the localisation of $\mathbb{R}[x]$ at $x$ is isomorphic to the ring of Laurent polynomials in one variable, then properties of the tensor product allow us to conclude that the localisation of $\mathbb{R}[x,y]$ at $xy$ is isomorphic to the ring of Laurent polynomials in two variables. Even without such examples though, I am more comfortable taking for granted that it is useful to be able to construct $\mathbb{R}[x,y]$ from $\mathbb{R}[x]$ and $\mathbb{R}[y]$ than it is useful to be able to construct $T^p_q$ from $V$ and $V^*$.
Various discussions on the internet indicate the concept of tensor product is not always intuitive to grasp on a first reading. Perhaps the reason is it is harder to motivate the concept of tensor product on the vector space $\mathbb{R}^n$ than it is to motivate the concept of a tensor product on a polynomial ring. The following endeavours to give an easily understood pre-introduction to the tensor product. That is to say, the aim is to motivate the standard introductions that are online and in textbooks. Familiarity is assumed with only linear algebra and basic polynomial manipulations (i.e., adding and multiplying polynomials), hence the first step is to introduce just enough formalism to talk about polynomial rings.
By $\mathbb{R}[x]$ is meant the set of all polynomials in the indeterminate $x$ with real-valued coefficients. With the usual definitions of addition and scalar multiplication, $\mathbb{R}[x]$ becomes a vector space over the field $\mathbb{R}$ of real numbers. For example, $1 + 2x$ and $3x + x^2$ are elements of $\mathbb{R}[x]$ and can be added to form $1 + 5x + x^2$. An example of scalar multiplication is that $3$ times $1 + 2x$ is $3 + 6x$. These definitions of addition and scalar multiplication together satisfy the axioms of a vector space, thereby making $\mathbb{R}[x]$ into a vector space over $\mathbb{R}$. (The reason why $\mathbb{R}[x]$ is called a polynomial ring rather than a polynomial vector space is that it has additional structure, being a special kind of vector space, in the form of a multiplication operator which is compatible with the addition and scalar multiplication operators. This is not important for us though.)
Polynomials in two indeterminates, say $x$ and $y$, also form a vector space (and indeed, a ring) with respect to the usual operations of polynomial addition and scalar multiplication. This vector space is denoted $\mathbb{R}[x,y]$.
The cross-product of two vector spaces is relatively easy to motivate and understand, so much so that the reader is assumed to be familiar with the cross-product. Recall that if $V$ and $W$ are vector spaces then the elements of $V \times W$ are merely the pairs $(v, w)$ where $v \in V$ and $w \in W$. Recall too that $\mathbb{R}^m \times \mathbb{R}^n$ is isomorphic to $\mathbb{R}^{m+n}$. In particular, observe that the cross-product is a means of building a new vector space from other vector spaces.
The tensor-product is just like the cross-product in that it too allows one to build a new vector space from other vector spaces. It might be illuminating to think of the other vector spaces as building blocks, and the new vector space as something more complicated built from these simpler building blocks, although of course this need not always be the case. Regardless, what makes such constructions useful is that properties of the new vector space can be deduced from properties of the building block vector spaces. (The simplest example of such a property is the dimension of a vector space; the dimension of $V \times W$ is the sum of the dimensions of $V$ and $W$.)
Can $\mathbb{R}[x,y]$ be obtained from $\mathbb{R}[x]$ and $\mathbb{R}[y]$? Let’s try the cross-product. The vector space $\mathbb{R}[x] \times \mathbb{R}[y]$ consists of pairs $(p, q)$ where $p$ represents a polynomial in $x$ and $q$ a polynomial in $y$. This does not appear to work, for how should a polynomial such as $1 + xy$ be represented in the form $(p, q)$? (If the ring structure were taken into account then it would be possible to multiply the two elements of a pair $(p, q)$ and it would be seen that, in effect, $\mathbb{R}[x] \times \mathbb{R}[y]$ consists of polynomials that can be factored as $p(x)\,q(y)$, a proper subset of $\mathbb{R}[x,y]$. For this introduction though, all that is important is that $\mathbb{R}[x,y]$ is not the cross-product of $\mathbb{R}[x]$ and $\mathbb{R}[y]$.)
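With the belief that $\mathbb{R}[x,y]$ should be constructible from $\mathbb{R}[x]$ and $\mathbb{R}[y]$, let’s try to figure out what is required to get the job done. Favouring simplicity over sophistication, let’s investigate the problem in terms of basis vectors. The obvious choices of basis vectors are $\{1, x, x^2, \ldots\}$ for $\mathbb{R}[x]$, $\{1, y, y^2, \ldots\}$ for $\mathbb{R}[y]$ and $\{1, x, y, x^2, xy, y^2, \ldots\}$ for $\mathbb{R}[x,y]$. We are in luck as the pattern is easy to spot: the basis vectors of $\mathbb{R}[x,y]$ are just all pairwise products (in the sense of polynomial multiplication) of the basis vectors of $\mathbb{R}[x]$ with the basis vectors of $\mathbb{R}[y]$!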
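The pattern in the basis vectors can be checked mechanically on a truncated version of the problem. The sketch below restricts attention to powers up to an arbitrarily chosen degree `d` in each indeterminate, and stores a polynomial in two indeterminates as a dictionary from exponent pairs to coefficients; `poly_mul` and the other names are illustrative.

```python
from itertools import product

def poly_mul(p, q):
    """Multiply polynomials in x and y stored as {(deg_x, deg_y): coefficient}."""
    out = {}
    for (a, b), c in p.items():
        for (a2, b2), c2 in q.items():
            key = (a + a2, b + b2)
            out[key] = out.get(key, 0) + c * c2
    return out

d = 3
basis_x = [{(i, 0): 1} for i in range(d + 1)]   # 1, x, ..., x^d
basis_y = [{(0, j): 1} for j in range(d + 1)]   # 1, y, ..., y^d

# All pairwise products of a basis vector in x with a basis vector in y.
products = [poly_mul(p, q) for p, q in product(basis_x, basis_y)]

# Each product is a single monomial x^i y^j; collect the exponent pairs.
monomials = {next(iter(m)) for m in products}
print(len(basis_x), len(basis_y), len(monomials))
```

In this truncated setting the number of distinct monomials obtained is the product of the two basis sizes, whereas the cross-product of the same two spaces would only have dimension equal to their sum.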
We are therefore motivated to define a construction that takes two vector spaces $V$ and $W$ and creates a new vector space, denoted $V \otimes W$, where the new vector space is defined in terms of its basis vectors; roughly speaking, the basis vectors of $V \otimes W$ comprise all formal pairwise products of basis vectors in a particular basis of $V$ with basis vectors in a particular basis of $W$. How to do this precisely may not be readily apparent but the fact that $\mathbb{R}[x,y]$ is a well-defined vector space that, intuitively at least, is constructible from $\mathbb{R}[x]$ and $\mathbb{R}[y]$ gives us hope that such a construction can be made to work. Needless to say, such a construction is possible and is precisely the tensor product.
There are two equivalent ways of defining the tensor product. The first follows immediately from the above description. If $\{v_i\}$ and $\{w_j\}$ are bases for $V$ and $W$ then $V \otimes W$ is defined to be the vector space formed from all formal linear combinations of the basis vectors $v_i \otimes w_j$. It is emphasised that $v_i \otimes w_j$ is just a symbol, a means for identifying a particular basis vector. (At the risk of belabouring the point, I can form a vector space from the basis vectors Fred and Charlie; typical elements would be 2 Fred + 3 Charlie and 5 Fred – 1 Charlie, and the vector space addition of these two elements would be 7 Fred + 2 Charlie.) Of course, choosing a different set of basis vectors for $V$ or $W$ would result in a different vector space $V \otimes W$; however, the resulting vector spaces will always be isomorphic to each other. Therefore, $V \otimes W$ should be thought of as representing any vector space that can be obtained using the above method. (This finer point is not dwelled on here but is a common occurrence in mathematics; a construction can yield a new space which is only unique up to isomorphism. Once tensor products have been understood, the reader is invited to think further about this lack of uniqueness, how it is handled and why it is not an issue in practice.)
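A vector space of formal linear combinations is easy to model: an element is nothing more than a table of coefficients indexed by the basis symbols. The sketch below reproduces the Fred and Charlie sum and then lists formal basis symbols for a tensor product of a two-dimensional and a three-dimensional space; the labels `"v1"`, `"w1"` and so on, and the helper name `add`, are hypothetical.

```python
def add(u, v):
    """Add two formal linear combinations stored as {symbol: coefficient}."""
    out = dict(u)
    for symbol, coeff in v.items():
        out[symbol] = out.get(symbol, 0) + coeff
    # Drop symbols whose coefficients have cancelled.
    return {s: c for s, c in out.items() if c != 0}

total = add({"Fred": 2, "Charlie": 3}, {"Fred": 5, "Charlie": -1})
print(total)

# Basis symbols of the tensor product: one formal symbol per pair of basis vectors.
basis_v = ["v1", "v2"]
basis_w = ["w1", "w2", "w3"]
tensor_basis = [(v, w) for v in basis_v for w in basis_w]
print(len(tensor_basis))
```

Nothing about the symbols themselves is used beyond the ability to tell them apart, which is exactly the sense in which they are "just symbols".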
The more sophisticated way of defining the tensor product is to consider bilinear maps. Like most things, it is important to understand both methods. The elementary method above provides a basic level of intuition that is easy to grasp but the method is clumsy to work with and harder to generalise. The more sophisticated method is more convenient to work with and adds an extra layer of intuition, but masks the basic level of intuition and therefore actually offers less in the way of intuition to the neophyte.
The motivated reader may now like to:
- Read an introduction to the tensor product on vector spaces that uses basis vectors.
- Read an alternative introduction that uses bilinear maps.
- Work out why the two methods are equivalent.
A hint is offered as to why there is a relationship between “products” and “bilinear maps”, and why the tensor product can be thought of as “the most general product”. Assume we want to define a product on a vector space. That is, let $V$ be a vector space and we want to define a function $m\colon V \times V \to V$ that we think of as multiplication; given $u, v \in V$, their product is defined to be $m(u, v)$. What do we mean by multiplication? Presumably, we mean that the laws hold of associativity, distributivity and scalar multiplication, such as $(u + v)w = uw + vw$ and $(\alpha u)v = \alpha(uv)$. Merely requiring the scalar multiplication and distributive laws to hold is equivalent to requiring that $m$ be bilinear; that is the connection. In general, one can seek to define a multiplication rule between two different vector spaces, which is the level of generality at which the tensor product works. As textbooks will hasten to point out, any bilinear function $m\colon V \times W \to U$ can be represented by a linear map from $V \otimes W$ to $U$. Personally, I prefer to think of this in the first instance as a bonus result we get for free from the tensor product rather than as the initial motivation for introducing the tensor product; only with the benefit of hindsight are we motivated to define the tensor product using bilinear maps.
Tensor products turn out to be very convenient in other areas too. As just one example, in homological algebra they are a convenient way of changing rings. A special case is converting a vector space over the real numbers into a vector space over the complex numbers; an engineer would not think twice about doing this (anywhere a real number is allowed, allow now a complex number, and carry on as normal!) but how can it be formalised? Quite simply in fact; if $V$ is a vector space over the real field then $V \otimes \mathbb{C}$ is the complexification of $V$; although technically $V \otimes \mathbb{C}$ is a vector space over the real field according to our earlier definition of tensor product, there is a natural way to treat it as a vector space over the complex field. The reader is invited to contemplate this at leisure, recalling that $\mathbb{C}$ is a two-dimensional vector space over the real field, with basis $\{1, i\}$.
In conclusion, the tensor product can be motivated by the desire to construct the vector space of polynomials in two indeterminates out of two copies of a vector space of polynomials in only a single indeterminate. Once the construction is achieved, it is found to have a number of interesting properties, including properties relating to bilinear maps. With the benefit of hindsight, it is cleaner to redefine the tensor product in terms of bilinear maps; this broadens its applicability and tidies up the maths, albeit at the expense of possibly making less visible some of the basic intuition about the tensor product.
Learning is ineffective if attempted in a linear fashion. Fine details are best learnt, appreciated and remembered by those that can spontaneously describe and answer questions about the coarser details of the subject at hand. Therefore, it can be valuable for more advanced books to revisit “elementary” concepts because rarely is anything sufficiently elementary that nothing more remains to be known. The following quote is apt; the emphasis is my own. (All quotes below are taken from the preface of C. Lanczos (1966) Discourse on Fourier Series.)
By the nature of things it was necessary to develop the subject from its early beginnings and this explains the fact that even so-called “elementary” concepts, such as the idea of a function, the meaning of limit, uniform convergence and similar “well-known” subjects of analysis were included in the discussion. Far from being bored, the students found this procedure highly appropriate, because very often exactly the apparently “elementary” ideas of mathematics — which are in fact “elementary” only because they are relegated to the undergraduate level of instruction, although their true significance cannot be properly grasped on that level — cause great difficulties in proceeding to the more advanced subjects.
There are three interwoven aspects of mathematical knowledge:
- Intuition — the “pictures” one forms (consciously or subconsciously) in one’s head when reasoning about a problem or endeavouring to generalise a concept.
- Rigour — the formalisation and verification of definitions and proofs.
- Communication — the transfer of mathematical knowledge from one person to another.
Pictures, formulae and discussions are generally how mathematical knowledge is communicated from one person to another. It is important though to isolate this from the actual understanding of mathematics itself. The formula is not in itself what is “understood” by someone reading it. Rather, seeing the formula conjures up a wealth of images in one’s subconscious mind which can be then refined further and reasoned with; seeing the formula primes relevant areas of the cortex facilitating subsequent thought. One “understands” a formula such as $y = \sin x$ because one can reason with it and answer questions about it, for instance, one can graph it, differentiate it, find its zeros, write down its Taylor series expansion, draw a relevant right-angled triangle and so forth. [Understanding is therefore relative to the questions one has asked oneself or otherwise encountered to date.] Memorising a result does not immediately lead to understanding. Understanding occurs only after one’s mind has formed associations that link the result with other stored knowledge. The degree of understanding is related to the scope and complexity of such associations.
To distinguish rigour from intuition, consider reading the proof of a theorem. It is possible to check a proof is correct without having any sense of actually “understanding” the proof, or even of “understanding” why the theorem should be true. In fact, it is possible to come up with a proof without “understanding” it! That is to say, by trial-and-error and (subconscious) pattern recognition (e.g., making substitutions and transformations one has seen before without quite being sure one is heading in the right direction), one can write down an algebraic proof of a theorem in convex analysis, say, without being able to offer any geometric picture or other explanation to justify how the proof was found. It is often worth the extra effort to develop a sense of intuition about theorems and their proofs. Intuition and rigour together provide a sense of understanding and increase one’s fluency in mathematics.
In some cases, intuition and rigour go hand in hand; one can translate directly one’s intuition into a proof. That this is not always the case is perhaps the only reason why teaching and learning mathematics is non-trivial: It is easy to convey rigour (at least, no harder than programming a computer), and all too easy for an author to convey only rigour and leave it to the readers’ mathematical maturity and ingenuity to deduce the intuition for themselves. Conveying intuition is not necessarily more difficult, but for an author, there are apparently drawbacks.
The most obvious drawback is verbosity. Stating and proving a theorem in its full level of generality (à la Bourbaki) takes considerably less space than does proving a basic theorem using one technique, then motivating the generalisation of the theorem, then pointing out why the proof of the basic theorem does not generalise, then motivating a new proof technique and finally proving the general theorem. Another drawback is imprecision and inaccuracy; intuition need not be precise nor even accurate for it to be valuable, yet some authors may be uncomfortable committing to paper anything even remotely inaccurate.
Tracing the historical development of a subject can provide a wealth of intuition. Here is what Lanczos has to say on the matter.
… a close tie with the historical development seemed appropriate, although the author is well aware that this exposes him to the charge of datedness. We have to be “modern” and there are those who believe that before the advent of our own blessed era the pursuers of mathematics lived in a kind of no-man’s-land, bumping against each other in the gloomy haze that pervaded everything (“Euclid must go!”). But there are others (and the author belongs to the latter group), who believe that the great masters of the eighteenth and nineteenth centuries, Lagrange, Euler, Gauss, Cauchy, Riemann, Fourier, Dirichlet, and many others, were not necessarily lacking mathematical intelligence and some of them might even be comparable to the geniuses of today.
One wonders if occasionally intuition is purposely withheld, the false reasoning being that the merit of an idea is judged by how complicated it is to understand. Merit should be judged by originality, usefulness and simplicity rather than complexity. A thing is understood when it appears to be simple.
To display formal fireworks, which are so much in the centre of many mathematical treatises — perhaps as a status-symbol by which one gains admission to the august guild of mathematicians — was not the primary aim of the book.
In conclusion, when teaching or learning mathematics, keep in mind that intuition and rigour are distinct aspects whose synergy forms mathematical knowledge. Intuition and rigour should be learnt together because rigour without intuition is like owning a car without the key; you can admire the car for its beauty but you cannot get very far with it.