
Tensors and Matrices

October 21, 2011

This is a sequel to The Tensor Product in response to a comment posted there. It endeavours to explain the difference between a tensor and a matrix.  It also explains why ‘tensors’ were not mentioned in The Tensor Product.

A matrix is a two-dimensional array of numbers (belonging to a field such as \mathbb{R} or \mathbb{C}) which can be used freely for any purpose, including for organising data collected from an experiment. Nevertheless, there are a number of commonly defined operations involving scalars, vectors and matrices, such as matrix addition, matrix multiplication, matrix-by-vector multiplication and scalar multiplication. These operations allow a matrix to be used to represent a linear map from one vector space to another, and it is this aspect of matrices that is the most relevant here.

Although standard usage means that an n \times m matrix over \mathbb{R} defines a linear map from the vector space \mathbb{R}^m to the vector space \mathbb{R}^n, this is an artefact of \mathbb{R}^n and \mathbb{R}^m implicitly being endowed with a canonical choice of basis vectors. It is perhaps cleaner to start from scratch and observe that a matrix on its own does not define a linear map between two vector spaces: given two two-dimensional vector spaces V, W and the matrix A = [1,2;3,4] (by which I mean the two-by-two matrix whose elements are, from left-to-right top-to-bottom, 1,2,3,4), what linear map from V to W does A represent? By convention, a linear map is represented by a matrix in the following way, with respect to a particular choice of basis vectors for V and for W. Let v_1, v_2 \in V be a basis for V, and w_1, w_2 \in W a basis for W. If f: V \rightarrow W is a linear map then it is fully determined once the values of f(v_1) and f(v_2) are revealed. Furthermore, since f(v_1) is an element of W, it can be written as f(v_1) = \alpha_{11} w_1 + \alpha_{21} w_2 for a unique choice of scalars \alpha_{11} and \alpha_{21}. Similarly, f(v_2) = \alpha_{12} w_1 + \alpha_{22} w_2. Knowing the scalars \alpha_{11}, \alpha_{12}, \alpha_{21} and \alpha_{22}, together with knowing the choice of basis vectors v_1, v_2, w_1 and w_2, allows one to determine what the linear map f is. (An alternative but essentially equivalent approach would have been to agree that a matrix defines a linear map from \mathbb{R}^m to \mathbb{R}^n, and therefore, to represent a linear map f: V \rightarrow W by a matrix, it is first necessary to choose an isomorphism from V to \mathbb{R}^m and an isomorphism from W to \mathbb{R}^n.)
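
To make this concrete, here is a minimal numerical sketch (my own illustration, not from the original post), taking V = W = \mathbb{R}^2 but with deliberately non-standard bases, so that the matrix A = [1,2;3,4] only becomes a linear map once the bases are supplied:

```python
import numpy as np

# Hypothetical illustration: represent a linear map f : V -> W by the matrix
# A = [1, 2; 3, 4] with respect to chosen bases {v1, v2} of V and {w1, w2} of W.
# Here V = W = R^2, but the bases are deliberately not the standard ones.
v1, v2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])   # basis of V
w1, w2 = np.array([2.0, 0.0]), np.array([0.0, 3.0])    # basis of W
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])                              # the alpha_{ij}

# The columns of A say that f(v1) = 1*w1 + 3*w2 and f(v2) = 2*w1 + 4*w2.
def f(v):
    # coordinates of v in the basis {v1, v2}
    coords = np.linalg.solve(np.column_stack([v1, v2]), v)
    # apply A to the coordinate vector, giving coordinates in {w1, w2}
    out = A @ coords
    return out[0] * w1 + out[1] * w2

print(f(v1))            # 1*w1 + 3*w2 = [2, 9]
print(f(v1 + 2 * v2))   # linearity: equals f(v1) + 2*f(v2) = [10, 33]
```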

Unlike a matrix, which can only represent a linear map between vector spaces once a choice of basis vectors has been made, a tensor is a linear (or multi-linear) map. The generalisation from linear to multi-linear maps is a distraction which may lead one to believe that the difference between a tensor and a matrix is that a tensor is a generalisation of a matrix to higher dimensions, but this misses the key point: the machinery of changes of coordinates, which is external to the definition of a matrix as an array of numbers, is internal to the definition of a tensor.

Examples of tensors are linear maps f: V \rightarrow \mathbb{R} and g: V \rightarrow V, and bilinear maps h: V \times V \rightarrow \mathbb{R}. They are tensors because they are multi-linear maps between vector spaces. Choices of basis vectors do not enter the picture until one wishes to describe a particular map f to a friend; unless it is possible to define f in terms of known linear maps such as \mathrm{trace}, it becomes necessary to write down a set of basis vectors and specify the tensor as an array of numbers with respect to this choice of basis vectors. This leads to the traditional definition of tensors, which is still commonly used in physics and engineering.
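
As a small illustration of this traditional "array of numbers with respect to a basis" description (my own sketch, with V = \mathbb{R}^2 and an arbitrary choice of basis), a bilinear map h is pinned down by the array of values h(v_i, v_j):

```python
import numpy as np

# Sketch (mine): describing a bilinear map h : V x V -> R to a friend by an
# array of numbers. With V = R^2 and a chosen basis {v1, v2}, h is determined
# by the 2x2 array h_ij = h(v_i, v_j).
v1, v2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
h = lambda u, w: u[0] * w[0] + 2 * u[1] * w[1]     # some fixed bilinear map

basis = [v1, v2]
h_array = np.array([[h(vi, vj) for vj in basis] for vi in basis])
print(h_array)   # the array representing h with respect to {v1, v2}
```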

For convenience and consistency of notation, usually tensors are re-written as multi-linear maps into \mathbb{R} (or whatever the ground field is). Both f and h above are already of this form, but g is not. This is easily rectified; there is a natural equivalence between linear maps g: V \rightarrow V and bilinear maps \tilde g: V \times V^\ast \rightarrow \mathbb{R} where V^\ast is the dual space of V; recall that elements of V^\ast are simply linear functionals on V, that is, if \sigma \in V^\ast then \sigma is a linear function \sigma: V \rightarrow \mathbb{R}. This equivalence becomes apparent by observing that if, for a fixed v \in V, the values of (\sigma \circ g)(v) = \sigma(g(v)) are known for every \sigma \in V^\ast then the value of g(v) is readily deduced, and furthermore, the map taking a v \in V and a \sigma \in V^\ast to (\sigma \circ g)(v) \in \mathbb{R} is bilinear; precisely, the correspondence between g and \tilde g is given by \tilde g(v,\sigma) = (\sigma \circ g)(v). (If that’s not immediately clear, an intermediate step is the realisation that if the value of \sigma(w) is known for every \sigma \in V^\ast then the value of w \in V is readily determined. Therefore, if we are unhappy about the range of g being V, we can simply use elements of V^\ast to probe the value of w=g(v).)
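
Here is a quick numerical check of this correspondence (my own sketch, taking V = \mathbb{R}^3 with its standard basis, so that g is given by a matrix G and a functional \sigma by a row vector acting via the dot product):

```python
import numpy as np

# Sketch (mine): the correspondence g <-> g~ given by g~(v, sigma) = sigma(g(v)).
rng = np.random.default_rng(0)
G = rng.standard_normal((3, 3))       # represents g : V -> V
s = rng.standard_normal(3)            # represents sigma in V*
v = rng.standard_normal(3)
u = rng.standard_normal(3)

g_tilde = lambda v, s: s @ (G @ v)    # g~(v, sigma) = sigma(g(v))

# bilinearity in each argument (up to floating-point rounding)
print(np.isclose(g_tilde(v + 2 * u, s), g_tilde(v, s) + 2 * g_tilde(u, s)))
print(np.isclose(g_tilde(v, 3 * s), 3 * g_tilde(v, s)))
```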

An equivalent definition of a tensor is therefore a multi-linear map of the form T: V \times \cdots \times V \times V^\ast \times \cdots \times V^\ast \rightarrow \mathbb{R}; see the wikipedia for details. (Multi-linear maps involving several different vector spaces are a slightly more general concept and are not considered here, for simplicity.)

It remains to introduce the tensor product (and to give another definition of tensors, this time in terms of tensor products).

First observe that \mathcal{T}^p_q, the set of all multi-linear maps T: V^\ast \times \cdots \times V^\ast \times V \times \cdots \times V \rightarrow \mathbb{R} where there are p copies of V^\ast and q copies of V, can be made into a vector space in an obvious way; just use pointwise addition and scalar multiplication of the multi-linear maps. Next, observe that \mathcal{T}^0_1 and V^\ast are isomorphic. Furthermore, since V^{\ast\ast}, the dual of the dual of V, is naturally isomorphic to V, it is readily seen that \mathcal{T}^1_0 is isomorphic to V. Can \mathcal{T}^p_q be constructed from multiple copies of \mathcal{T}^0_1 and \mathcal{T}^1_0?

It turns out that \mathcal{T}^p_q is isomorphic to \mathcal{T}^1_0 \otimes \cdots \otimes \mathcal{T}^1_0 \otimes \mathcal{T}^0_1 \otimes \cdots \otimes \mathcal{T}^0_1 where there are p copies of \mathcal{T}^1_0, q copies of \mathcal{T}^0_1 and \otimes is the tensor product defined in The Tensor Product. Alternatively, one could have invented the tensor product by examining how \mathcal{T}^2_0 can be constructed from two copies of \mathcal{T}^1_0, then observing that the same construction can be repeated and applied to \mathcal{T}^1_0, thereby ‘deriving’ a useful operation denoted by \otimes.
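
As an illustrative sketch (mine, with V = \mathbb{R}^2 and the standard basis), the following reconstructs a bilinear map on V^\ast \times V^\ast, i.e. an element of \mathcal{T}^2_0, as a linear combination of elementary pieces e_i \otimes e_j built from two copies of \mathcal{T}^1_0 \cong V:

```python
import numpy as np

# Sketch (mine): an element B of T^2_0, a bilinear map B : V* x V* -> R, as a
# combination of elementary tensors built from vectors (elements of T^1_0 ~ V).
# A functional sigma in V* is represented by a row vector s, acting by s @ v.
n = 2
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])                 # coefficients of B

B = lambda s, t: s @ M @ t                 # B(sigma, tau) = sum_ij M_ij s_i t_j

def elem(v, w):
    # elementary tensor (v (x) w)(sigma, tau) = sigma(v) * tau(w)
    return lambda s, t: (s @ v) * (t @ w)

e = np.eye(n)
rng = np.random.default_rng(1)
s, t = rng.standard_normal(n), rng.standard_normal(n)
recon = sum(M[i, j] * elem(e[i], e[j])(s, t) for i in range(n) for j in range(n))
print(np.isclose(B(s, t), recon))          # True
```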

Another equivalent definition of a tensor is therefore an element of a vector space of the form V \otimes \cdots \otimes V \otimes V^\ast \otimes \cdots \otimes V^\ast, and this too is explained in the wikipedia.

To summarise the discussion so far (and restricting attention to the scalar field \mathbb{R} for simplicity):

  • Matrices do not, on their own, define linear maps between vector spaces (although an n \times m matrix does define a linear map from \mathbb{R}^m to \mathbb{R}^n, thanks to the canonical bases of these Euclidean spaces).
  • A tensor is a multi-linear map whose domain and range involve zero or more copies of a vector space V and its dual V^\ast.
  • Any such map can be re-arranged to be of the form T: V^\ast \times \cdots \times V^\ast \times V \times \cdots \times V \rightarrow \mathbb{R}.
  • The space of such maps (for a fixed number q of copies of V and p copies of V^\ast) forms a vector space denoted \mathcal{T}^p_q.
  • It turns out that there exists a single operation \otimes which takes two vector spaces and returns a third, such that, for any p,q,r,s, \mathcal{T}^p_q \otimes \mathcal{T}^r_s is isomorphic to \mathcal{T}^{p+r}_{q+s}.  This operation is called the tensor product.
  • Since \mathcal{T}^1_0 and \mathcal{T}^0_1 are isomorphic to V and V^\ast respectively, an equivalent definition of a tensor is an element of the vector space V \otimes \cdots \otimes V \otimes V^\ast \otimes \cdots \otimes V^\ast.

Some loose ends are now tidied up. Without additional reading, though, certain remaining parts of this article are unlikely to be self-explanatory; the main purpose is to alert the reader to what to look out for when learning from textbooks. First, it will be explained how an element of V \otimes V^\ast represents a linear map h: V \rightarrow V. Then an additional usage of the tensor product symbol will be given: the tensor product of two multi-linear maps results in a new multi-linear map. This aspect of tensor products was essentially ignored in The Tensor Product. Lastly, an explanation is given of why I omitted any mention of tensors in The Tensor Product.

Let x \in V \otimes V^\ast. The naive way to proceed is as follows. Introduce a basis \{v_i\} for V and \{\sigma_j\} for V^\ast; different choices will ultimately lead to the same result. Then x can be written as a linear combination x = \sum_{i,j} \alpha_{ij} v_i \otimes \sigma_j where the \alpha_{ij} are scalars. Recall from The Tensor Product that the v_i \otimes \sigma_j are just formal symbols used to distinguish one basis vector of V \otimes V^\ast from another. Here’s the trick; we now associate to each v_i \otimes \sigma_j the linear map h_{ij}: V \rightarrow V that sends v \in V to \sigma_j(v) v_i; clearly v \mapsto \sigma_j(v) v_i is a linear map from V to V. (This is relatively easy to remember, for how else could we combine v_i with \sigma_j to obtain a linear map from V to V?) Then, we associate to x = \sum_{i,j} \alpha_{ij} v_i \otimes \sigma_j the linear map h = \sum_{i,j} \alpha_{ij} h_{ij}. It can be verified that this mapping is an isomorphism from the vector space V \otimes V^\ast to the vector space of linear maps from V to V, and moreover, the same mapping results regardless of the original choice of basis vectors for V and V^\ast. While this is useful for actual computations, it does not explain how we knew to use the above trick of sending v \in V to \sigma_j(v) v_i.
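
A quick numerical sanity check of this trick (my own sketch, with V = \mathbb{R}^2, its standard basis and the corresponding dual basis): the map h_{ij} has matrix equal to the outer product v_i \sigma_j^T, so h is represented by the matrix \sum_{i,j} \alpha_{ij} v_i \sigma_j^T.

```python
import numpy as np

# Sketch (mine): the basis element v_i (x) sigma_j corresponds to the linear map
# h_ij(v) = sigma_j(v) * v_i, whose matrix is the outer product v_i sigma_j^T.
n = 2
e = np.eye(n)               # v_i     = e[i]    (standard basis of V)
eps = np.eye(n)             # sigma_j = eps[j]  (dual basis of V*)
alpha = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

# matrix of h = sum_ij alpha_ij h_ij
H = sum(alpha[i, j] * np.outer(e[i], eps[j]) for i in range(n) for j in range(n))

v = np.array([5.0, -1.0])
# h(v) computed directly from the definition h_ij(v) = sigma_j(v) v_i ...
hv = sum(alpha[i, j] * (eps[j] @ v) * e[i] for i in range(n) for j in range(n))
# ... agrees with applying the matrix H to v
print(np.allclose(hv, H @ v))   # True
```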

A more sophisticated way to proceed uses the universal property characterisation of the tensor product and makes it clear why the above construction works. (The universal property characterisation is defined in the wikipedia among other places.) Essentially, under this characterisation, every bilinear map from V \times V^\ast to \mathbb{R} induces a unique linear map from V \otimes V^\ast to \mathbb{R}, and conversely, every linear map from V \otimes V^\ast to \mathbb{R} induces a unique bilinear map from V \times V^\ast to \mathbb{R}. Now, we already know from earlier that linear maps from V to V are equivalent to bilinear maps from V \times V^\ast to \mathbb{R}. As we now show, choosing an element of V \otimes V^\ast is equivalent to choosing a linear map from V \otimes V^\ast to \mathbb{R}. Indeed, by definition of the dual, linear maps from V \otimes V^\ast to \mathbb{R} are precisely the elements of (V \otimes V^\ast)^\ast \cong V^\ast \otimes V^{\ast\ast} \cong V^\ast \otimes V \cong V \otimes V^\ast, as required.

So far, we have only introduced the tensor product of two vector spaces. However, there is a companion operation which takes elements v \in V and w \in W of vector spaces and returns an element v \otimes w of the vector space V \otimes W. (Recall that we have only introduced the formal symbol v_i \otimes w_j to denote a basis vector in the case where \{v_i\} and \{w_j\} are chosen bases for V and W; no meaning has been given yet to v \otimes w.)  For calculations, it suffices to think of v \otimes w as the element obtained by applying formal algebraic laws such as (u+v) \otimes w = (u \otimes w) + (v \otimes w). Precisely, if v = \sum_i \alpha_i v_i and w = \sum_j \beta_j w_j where \{v_i\} and \{w_j\} are chosen bases for V and W then v \otimes w is defined to be \sum_{i,j} \alpha_i \beta_j (v_i \otimes w_j), as suggested by the formal manipulations v \otimes w = (\sum_i \alpha_i v_i) \otimes (\sum_j \beta_j w_j) = \sum_i \alpha_i (v_i \otimes \sum_j \beta_j w_j) = \sum_i \alpha_i \sum_j \beta_j (v_i \otimes w_j). I mention this only because I want to point out that this leads to the following simple rule for computing the tensor product of two multi-linear maps.
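
In coordinates this rule is simply "multiply all pairs of coefficients"; here is a tiny sketch (mine, using the standard bases of \mathbb{R}^2) showing that the coordinate array of v \otimes w is the outer product of the coordinate vectors:

```python
import numpy as np

# Sketch (mine): with V = W = R^2 and chosen bases, the coordinates of v (x) w
# in the basis {v_i (x) w_j} are the products alpha_i * beta_j, i.e. the outer
# product of the coordinate vectors (flattened, this is np.kron).
alpha = np.array([1.0, 2.0])    # coordinates of v
beta = np.array([3.0, -1.0])    # coordinates of w

coords = np.outer(alpha, beta)  # coords[i, j] = alpha_i * beta_j
print(coords)
print(np.allclose(coords.ravel(), np.kron(alpha, beta)))   # True
```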

Let S and T be multi-linear maps; in fact, for ease of presentation, only the special case S,T: V \rightarrow \mathbb{R} will be considered. Then a multi-linear map can be formed from S and T, namely, (u,v) \mapsto S(u) T(v). This multi-linear map is denoted S \otimes T and is called the tensor product of S and T, in agreement with the discussion in the previous paragraph.

It is easier to motivate the tensor product S \otimes T of two tensors than it is to motivate the tensor product of two tensor spaces \mathcal{T}^p_q \otimes \mathcal{T}^r_s. Here is an example of such motivation.

What useful multi-linear maps can be formed from the linear maps S,T: V \rightarrow \mathbb{R}? Playing around, it seems v \mapsto S(v)+T(v) is a linear map; let’s call it S+T. Multiplication does not work because v \mapsto S(v)T(v) is not linear, so ST is not a tensor. The Cartesian product of the two maps would lead to S \times T: V \times V \rightarrow \mathbb{R} \times \mathbb{R}, which is not of the form of a tensor as stated earlier. What we can do, though, is form the map (v,w) \mapsto S(v)T(w).  We denote this bilinear map by S \otimes T: V \times V \rightarrow \mathbb{R}. Experience has shown the construction S \otimes T to be useful (which, at the end of the day, is the main justification for introducing new definitions and symbols), although for the moment, the choice of symbol \otimes has not been justified save that it should be different from more common symbols such as S+T, ST and S \times T, which mean different things, as observed immediately above. Furthermore, the definition of \otimes as stated above readily extends to general tensors…
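
For what it is worth, here is a short numerical sketch (my own, with V = \mathbb{R}^3 and the functionals represented by dot products) confirming that v \mapsto S(v)T(v) is not linear while (v,w) \mapsto S(v)T(w) is bilinear:

```python
import numpy as np

# Sketch (mine): the pointwise product v -> S(v)T(v) fails to be linear, while
# (v, w) -> S(v)T(w) is linear in each slot separately; the latter is S (x) T.
rng = np.random.default_rng(2)
s, t = rng.standard_normal(3), rng.standard_normal(3)
S = lambda v: s @ v     # S(v) = s . v
T = lambda v: t @ v     # T(v) = t . v

v, w = rng.standard_normal(3), rng.standard_normal(3)

# 'ST' is not linear: scaling v by 2 scales S(v)T(v) by 4, not 2
print(np.isclose(S(2 * v) * T(2 * v), 2 * S(v) * T(v)))      # False (generically)

# S (x) T is linear in each argument separately
ST = lambda v, w: S(v) * T(w)
print(np.isclose(ST(2 * v, w), 2 * ST(v, w)))                 # True
print(np.isclose(ST(v + w, w), ST(v, w) + ST(w, w)))          # True
```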

One could perhaps use the above paragraph as the start of an introduction to tensor products. In The Tensor Product, I chose instead to ignore tensors completely because, although there is nothing ‘difficult’ about them, the above indicates that there are a multitude of small issues that need to be explained, and at the end of the day, it is not even clear at the outset if there is any use in studying tensor products of tensor spaces! What does the fancy symbol \otimes buy us that we could not have obtained for free just by working directly with multi-linear maps and arrays of numbers? (There are benefits, but they appear in more advanced areas of mathematics and hence are hard to motivate concretely at an elementary level. Of course, one could say the tensor product reduces the study of multi-linear maps to linear maps, that is, back to more familiar territory, but multi-linear maps are not that difficult to work with directly in the first place.)

The Tensor Product motivated the tensor product by wishing to construct \mathbb{R}[x,y] from \mathbb{R}[x] and \mathbb{R}[y]. It was stated there that this allowed properties of \mathbb{R}[x,y] to be deduced from properties of \mathbb{R}[x]. A solid example of this would be localisation of rings: if we have managed to show that the localisation (\mathbb{R}[x])[1/x] is isomorphic to the ring of Laurent polynomials in one variable, then properties of the tensor product allow us to conclude that the localisation (\mathbb{R}[x,y])[1/xy] is isomorphic to the ring of Laurent polynomials in two variables. Even without such examples though, I am more comfortable taking for granted that it is useful to be able to construct \mathbb{R}[x,y] from \mathbb{R}[x] and \mathbb{R}[y] than it is useful to be able to construct \mathcal{T}^{p+q}_{r+s} from \mathcal{T}^p_r and \mathcal{T}^q_s.