The Tensor Product

Various discussions on the internet indicate that the concept of the tensor product is not always intuitive to grasp on a first reading. Perhaps the reason is that it is harder to motivate the tensor product on the vector space \mathbb{R}^n than on a polynomial ring. The following endeavours to give an easily understood pre-introduction to the tensor product; that is to say, the aim is to motivate the standard introductions found online and in textbooks. Familiarity is assumed with only linear algebra and basic polynomial manipulations (i.e., adding and multiplying polynomials), hence the first step is to introduce just enough formalism to talk about polynomial rings.

By \mathbb{R}[x] is meant the set of all polynomials in the indeterminate x with real-valued coefficients. With the usual definitions of addition and scalar multiplication, \mathbb{R}[x] becomes a vector space over the field \mathbb{R} of real numbers. For example, x^2 + 2x + 3 and x^3 - 2x are elements of \mathbb{R}[x] and can be added to form x^3 + x^2 + 3. An example of scalar multiplication is that 3 \in \mathbb{R} times x^2 + 2x + 3 \in \mathbb{R}[x] equals 3x^2 + 6x + 9 \in \mathbb{R}[x]. These operations of addition and scalar multiplication together satisfy the axioms of a vector space, thereby making \mathbb{R}[x] into a vector space over \mathbb{R}. (The reason \mathbb{R}[x] is called a polynomial ring rather than a polynomial vector space is that it carries additional structure, namely a multiplication operation compatible with the addition and scalar multiplication; it is thus a special kind of vector space. This extra structure is not important for us though.)
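The vector space operations above can be sketched in a few lines of code. This is only an illustration, not part of the original notes: a polynomial is represented as a hypothetical coefficient list in which index i holds the coefficient of x^i.

```python
# A minimal sketch of R[x] as a vector space: polynomials are coefficient
# lists, where index i holds the coefficient of x^i.

def poly_add(p, q):
    """Add two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))  # pad with zero coefficients
    q = q + [0.0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    """Multiply a polynomial by the scalar c."""
    return [c * a for a in p]

# x^2 + 2x + 3  ->  [3, 2, 1];    x^3 - 2x  ->  [0, -2, 0, 1]
p = [3.0, 2.0, 1.0]
q = [0.0, -2.0, 0.0, 1.0]

print(poly_add(p, q))      # x^3 + x^2 + 3  ->  [3.0, 0.0, 1.0, 1.0]
print(poly_scale(3.0, p))  # 3x^2 + 6x + 9  ->  [9.0, 6.0, 3.0]
```

The two printed results reproduce exactly the worked examples in the paragraph above.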

Polynomials in two indeterminates, say x and y, also form a vector space (and indeed, a ring) with respect to the usual operations of polynomial addition and scalar multiplication.  This vector space is denoted \mathbb{R}[x,y].

The cross-product \times of two vector spaces is relatively easy to motivate and understand, so much so that the reader is assumed to be familiar with the cross-product.  Recall that if U and V are vector spaces then the elements of U \times V are merely the pairs (u,v) where u \in U and v \in V. Recall too that \mathbb{R} \times \mathbb{R} is isomorphic to \mathbb{R}^2. In particular, observe that the cross-product is a means of building a new vector space from other vector spaces.
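As a quick concrete check of the cross-product's structure, here is a hypothetical encoding (not from the original text) in which an element of U \times V is a pair of coordinate lists, with componentwise addition and scalar multiplication:

```python
# Sketch of the cross-product U x V: elements are pairs (u, v), and the
# vector space operations act componentwise on each half of the pair.

def pair_add(a, b):
    """Add two elements of U x V componentwise."""
    (u1, v1), (u2, v2) = a, b
    return ([x + y for x, y in zip(u1, u2)],
            [x + y for x, y in zip(v1, v2)])

def pair_scale(c, a):
    """Multiply an element of U x V by the scalar c."""
    u, v = a
    return ([c * x for x in u], [c * x for x in v])

# R x R is isomorphic to R^2: a pair of 1-dimensional vectors.
a = ([1.0], [2.0])
b = ([3.0], [4.0])
print(pair_add(a, b))      # ([4.0], [6.0])
print(pair_scale(2.0, a))  # ([2.0], [4.0])
```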

The tensor product is just like the cross-product in that it too allows one to build a new vector space from other vector spaces. It might be illuminating to think of the other vector spaces as building blocks, and the new vector space as something more complicated built from these simpler building blocks, although of course this need not always be the case. Regardless, what makes such constructions useful is that properties of the new vector space can be deduced from properties of the building block vector spaces. (The simplest example of such a property is the dimension of a vector space; the dimension of U \times V is the sum of the dimensions of U and V.)

Can \mathbb{R}[x,y] be obtained from \mathbb{R}[x] and \mathbb{R}[y]? Let’s try the cross-product. The vector space \mathbb{R}[x] \times \mathbb{R}[y] consists of pairs (p(x),q(y)) where p(x) represents a polynomial in \mathbb{R}[x] and q(y) a polynomial in \mathbb{R}[y]. This does not appear to work, for how should the polynomial xy^2 + yx^3 + 1 be represented in the form (p(x),q(y))? (If the ring structure were taken into account then it would be possible to multiply two elements of \mathbb{R}[x] \times \mathbb{R}[y] and it would be seen that, in effect, \mathbb{R}[x] \times \mathbb{R}[y] consists of polynomials that can be factored as p(x)q(y), a proper subset of \mathbb{R}[x,y]. For this introduction though, all that is important is that \mathbb{R}[x,y] is not the cross-product of \mathbb{R}[x] and \mathbb{R}[y].)

With the belief that \mathbb{R}[x,y] should be constructible from \mathbb{R}[x] and \mathbb{R}[y], let’s try to figure out what is required to get the job done. Favouring simplicity over sophistication, let’s investigate the problem in terms of basis vectors.  The obvious choices of basis vectors are \{1,x,x^2,\cdots\} for \mathbb{R}[x]; \{1,y,y^2,\cdots\} for \mathbb{R}[y]; and \{1,x,y,x^2,xy,y^2,\cdots\} for \mathbb{R}[x,y]. We are in luck as the pattern is easy to spot: the basis vectors of \mathbb{R}[x,y] are just all pairwise products (in the sense of polynomial multiplication) of the basis vectors of \mathbb{R}[x] with the basis vectors of \mathbb{R}[y]!
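The pattern just spotted can be made concrete: every monomial x^i y^j is the product of the basis monomial x^i of \mathbb{R}[x] with the basis monomial y^j of \mathbb{R}[y]. A small sketch (truncating the infinite basis at a chosen total degree, purely for illustration) enumerates these pairwise products as exponent pairs:

```python
# Sketch: the monomial basis of R[x,y], truncated at total degree d, is
# exactly the set of pairwise products x^i * y^j of basis monomials of
# R[x] and R[y]. A monomial x^i y^j is encoded as the exponent pair (i, j).

def pairwise_basis(d):
    """All monomials x^i y^j with i + j <= d, as (i, j) exponent pairs."""
    return [(i, j) for i in range(d + 1) for j in range(d + 1) if i + j <= d]

# Up to total degree 2 there are six monomials: 1, x, y, x^2, xy, y^2.
print(pairwise_basis(2))
```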

We are therefore motivated to define a construction that takes two vector spaces U and V and creates a new vector space, denoted U \otimes V, where the new vector space is defined in terms of its basis vectors; roughly speaking, the basis vectors of U \otimes V comprise all formal pairwise products of basis vectors in a particular basis of U with basis vectors in a particular basis of V. How to do this precisely may not be readily apparent but the fact that \mathbb{R}[x,y] is a well-defined vector space that, intuitively at least, is constructible from \mathbb{R}[x] and \mathbb{R}[y] gives us hope that such a construction can be made to work. Needless to say, such a construction is possible and is precisely the tensor product.

There are two equivalent ways of defining the tensor product. The first follows immediately from the above description. If \{u_1,\cdots,u_m\} and \{v_1,\cdots,v_n\} are bases for U and V then U \otimes V is defined to be the vector space formed from all formal linear combinations of the basis vectors u_1 \otimes v_1, u_1 \otimes v_2, \cdots, u_1 \otimes v_n, u_2 \otimes v_1, \cdots, u_m \otimes v_n. It is emphasised that u_1 \otimes v_1 is just a symbol, a means of identifying a particular basis vector. (At the risk of belabouring the point, I can form a vector space from the basis vectors Fred and Charlie; typical elements would be 2 Fred + 3 Charlie and 5 Fred - 1 Charlie, and the vector space addition of these two elements would be 7 Fred + 2 Charlie.) Of course, choosing a different set of basis vectors for U or V would result in a different vector space U \otimes V; however, the resulting vector spaces will always be isomorphic to each other. Therefore, U \otimes V should be thought of as representing any vector space that can be obtained using the above method. (This finer point is not dwelled on here but is a common occurrence in mathematics: a construction can yield a new space which is unique only up to isomorphism. Once tensor products have been understood, the reader is invited to think further about this lack of uniqueness, how it is handled and why it is not an issue in practice.)
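The "formal linear combinations of symbols" idea can be sketched directly, with a hypothetical encoding (not from the original text) in which a vector is a dict mapping basis symbols to real coefficients. Note in particular that the dimension of U \otimes V is the product of the dimensions of U and V:

```python
# Sketch: U (x) V as formal linear combinations of pairwise basis symbols.
# A vector is a dict mapping basis symbols to real coefficients.

from itertools import product

def tensor_basis(U_basis, V_basis):
    """All formal pairwise symbols u_i (x) v_j, as (u_i, v_j) pairs."""
    return [(u, v) for u, v in product(U_basis, V_basis)]

def vec_add(a, b):
    """Add two formal linear combinations by summing coefficients."""
    out = dict(a)
    for sym, c in b.items():
        out[sym] = out.get(sym, 0.0) + c
    return out

basis = tensor_basis(["u1", "u2"], ["v1", "v2", "v3"])
print(len(basis))  # 6 = 2 * 3: dim(U (x) V) = dim U * dim V

# The Fred-and-Charlie example from the text:
a = {"Fred": 2.0, "Charlie": 3.0}
b = {"Fred": 5.0, "Charlie": -1.0}
print(vec_add(a, b))  # {'Fred': 7.0, 'Charlie': 2.0}
```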

The more sophisticated way of defining the tensor product is to consider bilinear maps. It is important to understand both approaches. The elementary method above provides a basic level of intuition that is easy to grasp, but it is clumsy to work with and harder to generalise. The more sophisticated method is more convenient to work with and adds an extra layer of insight, but it masks the elementary intuition and can therefore be harder for the neophyte on a first encounter.

The motivated reader may now like to:

  1. Read an introduction to the tensor product on vector spaces that uses basis vectors.
  2. Read an alternative introduction that uses bilinear maps.
  3. Work out why the two methods are equivalent.

A hint is offered as to why there is a relationship between “products” and “bilinear maps”, and why the tensor product can be thought of as “the most general product”. Assume we want to define a product on a vector space. That is, let V be a vector space and suppose we want to define a function f: V \times V \rightarrow V that we think of as multiplication; given u,v \in V, their product is defined to be f(u,v). What do we mean by multiplication? Presumably, we mean that the laws of associativity, distributivity and compatibility with scalar multiplication hold, such as f(u,f(v,w)) = f(f(u,v),w) and f(u+v,w) = f(u,w) + f(v,w). Merely requiring the scalar multiplication and distributive laws to hold is equivalent to requiring that f be bilinear; that is the connection. In general, one can seek to define a multiplication rule between two different vector spaces, that is, a map defined on the cross-product U \times V. As textbooks will hasten to point out, any bilinear function f: U \times V \rightarrow W can be represented by a linear map from U \otimes V to W. Personally, I prefer to think of this in the first instance as a bonus result we get for free from the tensor product rather than as the initial motivation for introducing it; only with the benefit of hindsight are we motivated to define the tensor product using bilinear maps.
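The correspondence between bilinear maps and linear maps on the tensor product can be verified numerically in a small sketch. The matrix M below is an arbitrary illustrative choice, not from the original text; the point is that evaluating a bilinear form f(u, v) directly agrees with applying a linear map to the coordinates of u \otimes v:

```python
# Sketch: a bilinear f: U x V -> R is f(u, v) = sum_ij M[i][j] u[i] v[j],
# which is the same as a linear map applied to the coordinates of u (x) v.

def outer(u, v):
    """Coordinates of u (x) v in the basis u_i (x) v_j, flattened row-major."""
    return [ui * vj for ui in u for vj in v]

def bilinear(M, u, v):
    """Evaluate the bilinear form directly."""
    return sum(M[i][j] * u[i] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def linear_on_tensor(M, t):
    """The same map, viewed as a linear function of the tensor coordinates."""
    flat = [M[i][j] for i in range(len(M)) for j in range(len(M[0]))]
    return sum(m * x for m, x in zip(flat, t))

M = [[1.0, 2.0], [3.0, 4.0]]  # arbitrary bilinear form on R^2 x R^2
u, v = [1.0, 2.0], [5.0, -1.0]

# Both routes give the same answer: the bilinear map factors through u (x) v.
print(bilinear(M, u, v) == linear_on_tensor(M, outer(u, v)))  # True
```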

Tensor products turn out to be very convenient in other areas too. As just one example, in homological algebra they are a convenient way of changing rings. A special case is converting a vector space over the real numbers into a vector space over the complex numbers; an engineer would not think twice about doing this — anywhere a real number is allowed, allow now a complex number, and carry on as normal! — but how can it be formalised? Quite simply, in fact: if V is a vector space over the real field then \mathbb{C} \otimes V is the complexification of V. Although technically \mathbb{C} \otimes V is a vector space over the real field according to our earlier definition of the tensor product, there is a natural way to treat it as a vector space over the complex field. The reader is invited to contemplate this at leisure, recalling that \mathbb{C} is a two-dimensional vector space over the real field, with basis \{1,\sqrt{-1}\}.
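One way to see the natural complex structure (a sketch under the encoding just described, not a claim about any particular library): since \mathbb{C} has real basis \{1,\sqrt{-1}\}, an element of \mathbb{C} \otimes V can be written 1 \otimes x + \sqrt{-1} \otimes y, i.e. as a pair (x, y) of real vectors, and complex scalar multiplication acts by (a + bi)(x, y) = (ax - by, bx + ay):

```python
# Sketch of complexification: an element of C (x) V is a pair (x, y) of
# real vectors, standing for 1 (x) x + sqrt(-1) (x) y. Multiplication by
# the complex scalar a + bi mimics (a + bi)(x + iy) = (ax - by) + i(bx + ay).

def cplx_scale(a, b, x, y):
    """Multiply the element (x, y) of C (x) V by the complex scalar a + bi."""
    new_x = [a * xi - b * yi for xi, yi in zip(x, y)]
    new_y = [b * xi + a * yi for xi, yi in zip(x, y)]
    return new_x, new_y

# Multiplying by i (a=0, b=1) sends (x, y) to (-y, x), just as i(x + iy) = -y + ix.
x, y = [1.0, 2.0], [3.0, 4.0]
print(cplx_scale(0.0, 1.0, x, y))  # ([-3.0, -4.0], [1.0, 2.0])
```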

In conclusion, the tensor product can be motivated by the desire to construct the vector space of polynomials in two indeterminates out of two copies of a vector space of polynomials in only a single indeterminate. Once the construction is achieved, it is found to have a number of interesting properties, including properties relating to bilinear maps. With the benefit of hindsight, it is cleaner to redefine the tensor product in terms of bilinear maps; this broadens its applicability and tidies up the maths, albeit at the expense of possibly making less visible some of the basic intuition about the tensor product.

