Tensor Products Introduced as Most General Multiplication (and as Lazy Evaluation)

This note is a fresh attempt at introducing tensor products in a simple and intuitive way. The setting is a vector space V on which we wish to define a multiplication. Importantly, we do not wish to limit ourselves to a multiplication producing an element of V. For example, we would like to think of u^T v and u v^T as two examples of multiplying elements u,v of \mathbb{R}^n. The first multiplication, u,v \mapsto u^T v, is a map from \mathbb{R}^n \times \mathbb{R}^n to \mathbb{R}. The second multiplication, u,v \mapsto uv^T, is a map from \mathbb{R}^n \times \mathbb{R}^n to \mathbb{R}^{n \times n}. In fact, since uv^T is well-defined even if u and v have different dimensions, the most general setting is the following.
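The two multiplications above are easy to see side by side numerically. The following sketch (using NumPy, with vectors chosen only for illustration) shows that u^T v lands in \mathbb{R}, uv^T lands in \mathbb{R}^{n \times n}, and that uv^T remains well-defined when the dimensions differ:

```python
import numpy as np

# Two vectors in R^3 (values chosen only for illustration).
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

inner = u @ v            # u^T v : R^3 x R^3 -> R, a scalar
outer = np.outer(u, v)   # u v^T : R^3 x R^3 -> R^{3x3}, a matrix

print(inner)             # 32.0
print(outer.shape)       # (3, 3)

# With different dimensions the outer product is still defined:
w = np.array([1.0, -1.0])       # w in R^2
print(np.outer(u, w).shape)     # (3, 2)
```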

Let U and V be fixed vector spaces agreed upon at the start. Assume someone has defined a multiplication rule m\colon U \times V \rightarrow W but has not told us what it is.  Here, m is a function and W is a vector space, both of which are unknown to us. For example, if U = \mathbb{R}^n and V = \mathbb{R}^p, a possible choice might be m(u,v) = uv^T where W = \mathbb{R}^{n \times p}.

What does it mean for m to be a rule for multiplication? We wish it to have certain properties characteristic of multiplication. In a field, multiplication is involved in a number of axioms, including the distributive law x(y+z) = xy+xz. Since U, V and W are vector spaces, potentially distinct from each other, not all the axioms for multiplication in a field make sense in this more general setting. (For instance, commutativity makes no sense if U \neq V.) The outcome is that m is declared to be a rule for multiplication if it is bilinear: m(x,y+z) = m(x,y)+m(x,z), m(x+y,z) = m(x,z)+m(y,z) and m(ax,y) = m(x,ay) = a\,m(x,y).
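The bilinearity conditions can be checked concretely for one candidate rule. The sketch below takes m(u,v) = uv^T and verifies, on random vectors, each of the identities just listed:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = (rng.standard_normal(3) for _ in range(3))
a = 2.5

def m(u, v):
    # One concrete multiplication rule: the outer product u v^T.
    return np.outer(u, v)

# Distributivity in each argument:
assert np.allclose(m(x, y + z), m(x, y) + m(x, z))
assert np.allclose(m(x + y, z), m(x, z) + m(y, z))
# Scalars can be moved between the factors and pulled out front:
assert np.allclose(m(a * x, y), m(x, a * y))
assert np.allclose(m(a * x, y), a * m(x, y))
```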

Not knowing m does not prevent us from working with it; we simply write it as m and are free to manipulate it according to the aforementioned rules. We do the equivalent all the time when we prove general results about a metric; we say “let d(\cdot,\cdot) be a metric” and then make use of only the axioms of a metric, thereby ensuring our derivation is valid regardless of which metric is actually being represented by d.

While the tensor product \otimes is more than this, in the first instance it suffices to treat it merely as an unspecified rule for multiplication. (It is in fact specified, but no harm comes from forgetting this for the time being.) Rather than write m(u,v) it is more convenient to write u \otimes v, and thinking of \otimes as multiplication reminds us that x \otimes (y+z) = x \otimes y + x \otimes z.

The first point is that we can treat \otimes as an unspecified multiplication, simplify expressions by manipulating it in accordance with the rules for multiplication, and at the end of the day, if we are eventually told the rule m, evaluate the simplified expression. In computer science parlance, this can be thought of as lazy evaluation.
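The lazy-evaluation analogy can be made literal with a small sketch. The class and function names below (`LazyProduct`, `tensor`) are my own illustrative inventions, not standard API: a formal linear combination of pairs is stored symbolically, and only when a concrete bilinear rule m is finally supplied is anything computed.

```python
import numpy as np

class LazyProduct:
    """A formal linear combination of pairs (u, v), standing for
    sum_i c_i * (u_i (x) v_i), with evaluation deferred."""

    def __init__(self, terms):
        self.terms = list(terms)   # list of (coefficient, u, v)

    def __add__(self, other):
        return LazyProduct(self.terms + other.terms)

    def evaluate(self, m):
        # Only now is a concrete bilinear rule m supplied.
        return sum(c * m(u, v) for c, u, v in self.terms)

def tensor(u, v):
    return LazyProduct([(1.0, u, v)])

u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
expr = tensor(u, v) + tensor(v, u)   # u (x) v + v (x) u, still unevaluated

# Later we learn m(u, v) = u^T v ...
print(expr.evaluate(lambda a, b: a @ b))           # 0.0
# ... or instead m(u, v) = u v^T:
print(expr.evaluate(lambda a, b: np.outer(a, b)))  # [[0. 1.] [1. 0.]]
```

The same unevaluated expression yields the right answer for whichever multiplication rule is revealed at the end.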

Whereas there are many metrics “incompatible” with each other, the rules for multiplication are sufficiently rigid that there exists a “most general multiplication”, and all other multiplications are special cases of it, as explained presently. The tensor product \otimes represents this most general multiplication possible.

To consider a specific case first, take U and V to be \mathbb{R}^2. We already know two possible multiplications are u^T v and uv^T. The claim is that the latter represents the most general multiplication possible. This means, among other things, that it must be possible to write u^T v in terms of uv^T, and indeed it is: u^T v = \mathrm{trace}(uv^T). Note that trace is a linear operator. The precise claim is the following. No matter what m\colon \mathbb{R}^2 \times \mathbb{R}^2 \rightarrow W is, we can pretend it is m(u,v) = uv^T, work with this definition, then at the very end, if we are told what m actually is, we can obtain the true answer by applying a particular linear transform. At the risk of belabouring the point, if we wish to simplify m(u+v,w+x) - m(v,x) - m(u,w) then we can first pretend m(u,v) is uv^T to obtain (u+v)(w+x)^T - vx^T - uw^T = ux^T + vw^T. Later, when we are told m(u,v) = 2u^Tv, we can deduce that the linear map required to convert uv^T into m(u,v) is A \mapsto 2\,\mathrm{trace}(A), since 2u^T v = 2\,\mathrm{trace}(uv^T). Applying this linear map to ux^T + vw^T yields 2(x^T u + w^T v), and this is indeed the correct answer because it equals m(u+v,w+x) - m(v,x) - m(u,w).
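The worked example above can be verified numerically. The sketch below pretends m is the outer product, simplifies, then applies the linear map A \mapsto 2\,\mathrm{trace}(A) and compares against the true rule m(u,v) = 2u^T v:

```python
import numpy as np

rng = np.random.default_rng(1)
u, v, w, x = (rng.standard_normal(2) for _ in range(4))

# The true (later-revealed) rule: m(u, v) = 2 u^T v.
def m(a, b):
    return 2 * (a @ b)

# Pretend m is the outer product and simplify:
#   (u+v)(w+x)^T - v x^T - u w^T  =  u x^T + v w^T
pretend = np.outer(u, x) + np.outer(v, w)
assert np.allclose(pretend,
                   np.outer(u + v, w + x) - np.outer(v, x) - np.outer(u, w))

# Recover the true answer by applying the linear map A -> 2 trace(A):
recovered = 2 * np.trace(pretend)
true_value = m(u + v, w + x) - m(v, x) - m(u, w)
assert np.isclose(recovered, true_value)
```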

Readers wishing to think further about the above example are encouraged to consider that the most general multiplication returning a real number is of the form u,v \mapsto v^T Q u for some matrix Q. How can v^T Q u be obtained from uv^T? What about multiplication rules that return a vector or a matrix?
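For readers who want to check their answer to the first question, the following sketch (which partially gives the answer away) verifies one identity expressing v^T Q u as a linear function of the outer product uv^T:

```python
import numpy as np

rng = np.random.default_rng(2)
u, v = rng.standard_normal(3), rng.standard_normal(3)
Q = rng.standard_normal((3, 3))

# v^T Q u depends linearly on the outer product u v^T:
#   v^T Q u = trace(Q u v^T), and A -> trace(Q A) is linear in A.
assert np.isclose(v @ Q @ u, np.trace(Q @ np.outer(u, v)))
```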

The general situation works as follows. Given vector spaces U and V, two things can be constructed, both of which use the same symbol \otimes for brevity. We need a space W for our most general multiplication, and we denote this by W = U \otimes V. We also need a rule for multiplication and we denote this by u \otimes v. By definition, u \otimes v is an element of U \otimes V. (Just like not all matrices can be written as uv^T, there are elements in U \otimes V that cannot be written as u \otimes v. They can, however, be written as linear combinations of such terms.)
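The parenthetical remark mirrors a familiar fact about matrices, which the following sketch illustrates by identifying \mathbb{R}^2 \otimes \mathbb{R}^2 with 2 \times 2 matrices and u \otimes v with uv^T: the identity matrix is not a pure product uv^T, yet it is a linear combination of such products.

```python
import numpy as np

# Identify R^2 (x) R^2 with 2x2 matrices, and u (x) v with u v^T.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# The identity matrix is not of the form u v^T, since any u v^T
# has rank at most 1, while the identity has rank 2 ...
I = np.eye(2)
assert np.linalg.matrix_rank(I) == 2

# ... but it is a linear combination of such rank-one products:
assert np.allclose(I, np.outer(e1, e1) + np.outer(e2, e2))
```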

At this point, I leave the reader to look up elsewhere the definition of a tensor product and relate it with the above.

  1. Royi Avital
    July 27, 2014 at 4:50 pm

    Nice point of view.
    Though it would be great if in the next post you explicitly show it on tensors.

    • August 6, 2014 at 12:37 am

      Good suggestion. I will try to find time to do this in the next few weeks.

  2. Kan Leray
    June 25, 2019 at 1:08 pm

    Nice down-to-earth explanation of universal properties!
