### Archive

Posts Tagged ‘intuition behind tensor product’

## Tensor Products Introduced as Most General Multiplication (and as Lazy Evaluation)

This note is a fresh attempt at introducing tensor products in a simple and intuitive way. The setting is a vector space $V$ on which we wish to define a multiplication. Importantly, we do not wish to limit ourselves to a multiplication producing an element of $V$. For example, we would like to think of $u^T v$ and $u v^T$ as two examples of multiplying elements $u,v$ of $\mathbb{R}^n$. The first multiplication, $u,v \mapsto u^T v$, is a map from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$. The second multiplication, $u,v \mapsto uv^T$, is a map from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}^{n \times n}$. In fact, since $uv^T$ is well-defined even if $u$ and $v$ have different dimensions, the most general setting is the following.

Let $U$ and $V$ be fixed vector spaces agreed upon at the start. Assume someone has defined a multiplication rule $m\colon U \times V \rightarrow W$ but has not told us what it is.  Here, $m$ is a function and $W$ is a vector space, both of which are unknown to us. For example, if $U = \mathbb{R}^n$ and $V = \mathbb{R}^p$, a possible choice might be $m(u,v) = uv^T$ where $W = \mathbb{R}^{n \times p}$.

What does it mean for $m$ to be a rule for multiplication? We wish it to have certain properties characteristic of multiplication. In a field, multiplication is involved in a number of axioms, including the distributive law $x(y+z) = xy+xz$. Since $U, V$ and $W$ are vector spaces, potentially distinct from each other, not all the axioms for multiplication in a field make sense in this more general setting. (For instance, commutativity makes no sense if $U \neq V$.) The outcome is that $m$ is declared to be a rule for multiplication if it is bilinear: $m(x,y+z) = m(x,y)+m(x,z)$, $m(x+y,z) = m(x,z)+m(y,z)$ and $m(ax,y) = m(x,ay) = a\,m(x,y)$.

Not knowing $m$ does not prevent us from working with it; we simply write it as $m$ and are free to manipulate it according to the aforementioned rules. We do the equivalent all the time when we prove general results about a metric; we say “let $d(\cdot,\cdot)$ be a metric” and then make use of only the axioms of a metric, thereby ensuring our derivation is valid regardless of which metric is actually being represented by $d$.

While the tensor product $\otimes$ is more than this, in the first instance, it suffices to treat it merely as an unspecified rule for multiplication. (It is specified, but no harm comes from forgetting about this in the first instance.) Rather than write $m(u,v)$ it is more convenient to write $u \otimes v$, and thinking of $\otimes$ as multiplication reminds us that $x \otimes (y+z) = x \otimes y + x \otimes z$.

The first point is that we can treat $\otimes$ as an unspecified multiplication, simplify expressions by manipulating it in accordance with the rules for multiplication, and at the end of the day, if we are eventually told the rule $m$, we can evaluate its simplified expression. In computer science parlance, this can be thought of as lazy evaluation.

Whereas there are many metrics “incompatible” with each other, the rules for multiplication are sufficiently rigid that there exists a “most general multiplication”, and all other multiplications are special cases of it, as explained presently. The tensor product $\otimes$ represents this most general multiplication possible.

To consider a specific case first, take $U$ and $V$ to be $\mathbb{R}^2$. We already know two possible multiplications are $u^T v$ and $uv^T$. The claim is that the latter represents the most general multiplication possible. This means, among other things, that it must be possible to write $u^T v$ in terms of $uv^T$, and indeed it is: $u^T v = \mathrm{trace}(uv^T)$. Note that trace is a linear operator. The precise claim is the following. No matter what $m\colon \mathbb{R}^2 \times \mathbb{R}^2 \rightarrow W$ is, we can pretend it is $m(u,v) = uv^T$, work with this definition, then at the very end, if we are told what $m$ actually is, we can obtain the true answer by applying a particular linear transform. At the risk of belabouring the point, if we wish to simplify $m(u+v,w+x) - m(v,x) - m(u,w)$ then we can first pretend $m(u,v)$ is $uv^T$ to obtain $(u+v)(w+x)^T - vx^T - uw^T = ux^T + vw^T$. Later when we are told $m(u,v) = 2u^Tv$ we can deduce that the linear map required to convert from $uv^T$ to $2u^Tv$ is $2u^T v = 2\mathrm{trace}(uv^T)$. Applying this linear map to $ux^T + vw^T$ yields $2(x^T u + w^Tv)$ and this is indeed the correct answer because it equals $m(u+v,w+x) - m(v,x) - m(u,w)$.

Readers wishing to think further about the above example are encouraged to consider that the most general multiplication returning a real-valued number is of the form $u,v \mapsto v^T Q u$ for some matrix $Q$. How can $v^T Q u$ be obtained from $uv^T$? What about for multiplication rules that return a vector or a matrix?

The general situation works as follows. Given vector spaces $U$ and $V$, two things can be constructed, both of which use the same symbol $\otimes$ for brevity. We need a space $W$ for our most general multiplication, and we denote this by $W = U \otimes V$. We also need a rule for multiplication and we denote this by $u \otimes v$. By definition, $u \otimes v$ is an element of $U \otimes V$. (Just like not all matrices can be written as $uv^T$, there are elements in $U \otimes V$ that cannot be written as $u \otimes v$. They can, however, be written as linear combinations of such terms.)

At this point, I leave the reader to look up elsewhere the definition of a tensor product and relate it with the above.