Differentiating Matrix Expressions The Easy Way, and an Elementary yet Genuine Use for the Tensor Product
In many areas of science that require differentiating multivariate functions $f\colon \mathbb{R}^n \to \mathbb{R}$, the derivative is often treated as a vector and the second-order derivative as a matrix. This leads to notation in which the derivative sometimes appears and sometimes its transpose appears. Extending this notation to higher derivatives, or to functions $f\colon \mathbb{R}^n \to \mathbb{R}^m$, becomes even messier.
An alternative is to treat derivatives as (multi-)linear maps. If, at some stage, vectors and matrices are required, i.e., gradients and Hessians, these can be easily read off from the derivatives. But often these are not required. Basically, the difference is working in a particular coordinate system — the gradient and Hessian are only defined with respect to an inner product and that determines the “coordinate system” being used — versus working in a coordinate-free manner.
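As a small worked sketch of reading off a gradient and a Hessian, consider the quadratic form $f(x) = x^T A x$ on $\mathbb{R}^n$. The function, the matrix $A$ and the matrix $Q$ used afterwards are assumptions chosen purely for illustration. The first and second derivatives are a linear and a bilinear map respectively:

```latex
\begin{align*}
f(x) &= x^T A x, \\
Df(x)\cdot h &= h^T A x + x^T A h = \langle (A + A^T)\,x,\; h \rangle, \\
D^2 f(x)\cdot(h,k) &= h^T A k + k^T A h = h^T (A + A^T)\, k.
\end{align*}
```

With the Euclidean inner product $\langle u, v \rangle = u^T v$, the gradient is $(A + A^T)x$ and the Hessian is $A + A^T$. If the inner product is instead $\langle u, v \rangle_Q = u^T Q v$ for some symmetric positive-definite $Q$, the same linear map $Df(x)$ is represented by the gradient $Q^{-1}(A + A^T)x$, because $\langle Q^{-1}(A + A^T)x,\, h \rangle_Q = x^T (A + A^T)^T h$. The derivative itself has not changed; only its representation as a vector has.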
In “Differential Calculus, Tensor Products, and the Importance of Notation”, a quick overview is given, one that also points out several subtleties. (For additional examples, see this earlier post.) It also introduces the tensor product as a way of simplifying the notation further. This is an elementary yet genuine application of the tensor product, and it is currently the best way I know of introducing tensor products to students early on in a meaningful way. (I am not very pleased with my earlier attempt at an introductory article on the tensor product, as I don’t feel it is interesting enough.)