
Coordinate-independent Approach to Differentiation

Using partial derivatives is a coordinate-based approach to differentiating functions and can quickly become messy and tedious if matrices are involved. In many cases the following coordinate-independent approach is simpler. This brief note will use the following three examples to demonstrate the elegance of the coordinate-independent approach.
1. $f: \mathbb{R}^{n \times m} \rightarrow \mathbb{R}, f(X) = \mathrm{tr}\{X^T A X\}.$
2. $f: \mathbb{R}^{n \times n} \rightarrow \mathbb{R}, f(X) = \log\det(X).$
3. $f: \mathbb{R}^{n \times m} \rightarrow S^{m \times m}, f(X) = X^T X - I.$

Here, $S^{m \times m} = \{ Y \in \mathbb{R}^{m \times m} \mid Y^T = Y\}$ is the vector space of $m \times m$ symmetric matrices and $I$ denotes the identity matrix.

First some formalities. A convenient level of generalisation is to work with functions between Banach spaces. Recall that a Banach space is a vector space equipped with a norm and complete with respect to that norm. (Completeness means every Cauchy sequence converges to a point of the space. Since every finite-dimensional normed vector space is automatically complete, this technical condition can be ignored for the three examples above.)

The vector spaces used above – $\mathbb{R}, \mathbb{R}^{n \times m}, \mathbb{R}^{n \times n}$ and $S^{m \times m}$ – can be made into Banach spaces simply by choosing a norm. Any norm will do because on a finite-dimensional vector space, any two norms are equivalent; if a sequence converges with respect to one norm then it converges with respect to any other norm to the same point. In particular, the derivative will be the same. (In infinite-dimensional spaces, this need not be the case; changing the norm can change the derivative.)

Although there is an important distinction between the Fréchet derivative and the Gâteaux derivative, the trick in practice is simply to aim to calculate the Gâteaux derivative, that is, the directional derivative. Then, either showing that the directional derivatives fit together in the right way or appealing to a higher level of reasoning allows the Fréchet derivative to be written down in terms of the Gâteaux derivative. All this will be explained presently. For the moment, let’s compute the directional derivatives of the above functions.

By definition, the directional derivative of $f$ at $X$ in the direction $Z$ is $g'(0) = \lim_{t \rightarrow 0} \frac{g(t)-g(0)}{t}$ where $g(t) = f(X+tZ).$  Although there are standard rules (chain rule, product rule etc) that can be used to expedite the calculation, it is instructive to proceed from first principles.
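The limit defining $g'(0)$ can also be approximated numerically, which gives a quick way to check a formula obtained by hand. The sketch below is only an illustration: the helper name `directional_derivative`, the step size, and the use of a central difference (rather than the one-sided quotient in the definition) are choices made here, and the test function $f(X) = \sum_{ij} X_{ij}^2$ is a hypothetical example that does not appear elsewhere in this note.

```python
import numpy as np

def directional_derivative(f, X, Z, t=1e-6):
    """Central-difference estimate of g'(0), where g(t) = f(X + t Z)."""
    return (f(X + t * Z) - f(X - t * Z)) / (2.0 * t)

# Hypothetical test function: f(X) = sum of squared entries.
# Expanding f(X + tZ) shows that g'(0) = 2 * sum(X * Z).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))
Z = rng.standard_normal((3, 2))

def sum_of_squares(M):
    return np.sum(M * M)

estimate = directional_derivative(sum_of_squares, X, Z)
exact = 2.0 * np.sum(X * Z)
print(estimate, exact)  # the two numbers should agree to several digits
```

The same helper works unchanged when $f$ is matrix-valued, because NumPy applies the subtraction and division entrywise.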

Example 1

$f: \mathbb{R}^{n \times m} \rightarrow \mathbb{R}, f(X) = \mathrm{tr}\{X^T A X\}.$

$g(t) - g(0) = \mathrm{tr}\{(X+tZ)^T A (X+tZ)\} - \mathrm{tr}\{X^T A X\}$ $= t\,\mathrm{tr}\{X^T A Z\} + t\,\mathrm{tr}\{Z^T A X\} + t^2\,\mathrm{tr}\{Z^T A Z\}$ from which it follows that $g'(0) = \mathrm{tr}\{X^T A Z\} + \mathrm{tr}\{Z^T A X\}.$ [If $A$ is symmetric, this simplifies to $2\,\mathrm{tr}\{X^T A Z\}$, because $\mathrm{tr}\{Z^T A X\} = \mathrm{tr}\{X^T A^T Z\}.$]
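As a sanity check, the formula for Example 1 can be compared against a finite-difference quotient. This is a minimal sketch, assuming NumPy and randomly generated matrices; the sizes, the seed and the step `t` are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((n, n))   # deliberately not symmetric
X = rng.standard_normal((n, m))
Z = rng.standard_normal((n, m))

def f(X):
    return np.trace(X.T @ A @ X)

# Directional derivative derived above.
formula = np.trace(X.T @ A @ Z) + np.trace(Z.T @ A @ X)

# Central-difference approximation of g'(0) with g(t) = f(X + tZ).
t = 1e-6
finite_diff = (f(X + t * Z) - f(X - t * Z)) / (2.0 * t)
print(formula, finite_diff)

# For a symmetric matrix the two trace terms coincide.
A_sym = A + A.T
sym_long = np.trace(X.T @ A_sym @ Z) + np.trace(Z.T @ A_sym @ X)
sym_short = 2.0 * np.trace(X.T @ A_sym @ Z)
print(sym_long, sym_short)
```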

Example 2

$f: \mathbb{R}^{n \times n} \rightarrow \mathbb{R}, f(X) = \log\det(X).$

Method 1: Assume $\det X > 0$, so that $f$ is defined at $X$ and, for all sufficiently small $t$, at $X + tZ$. Then $g(t) - g(0) = \log \det(X+tZ) - \log \det X$ $= \log \det( X (I + tX^{-1}Z) ) - \log \det X$ $= \log \det(I + t X^{-1} Z).$ Consider $\det(I + t Y).$ Either directly from the definition of determinant (Leibniz formula), or by writing the determinant recursively using the Laplace formula, it is straightforward to see that $\det(I + t Y) = 1 + t\,\mathrm{tr}\{Y\} + O(t^2).$ Furthermore, the Taylor series for log is $\log(1+x) = x + O(x^2).$ Therefore, $\log \det(I + t X^{-1} Z) = t\,\mathrm{tr}\{X^{-1} Z\} + O(t^2)$ from which it follows that $g'(0) = \mathrm{tr}\{X^{-1} Z\}.$
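The expansion $\det(I + tY) = 1 + t\,\mathrm{tr}\{Y\} + O(t^2)$ used in Method 1 can be observed numerically: if the remainder really is $O(t^2)$, then dividing it by $t^2$ should give a quantity that stays bounded as $t$ shrinks. The following sketch assumes NumPy; the particular $3 \times 3$ matrix `Y` is an arbitrary example chosen here.

```python
import numpy as np

# An arbitrary fixed matrix, chosen only for illustration.
Y = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
I = np.eye(3)

def remainder(t):
    """det(I + tY) minus its first-order approximation 1 + t tr(Y)."""
    return np.linalg.det(I + t * Y) - (1.0 + t * np.trace(Y))

steps = (1e-1, 1e-2, 1e-3)
scaled = [remainder(t) / t**2 for t in steps]
for t, s in zip(steps, scaled):
    # The scaled remainder settles towards a constant as t decreases.
    print(t, s)
```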

Method 2: Since the determinant is the product of eigenvalues and the trace is the sum of eigenvalues, it is not surprising that $\log\det A = \mathrm{tr} \log A.$ Therefore $f(X) = \mathrm{tr} \log X.$ Because $X$ and $Z$ need not commute, $\log(X+tZ)$ does not in general split as $\log(X) + \log(I+tX^{-1}Z)$, but the traces do: from $\det(X+tZ) = \det(X) \det(I+tX^{-1}Z)$ it follows that $\mathrm{tr} \log(X+tZ) = \mathrm{tr} \log X + \mathrm{tr} \log(I+tX^{-1}Z).$ Near the identity, the matrix log is given by its Taylor series, therefore $\log(I+tX^{-1}Z) = t X^{-1}Z + O(t^2).$ Thus $\mathrm{tr} \log(X+tZ) - \mathrm{tr} \log X$ $= t\,\mathrm{tr}\{ X^{-1} Z \} + O(t^2)$ from which it follows that the directional derivative is $\mathrm{tr}\{X^{-1} Z\}.$
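Both methods give $\mathrm{tr}\{X^{-1} Z\}$, and this too can be checked against a difference quotient. The sketch below assumes NumPy and builds a symmetric positive definite $X$ purely so that $\det X > 0$ is guaranteed; the direction $Z$ is an arbitrary matrix. `np.linalg.slogdet` is used instead of `np.log(np.linalg.det(...))` only as a numerically safer way of evaluating $\log\det$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)        # positive definite, hence det X > 0
Z = rng.standard_normal((n, n))    # arbitrary direction

def f(X):
    sign, logabsdet = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("log det is only defined here for det X > 0")
    return logabsdet

# tr{X^{-1} Z}, computed by solving X W = Z rather than inverting X.
formula = np.trace(np.linalg.solve(X, Z))

t = 1e-6
finite_diff = (f(X + t * Z) - f(X - t * Z)) / (2.0 * t)
print(formula, finite_diff)
```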

Example 3

$f: \mathbb{R}^{n \times m} \rightarrow S^{m \times m}, f(X) = X^T X - I.$

$g(t) - g(0) = (X+tZ)^T (X+tZ) - X^T X$ $= t (Z^T X + X^T Z) + t^2 (Z^T Z)$ from which it follows that $g'(0) = Z^T X + X^T Z.$
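In Example 3 the directional derivative is itself a matrix, and it should lie in $S^{m \times m}$ because $f$ takes values there. A short numerical check, assuming NumPy and random test matrices of arbitrary size, confirms both the formula and the symmetry.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 5, 3
X = rng.standard_normal((n, m))
Z = rng.standard_normal((n, m))

def f(X):
    return X.T @ X - np.eye(X.shape[1])

# Directional derivative derived above.
D = Z.T @ X + X.T @ Z

# Entrywise central-difference approximation of g'(0).
t = 1e-6
finite_diff = (f(X + t * Z) - f(X - t * Z)) / (2.0 * t)

max_error = np.max(np.abs(finite_diff - D))
is_symmetric = np.allclose(D, D.T)
print(max_error, is_symmetric)
```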

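In all cases, observe that for a fixed $X$, the derivative is a linear function of the direction $Z$. Indeed, if the Fréchet derivative exists at the point $X$ then the directional derivatives are necessarily a continuous linear function of the direction $Z$, and that linear function is the Fréchet derivative. (Strictly speaking, Fréchet differentiability also requires the error of this linear approximation to be $o(\|Z\|)$ uniformly over directions; this is automatic if, for instance, the Gâteaux derivative exists near $X$, is continuous and linear in $Z$, and depends continuously on the base point. Since all three functions above are analytic, the directional derivatives must be linearly related and therefore we could have known in advance that the Fréchet derivative exists.)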
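One standard way of stating Fréchet differentiability at $X$ is that the remainder $f(X+Z) - f(X) - g'(0)$, with $g'(0)$ the directional derivative in direction $Z$, is small compared with $\|Z\|$ as $\|Z\| \rightarrow 0$, uniformly in the direction. For Example 3 the remainder is exactly $Z^T Z$, so the ratio is at most $\|Z\|$ in the Frobenius norm. The sketch below illustrates this numerically; it assumes NumPy, uses the Frobenius norm (the default of `np.linalg.norm` for matrices), and samples random directions, which is only suggestive and not a proof of uniformity.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 5, 3
X = rng.standard_normal((n, m))

def f(X):
    return X.T @ X - np.eye(m)

def dir_deriv(X, Z):
    # Directional derivative of f at X in direction Z (Example 3).
    return Z.T @ X + X.T @ Z

scales = (1e-1, 1e-2, 1e-3)
ratios = []
for scale in scales:
    worst = 0.0
    for _ in range(100):
        Z = rng.standard_normal((n, m))
        Z *= scale / np.linalg.norm(Z)          # rescale so ||Z|| = scale
        rem = f(X + Z) - f(X) - dir_deriv(X, Z)
        worst = max(worst, np.linalg.norm(rem) / np.linalg.norm(Z))
    ratios.append(worst)

for scale, r in zip(scales, ratios):
    # The worst observed ratio shrinks along with ||Z||.
    print(scale, r)
```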

If $V$ and $W$ are two Banach spaces then $L(V;W)$, the set of continuous linear functions from $V$ to $W$, can itself be made into a Banach space (with the norm being the operator norm). The Fréchet derivative of a function $f: V \rightarrow W$ is a function $Df : V \rightarrow L(V;W).$ This notation indicates that the directional derivatives fit together linearly. The fact that $L(V;W)$ is itself a Banach space means that $Df$ itself can be differentiated in the same framework.

One common notation is to use a dot to denote the application of the linear operator $Df(X)$ to the direction $Z$; for example, the Fréchet derivative of the function $f$ in the first example is $Df(X) \cdot Z = \mathrm{tr}\{X^T A Z\} + \mathrm{tr}\{Z^T A X\}.$
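The type $Df : V \rightarrow L(V;W)$ has a natural counterpart in code: a function that takes the point $X$ and returns another function, the linear map $Z \mapsto Df(X) \cdot Z$. The following sketch does this for the first example and checks linearity in $Z$ numerically. It assumes NumPy; the names `Df` and `apply` and the extra argument `A` are conventions adopted here for illustration only.

```python
import numpy as np

def Df(X, A):
    """Return the linear map Z -> Df(X).Z for f(X) = tr{X^T A X}."""
    def apply(Z):
        return np.trace(X.T @ A @ Z) + np.trace(Z.T @ A @ X)
    return apply

rng = np.random.default_rng(6)
n, m = 4, 3
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, m))
Z1 = rng.standard_normal((n, m))
Z2 = rng.standard_normal((n, m))

L = Df(X, A)          # an element of L(V; W) with V = R^{n x m}, W = R
a, b = 2.0, -3.0
lhs = L(a * Z1 + b * Z2)
rhs = a * L(Z1) + b * L(Z2)
print(lhs, rhs)       # equal up to rounding, as linearity requires
```

Returning a closure keeps the two arguments separate in the same way that the notation $Df(X) \cdot Z$ does: the point $X$ is fixed first, and only then is the direction supplied.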

A subsequent note will look at higher order derivatives and several rules for calculating Fréchet derivatives.