Coordinate-independent Approach to Differentiation
Here, is the vector space of symmetric matrices and denotes the identity matrix.
First, some formalities. A convenient level of generalisation is to work with functions between Banach spaces. Recall that a Banach space is a normed vector space that is complete with respect to its norm. (Completeness means that every Cauchy sequence converges to a point in the space. Since every finite-dimensional normed vector space is automatically complete, this technical condition can be ignored for the three examples above.)
The vector spaces used above can be made into Banach spaces simply by choosing a norm. Any norm will do because on a finite-dimensional vector space any two norms are equivalent: if a sequence converges with respect to one norm, then it converges, to the same point, with respect to any other norm. In particular, the derivative does not depend on which norm is chosen. (In infinite-dimensional spaces this need not be the case; changing the norm can change the derivative.)
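Norm equivalence can be seen concretely for matrices. The following is a minimal sketch, assuming two illustrative norms chosen here (they are not from the text): the Frobenius norm and the largest-absolute-entry norm. For any n-by-n matrix these satisfy max ≤ Frobenius ≤ n·max, so a sequence tending to zero in one norm tends to zero in the other.

```python
import math

def frobenius_norm(A):
    """Frobenius norm: square root of the sum of the squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def max_norm(A):
    """Max-entry norm: the largest absolute value among the entries."""
    return max(abs(a) for row in A for a in row)

# For any n-by-n matrix A:
#   max_norm(A) <= frobenius_norm(A) <= n * max_norm(A),
# so convergence in one norm implies convergence in the other.
A = [[3.0, -4.0], [1.0, 2.0]]
n = len(A)
for k in (1, 10, 100, 1000):
    Ak = [[a / k for a in row] for row in A]  # the sequence A/k tends to 0
    assert max_norm(Ak) <= frobenius_norm(Ak) <= n * max_norm(Ak)
    print(k, max_norm(Ak), frobenius_norm(Ak))
```

Both columns shrink together as `k` grows, which is the finite-dimensional fact the argument relies on.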
Although there is an important distinction between the Fréchet derivative and the Gâteaux derivative, the trick in practice is simply to aim to calculate the Gâteaux derivative, that is, the directional derivative. Then, either by showing that the directional derivatives fit together in the right way or by appealing to a higher-level argument, the Fréchet derivative can be written down in terms of the Gâteaux derivative. All this will be explained presently. For the moment, let’s compute the directional derivatives of the above functions.
By definition, the directional derivative of $f$ at $X$ in the direction $Z$ is $\phi'(0) = \lim_{t \to 0} \frac{\phi(t) - \phi(0)}{t}$, where $\phi(t) = f(X + tZ)$. Although there are standard rules (chain rule, product rule, etc.) that can be used to expedite the calculation, it is instructive to proceed from first principles.
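The first-principles definition can also be approximated numerically. This is a minimal sketch, assuming a hypothetical test function $f(X) = X^2$ (matrix square, chosen for illustration and not necessarily one of the three examples), with matrices stored as lists of rows. Expanding $(X + tZ)^2 = X^2 + t(XZ + ZX) + t^2 Z^2$ shows that the directional derivative is $XZ + ZX$. A central difference $(f(X + tZ) - f(X - tZ)) / 2t$ recovers this up to rounding.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def axpy(X, Z, t):
    """Return X + t*Z, entrywise."""
    return [[x + t * z for x, z in zip(rx, rz)] for rx, rz in zip(X, Z)]

def directional_derivative(f, X, Z, t=1e-6):
    """Central-difference estimate of d/dt f(X + tZ) at t = 0."""
    fp = f(axpy(X, Z, t))
    fm = f(axpy(X, Z, -t))
    return [[(p - m) / (2 * t) for p, m in zip(rp, rm)]
            for rp, rm in zip(fp, fm)]

def square(X):
    """Hypothetical test function f(X) = X @ X (illustration only)."""
    return matmul(X, X)

X = [[1.0, 2.0], [3.0, 4.0]]
Z = [[0.0, 1.0], [-1.0, 2.0]]

# From (X + tZ)^2 = X^2 + t(XZ + ZX) + t^2 Z^2, the exact answer is XZ + ZX.
exact = axpy(matmul(X, Z), matmul(Z, X), 1.0)
approx = directional_derivative(square, X, Z)
print(exact)   # [[1.0, 9.0], [1.0, 17.0]]
print(approx)  # agrees with exact up to rounding error
```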
from which it follows that [If were symmetric, this simplifies to .]
Method 1: Consider Either directly from the definition of determinant (Leibniz formula), or by writing the determinant recursively using the Laplace formula, it is straightforward to see that Furthermore, the Taylor series for log is Therefore, from which it follows that
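The two ingredients of Method 1 can be checked numerically. This sketch assumes the function is $X \mapsto \log\det X$, and restricts to 2-by-2 matrices with positive determinant to keep the helpers short; that choice is made here for illustration. In that case the standard answer for the directional derivative is $\operatorname{tr}(X^{-1}Z)$. The helper names (`det2`, `inv2`, `logdet2`) are hypothetical.

```python
import math

def det2(A):
    """Determinant of a 2-by-2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    """Inverse of an invertible 2-by-2 matrix."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def axpy(X, Z, t):
    """Return X + t*Z, entrywise."""
    return [[x + t * z for x, z in zip(rx, rz)] for rx, rz in zip(X, Z)]

def logdet2(X):
    """log det X for a 2-by-2 matrix with positive determinant."""
    return math.log(det2(X))

# Ingredient 1: det(I + tA) = 1 + t tr(A) + O(t^2).
# For 2-by-2 matrices the O(t^2) term is exactly t^2 det(A).
I = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0], [3.0, 4.0]]
for t in (1e-1, 1e-2, 1e-3):
    remainder = det2(axpy(I, A, t)) - 1.0 - t * trace(A)
    print(t, remainder / t ** 2)  # close to det(A) = -2

# Central-difference estimate of d/dt log det(X + tZ) at t = 0,
# compared with tr(X^{-1} Z).
X = [[2.0, 1.0], [1.0, 3.0]]
Z = [[1.0, -1.0], [-1.0, 2.0]]
h = 1e-6
fd = (logdet2(axpy(X, Z, h)) - logdet2(axpy(X, Z, -h))) / (2 * h)
exact = trace(matmul(inv2(X), Z))
print(fd, exact)  # both approximately 1.8
```

Here $\det(X + tZ) = 5 + 9t + t^2$, so the slope of its logarithm at $t = 0$ is $9/5$.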
Method 2: Since the determinant is the product of eigenvalues and the trace is the sum of eigenvalues, it is not surprising that Therefore By definition, matrix log is defined in terms of its Taylor series, therefore, Thus from which it follows that the directional derivative is
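The identity behind Method 2, that the trace of the matrix logarithm equals the logarithm of the determinant, can be checked with the Taylor-series definition of the matrix log. The following is a minimal sketch under stated assumptions. The function `log_I_plus` is a hypothetical helper that computes $\log(I + A)$ from the truncated series $A - A^2/2 + A^3/3 - \cdots$. It is only valid when the series converges, for example when every eigenvalue of $A$ has modulus less than 1. Matrices are restricted to 2-by-2.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

def det2(A):
    """Determinant of a 2-by-2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def log_I_plus(A, terms=60):
    """log(I + A) from the truncated Taylor series A - A^2/2 + A^3/3 - ...
    Only meaningful when the series converges (small A)."""
    n = len(A)
    result = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in A]  # power holds A^k at step k
    for k in range(1, terms + 1):
        coeff = (-1) ** (k + 1) / k
        result = [[r + coeff * p for r, p in zip(rr, rp)]
                  for rr, rp in zip(result, power)]
        power = matmul(power, A)
    return result

# Identity behind Method 2: tr log(I + A) = log det(I + A).
A = [[0.1, 0.2], [0.05, -0.1]]
I_plus_A = [[(1.0 if i == j else 0.0) + A[i][j] for j in range(2)]
            for i in range(2)]
lhs = trace(log_I_plus(A))
rhs = math.log(det2(I_plus_A))
print(lhs, rhs)  # equal up to rounding

# Consequence: d/dt tr log(I + tB) at t = 0 equals tr(B).
B = [[1.0, 2.0], [3.0, 4.0]]
t = 1e-4
plus = trace(log_I_plus([[t * b for b in row] for row in B]))
minus = trace(log_I_plus([[-t * b for b in row] for row in B]))
slope = (plus - minus) / (2 * t)
print(slope, trace(B))  # slope is close to tr(B) = 5
```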
from which it follows that
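In all cases, observe that for a fixed point $X$, the derivative is a linear function of the direction $Z$. This is no accident: if the Fréchet derivative exists at $X$, it is exactly the map sending each direction $Z$ to the directional derivative in that direction, and this map must be continuous and linear. The converse needs slightly more than linearity. It does hold if, for example, the directional derivatives are continuous and linear in $Z$ and the resulting linear map depends continuously on $X$. (Since all three functions above are analytic, they are Fréchet differentiable, so we could have known in advance that the directional derivatives would depend linearly on $Z$.)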
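Linearity in the direction is a genuine condition, not an automatic one. This sketch uses the standard textbook counterexample $g(x, y) = x^3/(x^2 + y^2)$ with $g(0, 0) = 0$, chosen here for illustration. Every directional derivative of $g$ exists at the origin, but they do not depend linearly on the direction. For contrast, the smooth function $q(x, y) = x^2 + 3xy$ is used; both function names are hypothetical.

```python
def directional_derivative(f, x, d, t=1e-6):
    """Central-difference estimate of d/dt f(x + t d) at t = 0; x, d are 2-tuples."""
    xp = (x[0] + t * d[0], x[1] + t * d[1])
    xm = (x[0] - t * d[0], x[1] - t * d[1])
    return (f(xp) - f(xm)) / (2 * t)

def smooth(p):
    """q(x, y) = x^2 + 3xy: differentiable, so its directional derivative is linear in d."""
    x, y = p
    return x * x + 3 * x * y

def kinked(p):
    """g(x, y) = x^3 / (x^2 + y^2), g(0, 0) = 0: every directional
    derivative exists at the origin, but they are not linear in d."""
    x, y = p
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 3 / (x ** 2 + y ** 2)

# Linearity would force D[(1,0)] + D[(0,1)] == D[(1,1)].
for f, point in ((smooth, (1.0, 2.0)), (kinked, (0.0, 0.0))):
    d1 = directional_derivative(f, point, (1.0, 0.0))
    d2 = directional_derivative(f, point, (0.0, 1.0))
    d12 = directional_derivative(f, point, (1.0, 1.0))
    print(f.__name__, d1 + d2, d12)
```

For `smooth` both numbers are about 11. For `kinked` they are 1 and 0.5, since the directional derivative at the origin is $h^3/(h^2 + k^2)$ in the direction $(h, k)$. So `kinked` has a Gâteaux derivative in every direction but no Fréchet derivative at the origin.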
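If $V$ and $W$ are two Banach spaces, then $L(V;W)$, the set of continuous linear functions from $V$ to $W$, can itself be made into a Banach space (with the norm being the operator norm). The Fréchet derivative of a function $f\colon V \to W$ is a function $Df\colon V \to L(V;W)$. This notation indicates that the directional derivatives fit together linearly: for each point $X \in V$, $Df(X)$ is a continuous linear map sending a direction $Z$ to the directional derivative of $f$ at $X$ in the direction $Z$. The fact that $L(V;W)$ is itself a Banach space means that $Df$ can itself be differentiated in the same framework.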
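To see the Fréchet condition in action, one can check that $\|f(X + Z) - f(X) - Df(X)[Z]\| / \|Z\| \to 0$ as $\|Z\| \to 0$. This is a minimal sketch, assuming a hypothetical example $f(X) = X^2$ (illustration only) whose derivative at $X$ is the linear map $Z \mapsto XZ + ZX$. It uses the Frobenius norm; any norm would do in finite dimensions. `Df(X)` is returned as a Python function, mirroring the fact that $Df(X)$ is itself an element of $L(V;W)$.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def axpy(X, Z, t):
    """Return X + t*Z, entrywise."""
    return [[x + t * z for x, z in zip(rx, rz)] for rx, rz in zip(X, Z)]

def frobenius_norm(A):
    """Square root of the sum of the squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def f(X):
    """Hypothetical example f(X) = X @ X (illustration only)."""
    return matmul(X, X)

def Df(X):
    """Fréchet derivative of f at X, returned as the linear map Z -> XZ + ZX."""
    return lambda Z: axpy(matmul(X, Z), matmul(Z, X), 1.0)

X = [[1.0, 2.0], [3.0, 4.0]]
Z0 = [[0.0, 1.0], [-1.0, 2.0]]
L = Df(X)  # a single linear map, applied to many directions below

scales = (1e-1, 1e-2, 1e-3)
ratios = []
for s in scales:
    Z = [[s * z for z in row] for row in Z0]
    # Remainder f(X+Z) - f(X) - Df(X)[Z]; for this f it equals Z^2.
    remainder = axpy(axpy(f(axpy(X, Z, 1.0)), f(X), -1.0), L(Z), -1.0)
    ratios.append(frobenius_norm(remainder) / frobenius_norm(Z))
    print(s, ratios[-1])  # shrinks in proportion to s
```

Because the remainder is exactly $Z^2$, each ratio equals $s\sqrt{3}$ for this choice of $Z_0$, so it tends to zero linearly with the size of the perturbation.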
One common notation is to use a dot to denote the application of the linear operator $Df(X)$ to the direction $Z$; for example, the Fréchet derivative of the function in the first example is
A subsequent note will look at higher order derivatives and several rules for calculating Fréchet derivatives.