Home > Informal Classroom Notes > Introduction to the Legendre Transform

Introduction to the Legendre Transform

I have not come across an introduction to the Legendre transform which was entirely intuitive and satisfying.  The article Making Sense of the Legendre Transform is one such attempt and has a number of attractive features.  However, like other attempts, it immediately links the Legendre transform to re-parametrising a function by its derivative, yet this link is not made crystal clear. Below, I present an alternative introduction to the Legendre transform which takes as its starting point the fact that a convex set is uniquely defined by the collection of its supporting hyperplanes.

My purpose is not to be rigorous, but instead, to present just enough facts for the reader to be comfortable with the basic ideas surrounding the Legendre transform and to tune the reader’s receptiveness to other pedagogic material on Legendre transforms.

The Legendre Transform

Readers are invited to draw their favourite convex function f: \mathbb{R} \rightarrow \mathbb{R} on a piece of paper.  Treating the graph of f as the boundary of a cup, we can fill the inside of it.  The resulting shape (the boundary plus the inside) is called the epigraph of the convex function.  (Precisely, the epigraph of f is \{(x,z) \mid z \geq f(x)\}.)  Observe that the epigraph of a convex function is a convex set.

A line (or more generally, a hyperplane, if we were working in a higher dimension) is called a supporting hyperplane of a convex set in \mathbb{R}^2 if it intersects the convex set and the convex set is contained on just one side of it. (This implies that the points of intersection are boundary points.) For example, the horizontal line y=0 is a supporting hyperplane for the epigraph of y=x^2.  Going further, given a gradient m, we can draw the line y=mx+c on the same graph as y = x^2, and we observe that if c is a large negative number then the line does not intersect y=x^2 (or its epigraph) at all, and as c is increased, there comes a time when y=mx+c first touches y=x^2. This value of c makes y=mx+c a supporting hyperplane for the epigraph of y=x^2. As c increases further, there will be points of the epigraph of y=x^2 on either side of the line y=mx+c. For the function y=e^x, it is seen that if m is negative, there are no corresponding supporting hyperplanes. In general, given a convex function f: \mathbb{R} \rightarrow \mathbb{R}, for each slope m there is at most one value of c making y=mx+c a supporting hyperplane of the epigraph of f. Furthermore, the set of values of m for which a supporting hyperplane exists forms a convex set.

Recall that in convex analysis, it is a very useful fact that a convex set is defined by its supporting hyperplanes. Therefore, if we know all the supporting hyperplanes of (the epigraph of) f then we should be able to reconstruct f. Shortly, we will endeavour to do this from first principles.

Perhaps the simplest way of “storing” what the supporting hyperplanes of f are, is to graph c versus m. Indeed, for each value of m, there is at most one value of c for which y=mx+c is a supporting hyperplane, hence plotting m on the horizontal axis and c on the vertical axis produces the graph of a function. In fact, it can be proved that this graph will always be concave if f is convex. To visualise this, pick a point x and draw a tangent line to the graph y=x^2 at the point (x,x^2). This is a supporting hyperplane with m equal to \frac{df}{dx}=2x and c equal to the y-intercept of the tangent line, namely -x^2. Now, as x goes from -\infty to \infty, it can be seen that m monotonically increases and c initially increases and then decreases. Furthermore, observe that plotting the points (m,c) corresponding to the supporting hyperplanes is the same as plotting the points \{(2x,-x^2) \mid x \in \mathbb{R}\} which is the same as plotting the graph \{(m,c) \mid c = -\frac{m^2}{4}\} of the function c = -\frac{m^2}{4}.

Plotting -c versus m will produce a convex function if f is convex, and this is the convention which has proved expedient. Clearly, there is no material difference between plotting -c or c as a function of m.

Formally, we have just seen that to any convex function f we can associate a function h such that y=mx+c is a supporting hyperplane of (the epigraph of) f if and only if c = -h(m) (where the negative sign is just a matter of convention). The function h thus defined is called the Legendre transform of f. As mentioned earlier, h is not necessarily defined on the whole of \mathbb{R} but only on a convex subset of \mathbb{R}. To overcome this minor notational inconvenience, it is customary to set h(m) equal to infinity for values of m for which it would otherwise be undefined. This is a standard trick in convex analysis for avoiding the need to keep track explicitly of the set on which a convex function is defined.

Can we think of another way of storing the set of supporting hyperplanes of f? Each hyperplane y=mx+c is represented by the pair of coefficients (m,c).  If we had chosen to fix c instead, we would have found that depending on the value of c, there could be multiple values of m for which y=mx+c is a supporting hyperplane. It seems sensible then to represent the set of all points (m,c) corresponding to supporting hyperplanes by using the aforementioned function h. Therefore, I choose to interpret the Legendre transform as what one would arrive at if charged with the task of writing down in a nice and simple way what the supporting hyperplanes are of the epigraph of a convex function f.

As promised earlier, let’s see how a function f can be recovered from its Legendre transform h. Recall that for each m (for which h(m) is finite), the line y=mx-h(m) is a supporting hyperplane.  In particular, we know that at least one point of the graph of f lies on this line, and we know that every single point of the graph of f lies on or above this line.  By drawing all the supporting hyperplanes, we can see intuitively that they therefore trace out f.  In fact, readers familiar with envelopes of curves will see that this is the same idea here, and for those not familiar, clicking on the link will bring up the Wikipedia entry with a nice figure showing how the supporting hyperplanes trace out the function.

For simplicity, consider the special case when h is differentiable.  (A convex function is differentiable at all but at most a countable number of points, and even at such points, left and right derivatives still exist.) Choose an m.  How can we find an x such that the point (x,mx-h(m)) on the supporting hyperplane y = mx-h(m) belongs to the graph of f?  If we perturb m, we get another supporting hyperplane y = (m+\epsilon)x-h(m+\epsilon). This perturbed hyperplane, as we will call it, intersects with the original hyperplane y = mx-h(m) at the point with x coordinate given by x = \frac{1}{\epsilon}\left[h(m+\epsilon)-h(m)\right] which approaches h'(m), the derivative of h, as \epsilon \rightarrow 0. For positive \epsilon, the perturbed hyperplane y = (m+\epsilon)x-h(m+\epsilon) will lie above the original hyperplane whenever x is larger than the point of intersection.  Similarly, if \epsilon is negative, the perturbed hyperplane will lie above the original hyperplane whenever x is smaller than the point of intersection.  Therefore, the point of intersection in the limit \epsilon \rightarrow 0 is precisely the point which also lies on the graph of f. (If this is not clear, re-read the third sentence of the preceding paragraph.) Thus, the point (h'(m),mh'(m)-h(m)) lies on the graph of f for all m for which h is defined and differentiable. (If h is not differentiable at m, we would have a range of valid x values, namely, those values lying between the left derivative and the right derivative of h at m. We would therefore obtain a line segment belonging to the graph of f. For the moment though, this level of detail is a distraction.)

For completeness — and because something unexpected will reveal itself — let’s give a formula for the graph of h if we are given only f, under the simplifying assumption that f is differentiable. As hinted at earlier, it can be shown that the supporting hyperplanes are the same as the tangent lines of f. The tangent line of f at the point (x_0,f(x_0)) is simply y - y_0 = m(x-x_0) where m=f'(x_0) and y_0 = f(x_0).  The y-intercept is thus -mx_0+y_0, and hence the point (m,mx_0-y_0) = (f'(x_0),x_0f'(x_0)-f(x_0)) lies on the graph of h. Comparing this with the previous paragraph, we see excitedly that going from f to the graph of h has exactly the same form as going from h to the graph of f, and indeed, it can be established rigorously that if h is the Legendre transform of a convex function f, then f is the Legendre transform of h. The Legendre transform is its own inverse.

We close this section by deriving the standard formula for the Legendre transform. Let f be an arbitrary function. (It need not even be convex.) Recall the idea given earlier about computing the Legendre transform, namely, we draw the line y=mx+c for c a very small number and gradually increase c until the line first intersects the graph of f.  That is to say, we want the smallest value (or, if the smallest value does not exist, the infimum) of c for which y=mx+c intersects the graph of f. Therefore, it is equally valid to look for the set of all values of c for which y=mx+c intersects the graph of f, then choose the infimum of this set. This set is easily found by fixing m and looking in turn at each point (x_0,f(x_0)) on the graph of f and seeing for what value of c the line y=mx+c passes through this point. The line passing through (x_0,f(x_0)) with gradient m is simply y-f(x_0)=m(x-x_0) and its y-intercept is thus -mx_0+f(x_0). The infimum of c for which y=mx+c intersects the graph of f is therefore \inf_{x_0} -mx_0+f(x_0).  The Legendre transform is defined to be the negative of this, by convention, therefore, we have arrived at the mathematical definition of the Legendre transform: h(m) = \sup_x mx-f(x).

The Legendre Transform in Physics

Often the Legendre transform is introduced by saying that it allows a function f(x) to be re-parametrised by its derivative, meaning precisely: Given a slope m, first find the value of x such that f'(x)=m then return the value of f at this point.  This is written concisely as f(x(m)), where x is now considered a function of m. This is useful in physics; see for example Making Sense of the Legendre Transform.

Starting from first principles, assume we are given a differentiable and strictly convex function f: \mathbb{R} \rightarrow \mathbb{R} and we are given a slope m, and we wish to write down an explicit expression for the function implicitly defined by the requirement that m is mapped to f(x) where x satisfies f'(x)=m. With ideas from the previous section fresh in our minds, we know that we can find the point x for which f'(x)=m by finding the supporting hyperplane y=mx+c with slope m for the epigraph of f, and seeing where the supporting hyperplane intersects the graph of f. If h is the Legendre transform of f, then we know the hyperplane is y = mx-h(m).  Therefore, finding \{x \mid f'(x)=m\} is equivalent to finding \{x \mid mx-h(m)=f(x)\}. The (possibly) good news is that the latter expression involves the function f and not its derivative, but the bad news is that this expression is still an implicit one for x.

The most important observation — and one of the reasons for including the derivations in the previous section — is that provided the derivative exists, we have that x = h'(m) satisfies f'(x)=m. (Recall the discussion about how the graph of f can be recovered by thinking about what perturbed hyperplanes tell us.)  Therefore, it is not the Legendre transform but its derivative which allows us to write down an explicit expression for the value of x for which f'(x)=m. Furthermore, note from the previous section that f(h'(m)) can be written in terms of h alone, as mh'(m)-h(m).

Although this section opened by considering how to write down an explicit expression for f(x(m)), and the Legendre transform was found to be a valuable stepping-stone, the fact of the matter is that f(x(m)) is not necessarily well-defined unless the derivative of f exists and is injective, whereas the Legendre transform is always well-defined and has useful properties. Therefore, although we may have originally thought we wanted an expression for f(x(m)) as an intermediate step in achieving an ultimate objective (e.g., reformulating a physical law in a more convenient way — see Making Sense of the Legendre Transform), it is mathematically advantageous to work with the Legendre transform instead.  In the “worst case” we will need to require that the Legendre transform h of f is differentiable and write f(x(m)) as f(h'(m)) (or mh'(m)-h(m)), but it may be possible that an alternative manipulation can be found for achieving the ultimate objective which uses h directly and does not require h'(m) to exist. That is to say, there is nothing to be lost and a chance of something to be gained by working with h rather than f(x(m)) simply because the latter can be obtained from the former in a simple way.

Summary

  • Within convex analysis, a convex set is uniquely defined by its supporting hyperplanes, and some properties of a convex set are better elucidated by examining its supporting hyperplanes.
  • A function is convex if and only if its epigraph is a convex set.
  • The Legendre transform is what one would come up with if charged with the task of describing the set of all supporting hyperplanes of the epigraph of a convex function.
  • The Legendre transform is its own inverse (when applied to a convex function).
  • The Legendre transform is related to the concept of the envelope of a curve.
  • If h is the Legendre transform of f then, if h'(m) exists, the point x=h'(m) has the property that f'(x)=m.
    • The Legendre transform (precisely, its derivative) therefore allows a function f to be re-parametrised by its derivative. This is often the motivation given for the usefulness of the Legendre transform.
    • In trying to derive from first principles a re-parametrisation for f in terms of its derivative, the Legendre transform arises naturally, as an intermediate step.
  1. November 23, 2010 at 7:49 am

    I found McCallaugh’s expanation in “Tensor Methods in Statistics” very intuitive. Dual relationship between function and it’s legendre transform is clear from the figure on page 156 http://yaroslavvb.com/upload/legendre-transform.png

  2. Nick Alger
    January 11, 2011 at 4:54 pm

    Thanks for the clear explanation! I think a lot of people, even researchers who use the Legendre transform routinely, don’t understand this geometric perspective.

  3. January 26, 2011 at 11:44 am

    I liked your explanation so much I created a bunch of pictures illustrating it graphically (also including the multidimensional version of the transform), and put it on my website here:
    http://maze5.net/?page_id=733

    • January 26, 2011 at 6:25 pm

      Thanks, Nick. Your pictures are very informative and visually appealing.

  4. May 19, 2013 at 9:41 am

    I was writing my own explanation of the Legendre Transform when I came across this page. Finally, someone gives a simple explanation where the sup operator comes from. Thank you. Btw, here is the article I wrote: http://www.onmyphd.com/?p=legendre.fenchel.transform.

  5. March 18, 2014 at 12:38 am

    Reblogged this on simplicity.

  6. Freya
    October 2, 2018 at 5:00 pm

    It’s beautiful! Thank you!

  7. Ross Mohan
    January 28, 2020 at 1:48 pm

    For some reason I can’t quite explicate, this makes me want to search Mamikon’s work (EG with Apostol) for places where some of this might be echoed….

  1. May 16, 2011 at 1:25 pm
  2. May 12, 2017 at 4:04 am
  3. July 1, 2017 at 1:49 am
  4. December 22, 2018 at 11:14 am

Leave a comment