## Introduction to the Legendre Transform

I have not come across an introduction to the Legendre transform which was entirely intuitive and satisfying. The article Making Sense of the Legendre Transform is one such attempt and has a number of attractive features. However, like other attempts, it immediately links the Legendre transform to re-parametrising a function by its derivative, yet this link is not made crystal clear. Below, I present an alternative introduction to the Legendre transform which takes as its starting point the fact that a convex set is uniquely defined by the collection of its supporting hyperplanes.

My purpose is not to be rigorous, but instead, to present just enough facts for the reader to be comfortable with the basic ideas surrounding the Legendre transform and to tune the reader’s receptiveness to other pedagogic material on Legendre transforms.

### The Legendre Transform

Readers are invited to draw their favourite convex function on a piece of paper. Treating the graph of as the boundary of a cup, we can fill the inside of it. The resulting shape (the boundary plus the inside) is called the epigraph of the convex function. (Precisely, the epigraph of is .) Observe that the epigraph of a convex function is a convex set.

A line (or more generally, a hyperplane, if we were working in a higher dimension) is called a supporting hyperplane of a convex set in if it intersects the convex set and the convex set is contained on just one side of it. (This implies that the points of intersection are boundary points.) For example, the horizontal line is a supporting hyperplane for the epigraph of . Going further, given a gradient , we can draw the line on the same graph as , and we observe that if is a large negative number then the line does not intersect (or its epigraph) at all, and as is increased, there comes a time when first touches . This value of makes a supporting hyperplane for the epigraph of . As increases further, there will be points of the epigraph of on either side of the line . For the function , it is seen that if is negative, there are no corresponding supporting hyperplanes. *In general, given a convex function , for each slope there is at most one value of making a supporting hyperplane of the epigraph of . Furthermore, the set of values of for which a supporting hyperplane exists forms a convex set.*

Recall that in convex analysis, it is a very useful fact that a convex set is defined by its supporting hyperplanes. Therefore, if we know all the supporting hyperplanes of (the epigraph of) then we should be able to reconstruct . Shortly, we will endeavour to do this from first principles.

Perhaps the simplest way of “storing” what the supporting hyperplanes of are, is to graph versus . Indeed, for each value of , there is at most one value of for which is a supporting hyperplane, hence plotting on the horizontal axis and on the vertical axis produces the graph of a function. In fact, it can be proved that this graph will always be concave if is convex. To visualise this, pick a point and draw a tangent line to the graph at the point . This is a supporting hyperplane with equal to and equal to the -intercept of the tangent line, namely . Now, as goes from to , it can be seen that monotonically increases and initially increases and then decreases. Furthermore, observe that plotting the points corresponding to the supporting hyperplanes is the same as plotting the points which is the same as plotting the graph of the function .

Plotting versus will produce a convex function if is convex, and this is the convention which has proved expedient. Clearly, there is no material difference between plotting or as a function of .

Formally, we have just seen that to any convex function we can associate a function such that is a supporting hyperplane of (the epigraph of) if and only if (where the negative sign is just a matter of convention). *The function thus defined is called the Legendre transform of . *As mentioned earlier, is not necessarily defined on the whole of but only on a convex subset of . To overcome this minor notational inconvenience, it is customary to set equal to infinity for values of for which it would otherwise be undefined. This is a standard trick in convex analysis for avoiding the need to keep track explicitly of the set on which a convex function is defined.

Can we think of another way of storing the set of supporting hyperplanes of ? Each hyperplane is represented by the pair of coefficients . If we had chosen to fix instead, we would have found that depending on the value of , there could be multiple values of for which is a supporting hyperplane. It seems sensible then to represent the set of all points corresponding to supporting hyperplanes by using the aforementioned function . Therefore, I choose to interpret the Legendre transform as what one would arrive at if charged with the task of writing down in a nice and simple way what the supporting hyperplanes are of the epigraph of a convex function .

As promised earlier, let’s see how a function can be recovered from its Legendre transform . Recall that for each (for which is finite), the line is a supporting hyperplane. In particular, we know that at least one point of the graph of lies on this line, and we know that every single point of the graph of lies on or above this line. By drawing all the supporting hyperplanes, we can see intuitively that they therefore trace out . In fact, readers familiar with envelopes of curves will see that this is the same idea here, and for those not familiar, clicking on the link will bring up the Wikipedia entry with a nice figure showing how the supporting hyperplanes trace out the function.

For simplicity, consider the special case when is differentiable. (A convex function is differentiable at all but at most a countable number of points, and even at such points, left and right derivatives still exist.) Choose an . How can we find an such that the point on the supporting hyperplane belongs to the graph of ? If we perturb , we get another supporting hyperplane . This perturbed hyperplane, as we will call it, intersects with the original hyperplane at the point with coordinate given by which approaches , the derivative of , as . For positive , the perturbed hyperplane will lie above the original hyperplane whenever is larger than the point of intersection. Similarly, if is negative, the perturbed hyperplane will lie above the original hyperplane whenever is smaller than the point of intersection. Therefore, the point of intersection in the limit is precisely the point which also lies on the graph of . (If this is not clear, re-read the third sentence of the preceding paragraph.) Thus, the point lies on the graph of for all for which is defined and differentiable. (If is not differentiable at , we would have a range of valid values, namely, those values lying between the left derivative and the right derivative of at . We would therefore obtain a line segment belonging to the graph of . For the moment though, this level of detail is a distraction.)

For completeness — and because something unexpected will reveal itself — let’s give a formula for the graph of if we are given only , under the simplifying assumption that is differentiable. As hinted at earlier, it can be shown that the supporting hyperplanes are the same as the tangent lines of . The tangent line of at the point is simply where and . The -intercept is thus , and hence the point lies on the graph of . Comparing this with the previous paragraph, we see excitedly that going from to the graph of has exactly the same form as going from to the graph of , and indeed, it can be established rigorously that *if is the Legendre transform of a convex function , then is the Legendre transform of .* The Legendre transform is its own inverse.

We close this section by deriving the standard formula for the Legendre transform. Let be an arbitrary function. (It need not even be convex.) Recall the idea given earlier about computing the Legendre transform, namely, we draw the line for a very small number and gradually increase until the line first intersects the graph of . That is to say, we want the smallest value (or, if the smallest value does not exist, the infimum) of for which intersects the graph of . Therefore, it is equally valid to look for the set of all values of for which intersects the graph of , then choose the infimum of this set. This set is easily found by fixing and looking in turn at each point on the graph of and seeing for what value of the line passes through this point. The line passing through with gradient is simply and its -intercept is thus . The infimum of for which intersects the graph of is therefore . The Legendre transform is defined to be the negative of this, by convention, therefore, we have arrived at the mathematical definition of the Legendre transform: .

### The Legendre Transform in Physics

Often the Legendre transform is introduced by saying that it allows a function to be re-parametrised by its derivative, meaning precisely: Given a slope , first find the value of such that then return the value of at this point. This is written concisely as , where is now considered a function of . This is useful in physics; see for example Making Sense of the Legendre Transform.

Starting from first principles, assume we are given a differentiable and strictly convex function and we are given a slope , and we wish to write down an explicit expression for the function implicitly defined by the requirement that is mapped to where satisfies . With ideas from the previous section fresh in our minds, we know that we can find the point for which by finding the supporting hyperplane with slope for the epigraph of , and seeing where the supporting hyperplane intersects the graph of . If is the Legendre transform of , then we know the hyperplane is . Therefore, finding is equivalent to finding . The (possibly) good news is that the latter expression involves the function and not its derivative, but the bad news is that this expression is still an *implicit* one for .

The most important observation — and one of the reasons for including the derivations in the previous section — is that provided the derivative exists, we have that satisfies . (Recall the discussion about how the graph of can be recovered by thinking about what perturbed hyperplanes tell us.) Therefore, it is not the Legendre transform but its derivative which allows us to write down an explicit expression for the value of for which . Furthermore, note from the previous section that can be written in terms of alone, as .

Although this section opened by considering how to write down an *explicit* expression for , and the Legendre transform was found to be a valuable stepping-stone, the fact of the matter is that is not necessarily well-defined unless the derivative of exists and is injective, whereas the Legendre transform is always well-defined and has useful properties. Therefore, although we may have originally thought we wanted an expression for as an intermediate step in achieving an ultimate objective (e.g., reformulating a physical law in a more convenient way — see Making Sense of the Legendre Transform), it is mathematically advantageous to work with the Legendre transform instead. In the “worst case” we will need to require that the Legendre transform of is differentiable and write as (or ), but it may be possible that an alternative manipulation can be found for achieving the ultimate objective which uses directly and does not require to exist. That is to say, there is nothing to be lost and a chance of something to be gained by working with rather than simply because the latter can be obtained from the former in a simple way.

Summary

- Within convex analysis, a convex set is uniquely defined by its supporting hyperplanes, and some properties of a convex set are better elucidated by examining its supporting hyperplanes.
- A function is convex if and only if its epigraph is a convex set.
- The Legendre transform is what one would come up with if charged with the task of describing the set of all supporting hyperplanes of the epigraph of a convex function.
- The Legendre transform is its own inverse (when applied to a convex function).
- The Legendre transform is related to the concept of the envelope of a curve.
- If is the Legendre transform of then, if exists, the point has the property that .
- The Legendre transform (precisely, its derivative) therefore allows a function to be re-parametrised by its derivative. This is often the motivation given for the usefulness of the Legendre transform.
- In trying to derive from first principles a re-parametrisation for in terms of its derivative, the Legendre transform arises naturally, as an intermediate step.

I found McCallaugh’s expanation in “Tensor Methods in Statistics” very intuitive. Dual relationship between function and it’s legendre transform is clear from the figure on page 156 http://yaroslavvb.com/upload/legendre-transform.png

Thanks for the clear explanation! I think a lot of people, even researchers who use the Legendre transform routinely, don’t understand this geometric perspective.

I liked your explanation so much I created a bunch of pictures illustrating it graphically (also including the multidimensional version of the transform), and put it on my website here:

http://maze5.net/?page_id=733

Thanks, Nick. Your pictures are very informative and visually appealing.

I was writing my own explanation of the Legendre Transform when I came across this page. Finally, someone gives a simple explanation where the sup operator comes from. Thank you. Btw, here is the article I wrote: http://www.onmyphd.com/?p=legendre.fenchel.transform.

Reblogged this on simplicity.

It’s beautiful! Thank you!