Introduction to the Legendre Transform
I have not come across an introduction to the Legendre transform which was entirely intuitive and satisfying. The article Making Sense of the Legendre Transform is one such attempt and has a number of attractive features. However, like other attempts, it immediately links the Legendre transform to re-parametrising a function by its derivative, yet this link is not made crystal clear. Below, I present an alternative introduction to the Legendre transform which takes as its starting point the fact that a convex set is uniquely defined by the collection of its supporting hyperplanes.
My purpose is not to be rigorous, but instead, to present just enough facts for the reader to be comfortable with the basic ideas surrounding the Legendre transform and to tune the reader’s receptiveness to other pedagogic material on Legendre transforms.
The Legendre Transform
Readers are invited to draw their favourite convex function on a piece of paper. Treating the graph of
as the boundary of a cup, we can fill the inside of it. The resulting shape (the boundary plus the inside) is called the epigraph of the convex function. (Precisely, the epigraph of
is
.) Observe that the epigraph of a convex function is a convex set.
A line (or more generally, a hyperplane, if we were working in a higher dimension) is called a supporting hyperplane of a convex set in if it intersects the convex set and the convex set is contained on just one side of it. (This implies that the points of intersection are boundary points.) For example, the horizontal line
is a supporting hyperplane for the epigraph of
. Going further, given a gradient
, we can draw the line
on the same graph as
, and we observe that if
is a large negative number then the line does not intersect
(or its epigraph) at all, and as
is increased, there comes a time when
first touches
. This value of
makes
a supporting hyperplane for the epigraph of
. As
increases further, there will be points of the epigraph of
on either side of the line
. For the function
, it is seen that if
is negative, there are no corresponding supporting hyperplanes. In general, given a convex function
, for each slope
there is at most one value of
making
a supporting hyperplane of the epigraph of
. Furthermore, the set of values of
for which a supporting hyperplane exists forms a convex set.
Recall that in convex analysis, it is a very useful fact that a convex set is defined by its supporting hyperplanes. Therefore, if we know all the supporting hyperplanes of (the epigraph of) then we should be able to reconstruct
. Shortly, we will endeavour to do this from first principles.
Perhaps the simplest way of “storing” what the supporting hyperplanes of are, is to graph
versus
. Indeed, for each value of
, there is at most one value of
for which
is a supporting hyperplane, hence plotting
on the horizontal axis and
on the vertical axis produces the graph of a function. In fact, it can be proved that this graph will always be concave if
is convex. To visualise this, pick a point
and draw a tangent line to the graph
at the point
. This is a supporting hyperplane with
equal to
and
equal to the
-intercept of the tangent line, namely
. Now, as
goes from
to
, it can be seen that
monotonically increases and
initially increases and then decreases. Furthermore, observe that plotting the points
corresponding to the supporting hyperplanes is the same as plotting the points
which is the same as plotting the graph
of the function
.
Plotting versus
will produce a convex function if
is convex, and this is the convention which has proved expedient. Clearly, there is no material difference between plotting
or
as a function of
.
Formally, we have just seen that to any convex function we can associate a function
such that
is a supporting hyperplane of (the epigraph of)
if and only if
(where the negative sign is just a matter of convention). The function
thus defined is called the Legendre transform of
. As mentioned earlier,
is not necessarily defined on the whole of
but only on a convex subset of
. To overcome this minor notational inconvenience, it is customary to set
equal to infinity for values of
for which it would otherwise be undefined. This is a standard trick in convex analysis for avoiding the need to keep track explicitly of the set on which a convex function is defined.
Can we think of another way of storing the set of supporting hyperplanes of ? Each hyperplane
is represented by the pair of coefficients
. If we had chosen to fix
instead, we would have found that depending on the value of
, there could be multiple values of
for which
is a supporting hyperplane. It seems sensible then to represent the set of all points
corresponding to supporting hyperplanes by using the aforementioned function
. Therefore, I choose to interpret the Legendre transform as what one would arrive at if charged with the task of writing down in a nice and simple way what the supporting hyperplanes are of the epigraph of a convex function
.
As promised earlier, let’s see how a function can be recovered from its Legendre transform
. Recall that for each
(for which
is finite), the line
is a supporting hyperplane. In particular, we know that at least one point of the graph of
lies on this line, and we know that every single point of the graph of
lies on or above this line. By drawing all the supporting hyperplanes, we can see intuitively that they therefore trace out
. In fact, readers familiar with envelopes of curves will see that this is the same idea here, and for those not familiar, clicking on the link will bring up the Wikipedia entry with a nice figure showing how the supporting hyperplanes trace out the function.
For simplicity, consider the special case when is differentiable. (A convex function is differentiable at all but at most a countable number of points, and even at such points, left and right derivatives still exist.) Choose an
. How can we find an
such that the point
on the supporting hyperplane
belongs to the graph of
? If we perturb
, we get another supporting hyperplane
. This perturbed hyperplane, as we will call it, intersects with the original hyperplane
at the point with
coordinate given by
which approaches
, the derivative of
, as
. For positive
, the perturbed hyperplane
will lie above the original hyperplane whenever
is larger than the point of intersection. Similarly, if
is negative, the perturbed hyperplane will lie above the original hyperplane whenever
is smaller than the point of intersection. Therefore, the point of intersection in the limit
is precisely the point which also lies on the graph of
. (If this is not clear, re-read the third sentence of the preceding paragraph.) Thus, the point
lies on the graph of
for all
for which
is defined and differentiable. (If
is not differentiable at
, we would have a range of valid
values, namely, those values lying between the left derivative and the right derivative of
at
. We would therefore obtain a line segment belonging to the graph of
. For the moment though, this level of detail is a distraction.)
For completeness — and because something unexpected will reveal itself — let’s give a formula for the graph of if we are given only
, under the simplifying assumption that
is differentiable. As hinted at earlier, it can be shown that the supporting hyperplanes are the same as the tangent lines of
. The tangent line of
at the point
is simply
where
and
. The
-intercept is thus
, and hence the point
lies on the graph of
. Comparing this with the previous paragraph, we see excitedly that going from
to the graph of
has exactly the same form as going from
to the graph of
, and indeed, it can be established rigorously that if
is the Legendre transform of a convex function
, then
is the Legendre transform of
. The Legendre transform is its own inverse.
We close this section by deriving the standard formula for the Legendre transform. Let be an arbitrary function. (It need not even be convex.) Recall the idea given earlier about computing the Legendre transform, namely, we draw the line
for
a very small number and gradually increase
until the line first intersects the graph of
. That is to say, we want the smallest value (or, if the smallest value does not exist, the infimum) of
for which
intersects the graph of
. Therefore, it is equally valid to look for the set of all values of
for which
intersects the graph of
, then choose the infimum of this set. This set is easily found by fixing
and looking in turn at each point
on the graph of
and seeing for what value of
the line
passes through this point. The line passing through
with gradient
is simply
and its
-intercept is thus
. The infimum of
for which
intersects the graph of
is therefore
. The Legendre transform is defined to be the negative of this, by convention, therefore, we have arrived at the mathematical definition of the Legendre transform:
.
The Legendre Transform in Physics
Often the Legendre transform is introduced by saying that it allows a function to be re-parametrised by its derivative, meaning precisely: Given a slope
, first find the value of
such that
then return the value of
at this point. This is written concisely as
, where
is now considered a function of
. This is useful in physics; see for example Making Sense of the Legendre Transform.
Starting from first principles, assume we are given a differentiable and strictly convex function and we are given a slope
, and we wish to write down an explicit expression for the function implicitly defined by the requirement that
is mapped to
where
satisfies
. With ideas from the previous section fresh in our minds, we know that we can find the point
for which
by finding the supporting hyperplane
with slope
for the epigraph of
, and seeing where the supporting hyperplane intersects the graph of
. If
is the Legendre transform of
, then we know the hyperplane is
. Therefore, finding
is equivalent to finding
. The (possibly) good news is that the latter expression involves the function
and not its derivative, but the bad news is that this expression is still an implicit one for
.
The most important observation — and one of the reasons for including the derivations in the previous section — is that provided the derivative exists, we have that satisfies
. (Recall the discussion about how the graph of
can be recovered by thinking about what perturbed hyperplanes tell us.) Therefore, it is not the Legendre transform but its derivative which allows us to write down an explicit expression for the value of
for which
. Furthermore, note from the previous section that
can be written in terms of
alone, as
.
Although this section opened by considering how to write down an explicit expression for , and the Legendre transform was found to be a valuable stepping-stone, the fact of the matter is that
is not necessarily well-defined unless the derivative of
exists and is injective, whereas the Legendre transform is always well-defined and has useful properties. Therefore, although we may have originally thought we wanted an expression for
as an intermediate step in achieving an ultimate objective (e.g., reformulating a physical law in a more convenient way — see Making Sense of the Legendre Transform), it is mathematically advantageous to work with the Legendre transform instead. In the “worst case” we will need to require that the Legendre transform
of
is differentiable and write
as
(or
), but it may be possible that an alternative manipulation can be found for achieving the ultimate objective which uses
directly and does not require
to exist. That is to say, there is nothing to be lost and a chance of something to be gained by working with
rather than
simply because the latter can be obtained from the former in a simple way.
Summary
- Within convex analysis, a convex set is uniquely defined by its supporting hyperplanes, and some properties of a convex set are better elucidated by examining its supporting hyperplanes.
- A function is convex if and only if its epigraph is a convex set.
- The Legendre transform is what one would come up with if charged with the task of describing the set of all supporting hyperplanes of the epigraph of a convex function.
- The Legendre transform is its own inverse (when applied to a convex function).
- The Legendre transform is related to the concept of the envelope of a curve.
- If
is the Legendre transform of
then, if
exists, the point
has the property that
.
- The Legendre transform (precisely, its derivative) therefore allows a function
to be re-parametrised by its derivative. This is often the motivation given for the usefulness of the Legendre transform.
- In trying to derive from first principles a re-parametrisation for
in terms of its derivative, the Legendre transform arises naturally, as an intermediate step.
- The Legendre transform (precisely, its derivative) therefore allows a function
I found McCallaugh’s expanation in “Tensor Methods in Statistics” very intuitive. Dual relationship between function and it’s legendre transform is clear from the figure on page 156 http://yaroslavvb.com/upload/legendre-transform.png
Thanks for the clear explanation! I think a lot of people, even researchers who use the Legendre transform routinely, don’t understand this geometric perspective.
I liked your explanation so much I created a bunch of pictures illustrating it graphically (also including the multidimensional version of the transform), and put it on my website here:
http://maze5.net/?page_id=733
Thanks, Nick. Your pictures are very informative and visually appealing.
I was writing my own explanation of the Legendre Transform when I came across this page. Finally, someone gives a simple explanation where the sup operator comes from. Thank you. Btw, here is the article I wrote: http://www.onmyphd.com/?p=legendre.fenchel.transform.
Reblogged this on simplicity.
It’s beautiful! Thank you!
For some reason I can’t quite explicate, this makes me want to search Mamikon’s work (EG with Apostol) for places where some of this might be echoed….