An Introduction to Lagrange Multipliers

by Steuard Jensen

Lagrange multipliers are a very useful technique in multivariable calculus, but all too often they are poorly taught and poorly understood. With luck, this overview will help to make the concept and its applications a bit clearer.

Be warned: this page may not be what you're looking for! If you're looking for detailed proofs, I recommend looking in your favorite textbook on multivariable calculus: my focus here is on concepts, not mechanics. (Comes of being a physicist rather than a mathematician, I guess.) If you want to know about Lagrange multipliers in the calculus of variations, as often used in Lagrangian mechanics in physics, this page only discusses them very briefly.



When are Lagrange multipliers useful?

One of the most common problems in calculus is that of finding maxima or minima (in general, "extrema") of a function, but it is often difficult to find a closed form for the function being extremized. Such difficulties often arise when one wishes to maximize or minimize a function subject to fixed outside conditions or constraints. The method of Lagrange multipliers is a powerful tool for solving this class of problems without the need to explicitly solve the conditions and use them to eliminate extra variables.

Put more simply, it's usually not enough to ask, "How do I minimize the aluminum needed to make this can?" (The answer to that is clearly "Make a really, really small can!") You need to ask, "How do I minimize the aluminum while making sure the can will hold 10 ounces of soup?" Or similarly, "How do I maximize my factory's profit given that I only have $15,000 to invest?" Or to take a more sophisticated example, "How quickly will the roller coaster reach the ground assuming it stays on the track?" In general, Lagrange multipliers are useful when some of the variables in the simplest description of a problem are made redundant by the constraints.

A classic example: the "milkmaid problem"

To give a specific, intuitive illustration of this kind of problem, we will consider a classic example which I believe is known as the "Milkmaid problem". It can be phrased as follows:

[Figure: a milkmaid and a cow near a river.]

It's milking time at the farm, and the milkmaid has been sent to the field to get the day's milk. She's in a hurry to get back for a date with a handsome young goatherd, so she wants to finish her job as quickly as possible. However, before she can gather the milk, she has to rinse out her bucket in the nearby river.

Just when she reaches point M, our heroine spots the cow, way down at point C. Because she is in a hurry, she wants to take the shortest possible path from where she is to the river and then to the cow. If the near bank of the river is a curve satisfying the equation g(x,y) = 0, what is the shortest path for the milkmaid to take? (To keep things simple, we assume that the field is flat and uniform and that all points on the river bank are equally good.)

To put this into more mathematical terms, the milkmaid wants to find the point P for which the distance d(M,P) from M to P plus the distance d(P,C) from P to C is a minimum (since the field is flat, a straight line is the shortest distance between two points). It's not quite this simple, however: if that were the whole problem, we could just choose P = M (or P = C, or for that matter any P on the line segment between M and C). We have to impose the constraint that P is a point on the riverbank. Formally, we must minimize the function f(P) = d(M,P) + d(P,C), subject to the constraint that g(P) = 0.

Graphical inspiration for the method

Our first way of thinking about this problem can be obtained directly from the picture itself. We'll use an obscure fact from geometry: for every point P on a given ellipse, the total distance from one focus of the ellipse to P and then to the other focus is exactly the same. (You don't need to know where this fact comes from to understand the example! But you can see it work for yourself by drawing a near-perfect ellipse with the help of two nails, a pencil, and a loop of string.)

In our problem, that means that the milkmaid could get to the cow by way of any point on a given ellipse in the same amount of time: the ellipses are curves of constant f(P). Therefore, to find the desired point P on the riverbank, we must simply find the smallest ellipse that intersects the curve of the river. Just to be clear, only the "constant f(P)" property is really important; the fact that these curves are ellipses is just a lucky convenience (ellipses are easy to draw). The same idea will work no matter what shape the curves happen to be.

[Figure: the same picture, but with ellipses around the milkmaid and cow.]

The image above shows a sequence of ellipses of larger and larger size whose foci are M and C, ending with the one that is just tangent to the riverbank. This is a very significant word! It is obvious from the picture that the "perfect" ellipse and the river are truly tangential to each other at the ideal point P. More mathematically, this means that the normal vector to the ellipse is parallel to the normal vector to the riverbank. A few minutes' thought about pictures like this will convince you that this fact is not specific to this problem: it is a general property whenever you have constraints. And that is the insight that leads us to the method of Lagrange multipliers.
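The tangency condition is easy to check numerically. The sketch below uses hypothetical coordinates (not taken from the picture): a straight riverbank along y = 0, so g(x, y) = y, with M = (0, 2) and C = (4, 1). It finds the best point P by brute force along the bank and then compares grad(f) there with grad(g) = <0, 1>. (For a straight bank the exact answer is P = (8/3, 0), which you can confirm by reflecting C across the river.)

```python
import math

# Hypothetical setup: straight riverbank along y = 0, so g(x, y) = y.
M = (0.0, 2.0)   # milkmaid
C = (4.0, 1.0)   # cow

def f(x, y):
    """Total path length d(M,P) + d(P,C) for P = (x, y)."""
    return math.hypot(x - M[0], y - M[1]) + math.hypot(x - C[0], y - C[1])

def grad_f(x, y):
    """Gradient of f: the sum of the unit vectors pointing from M and from C toward P."""
    d1 = math.hypot(x - M[0], y - M[1])
    d2 = math.hypot(x - C[0], y - C[1])
    return ((x - M[0]) / d1 + (x - C[0]) / d2,
            (y - M[1]) / d1 + (y - C[1]) / d2)

# Brute-force search along the bank (y = 0) for the shortest total path.
xs = [i / 10000 for i in range(0, 40001)]
best_x = min(xs, key=lambda x: f(x, 0.0))

gx, gy = grad_f(best_x, 0.0)
# grad g = <0, 1>, so "grad f parallel to grad g" means gx should be ~0;
# gy then plays the role of the multiplier lambda.
print(f"P = ({best_x:.4f}, 0), grad f = ({gx:.4f}, {gy:.4f})")
```

The x-component of grad(f) comes out essentially zero at the best point, so grad(f) is parallel to grad(g), and the y-component (about -1.2) is the corresponding value of λ.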

The mathematics of Lagrange multipliers

In multivariable calculus, the gradient of a function h is a normal vector to a curve (in two dimensions) or a surface (in higher dimensions) on which h is constant: n = grad(h(P)). The length of the normal vector doesn't matter: any constant multiple of grad(h(P)) is also a normal vector. In our case, we have two functions whose normal vectors are parallel, so

grad(f(P)) = λ grad(g(P)).

The unknown constant multiplier λ is necessary because the magnitudes of the two gradients may be different. (Remember, all we know is that their directions are the same.)

In D dimensions, we now have D+1 equations in D+1 unknowns. D of the unknowns are the coordinates of P (e.g. x, y, and z for D = 3), and the other is the new unknown constant λ. The equation for the gradients derived above is a vector equation, so it provides D equations of constraint. I once got stuck on an exam at this point: don't let it happen to you! The original constraint equation g(P) = 0 is the final equation in the system. Thus, in general, the system has just enough equations to pin down both P and λ.
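The counting argument can be made concrete with a small numerical sketch. The example problem (extremizing f(x, y) = x + y on the unit circle g(x, y) = x² + y² - 1 = 0) is hypothetical, and Newton's method is just one convenient way to solve the D + 1 = 3 equations; it is not part of the Lagrange multiplier technique itself.

```python
def residual(x, y, lam):
    # The D + 1 = 3 equations: grad f - lam * grad g = 0, and g = 0.
    return [1 - 2 * lam * x, 1 - 2 * lam * y, x * x + y * y - 1]

def jacobian(x, y, lam):
    # Partial derivatives of each residual with respect to (x, y, lam).
    return [[-2 * lam, 0.0, -2 * x],
            [0.0, -2 * lam, -2 * y],
            [2 * x, 2 * y, 0.0]]

def solve(a, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))) / m[r][r]
    return v

def newton(x, y, lam, steps=30):
    """Newton's method on the 3 equations in the 3 unknowns (x, y, lam)."""
    for _ in range(steps):
        step = solve(jacobian(x, y, lam), [-r for r in residual(x, y, lam)])
        x, y, lam = x + step[0], y + step[1], lam + step[2]
    return x, y, lam

# Different starting guesses find different solutions of the same system.
print(newton(1.0, 1.0, 1.0))     # about (0.7071, 0.7071, 0.7071): the maximum
print(newton(-1.0, -1.0, -1.0))  # about (-0.7071, -0.7071, -0.7071): the minimum
```

Even in this simple case the system has two solutions (the maximum and the minimum of x + y on the circle), which is why a starting guess matters.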

As in many maximum/minimum problems, cases do exist with multiple solutions. There can even be an infinite number of solutions if the constraints are particularly degenerate: imagine if the milkmaid and the cow were both already standing right at the bank of a straight river, for example. In many cases, the actual value of the Lagrange multiplier isn't interesting, but there are some situations in which it can give useful information (as discussed below).

That's it: that's all there is to Lagrange multipliers. Just set the gradient of the function you want to extremize equal to λ times the gradient of the constraint function, and solve that together with the constraint itself. If you have more than one constraint, it turns out that all you need to do is to replace the right-hand side of the equation with the sum of the gradients of each constraint function, each with its own (different!) Lagrange multiplier. It's a little harder to justify this graphically, however, and it simply doesn't apply in two dimensions. (Each constraint "uses up" one dimension, so two independent constraints in two dimensions already pin down isolated points, no matter what function you're trying to extremize.)
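Here is a minimal sketch of the multiple-constraint version, using a hypothetical example in three dimensions: minimize f = x² + y² + z² subject to g1 = x + y + z - 1 = 0 and g2 = x - y = 0. Setting grad f = λ1 grad g1 + λ2 grad g2 and appending both constraints gives 5 equations in the 5 unknowns (x, y, z, λ1, λ2). Because this particular f is quadratic and the constraints are linear, the system happens to be linear, so plain Gaussian elimination solves it.

```python
# Rows: the three components of grad f - lam1*grad g1 - lam2*grad g2 = 0,
# then g1 = 0 and g2 = 0. Columns: (x, y, z, lam1, lam2).
A = [
    [2, 0, 0, -1, -1],  # 2x - lam1 - lam2 = 0
    [0, 2, 0, -1, 1],   # 2y - lam1 + lam2 = 0   (grad g2 = <1, -1, 0>)
    [0, 0, 2, -1, 0],   # 2z - lam1        = 0
    [1, 1, 1, 0, 0],    # x + y + z = 1
    [1, -1, 0, 0, 0],   # x - y     = 0
]
b = [0, 0, 0, 1, 0]

def solve(a, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))) / m[r][r]
    return v

x, y, z, lam1, lam2 = solve(A, b)
print(x, y, z, lam1, lam2)  # about 1/3, 1/3, 1/3, 2/3, 0
```

Each constraint gets its own multiplier. Here λ2 comes out zero because the best point on the plane x + y + z = 1 already satisfies x = y, so the second constraint doesn't have to "pull" at all.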

A formal mathematical inspiration

There is, however, another way to think of Lagrange multipliers that may be more helpful in some situations and that can provide a better way to remember the details of the technique, especially with multiple constraints. Once again, we start with a function f(P) that we wish to extremize, subject to the condition that g(P) = 0. Now, the usual way in which we extremize a function in multivariable calculus is to set grad(f(P)) = 0. How can we put this condition together with the constraint that we have?

One answer is to add a new variable λ to the problem, and to define a new function to extremize:

F(P, λ) = f(P) - λ g(P).

(Some references call this F "the Lagrangian function". I am not familiar with that usage, although it must be related to the somewhat similar "Lagrangian" used in advanced physics.)

We set grad(F(P, λ)) = 0, but keep in mind that the gradient is now D + 1 dimensional: one of its components is a partial derivative with respect to λ. If you set this new component of the gradient equal to zero, you get the constraint equation g(P) = 0. Meanwhile, the old components of the gradient treat λ as a constant, so it just pulls through. Thus, the other D equations are precisely the D equations found in the graphical approach above.
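Written out component by component, with P = (x_1, ..., x_D), the D + 1 conditions in grad(F(P, λ)) = 0 are:

```latex
F(x_1,\dots,x_D,\lambda) = f(x_1,\dots,x_D) - \lambda\, g(x_1,\dots,x_D)

\frac{\partial F}{\partial x_j}
  = \frac{\partial f}{\partial x_j} - \lambda \frac{\partial g}{\partial x_j} = 0
  \quad (j = 1,\dots,D)
  \quad\Longleftrightarrow\quad \nabla f = \lambda \nabla g

\frac{\partial F}{\partial \lambda} = -g(x_1,\dots,x_D) = 0
  \quad\Longleftrightarrow\quad g(P) = 0
```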

As presented here, this is just a trick to help you reconstruct the equations you need. However, for those who go on to use Lagrange multipliers in the calculus of variations, this is generally the most useful approach. I suspect that it is in fact very fundamental; my comments immediately below are a step toward exploring it in more depth, but I have never spent the time to work out the details.

The meaning of the multiplier

As a final note, I'll say a few words about what the Lagrange multiplier "means". (Of course, it might seem a bit silly to talk about the "meaning" of an artificial variable added for computational convenience, but bear with me.) In the more formal approach described in the previous section, the constraint function g(P) can be thought of as "competing" with the desired function f(P) to "pull" the point P to its minimum or maximum. The Lagrange multiplier λ can be thought of as a measure of how hard g(P) has to pull in order to make those "forces" balance out on the constraint surface.

This analogy is inspired by the physics of potential energy. In physics involving Lagrange multipliers in the calculus of variations, described below, this analogy turns out to be literally true: there, λ (multiplied by the gradient of the constraint function) is the force of constraint. (The Lagrange multiplier λ has meaning in economics as well: if you're maximizing profit subject to a limited resource, λ is that resource's marginal value. I haven't studied this topic from the economic perspective myself, but some might appreciate a discussion from that point of view.)
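Here is a minimal sketch of the economic reading of λ, using a hypothetical toy problem: maximize a "profit" f(x, y) = x y subject to a resource limit x + y = c, written as g(x, y) = x + y - c = 0 so that it matches F(P, λ) = f(P) - λ g(P) from the previous section. With that sign convention, λ should equal the rate at which the optimal profit grows as the resource c increases (with the opposite sign convention it flips sign).

```python
def optimal_profit(c):
    # Direct substitution y = c - x: x * (c - x) peaks at x = c / 2.
    x = c / 2
    return x * (c - x)

def multiplier(c):
    # From grad f = lam * grad g: <y, x> = lam * <1, 1>, so lam = y at the optimum.
    x = c / 2
    y = c - x
    return y

c = 10.0
lam = multiplier(c)

# Marginal value of the resource: extra optimal profit per extra unit of c,
# estimated with a central finite difference.
h = 1e-4
marginal = (optimal_profit(c + h) - optimal_profit(c - h)) / (2 * h)
print(lam, marginal)  # both about 5
```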


Examples of Lagrange multipliers in action

A box of minimal surface area

What shape should a rectangular box with a specific volume (in three dimensions) be in order to minimize its surface area? (Questions like this are very important for businesses that want to save money on packing materials.) Some people may be able to guess the answer intuitively, but we can prove it using Lagrange multipliers.

Let the lengths of the box's edges be x, y, and z. Then the constraint of constant volume is simply g(x,y,z) = xyz - V = 0, and the function to minimize is f(x,y,z) = 2(xy+xz+yz). The method is straightforward to apply:

2<y+z, x+z, x+y> = grad(f(x,y,z)) = λ grad(g(x,y,z)) = λ <yz, xz, xy>.

(The angle bracket notation <a,b,c> is my favorite way to denote a vector.) Now just solve those three equations: dividing each one by the product on its right-hand side gives λ = 2(1/y + 1/z) = 2(1/x + 1/z) = 2(1/x + 1/y), so 1/x = 1/y = 1/z, and the solution is x = y = z = 4/λ. We could eliminate λ from the problem by using xyz = V, but we don't need to: it is already clear that the optimal shape is a cube.
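As a quick check, this sketch plugs a cube into the three equations for a hypothetical volume V = 8 (so x = y = z = 2 and λ = 4/2 = 2) and compares its surface area with a few other boxes of the same volume.

```python
V = 8.0              # hypothetical volume
s = V ** (1 / 3)     # cube edge length, from xyz = V
lam = 4 / s          # from x = y = z = 4 / lam

def area(x, y, z):
    """f(x, y, z): surface area of an x-by-y-by-z box."""
    return 2 * (x * y + x * z + y * z)

def residual(x, y, z, lam):
    """Components of grad f - lam * grad g; all zero at a constrained extremum."""
    return (2 * (y + z) - lam * y * z,
            2 * (x + z) - lam * x * z,
            2 * (x + y) - lam * x * y)

print(residual(s, s, s, lam))   # each component is zero (up to rounding)

# Other boxes with volume 8 all have more surface area than the cube.
for dims in [(1, 2, 4), (1, 1, 8), (0.5, 4, 4)]:
    print(dims, area(*dims), "vs cube", area(s, s, s))
```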


The closest approach of a line to a point

This example isn't the perfect illustration of where Lagrange multipliers are useful, since it is fairly easy to solve without them and not all that convenient to solve with them. But it's a very simple idea, and because of a dumb mistake on my part it was the first example that I applied the technique to. Here's the story...

When I first took multivariable calculus (and before we learned about Lagrange multipliers), my teacher showed the example of finding the point P = <x,y> on a line (y = m x + b) that was closest to a given point Q = <x0,y0>. The function to minimize is of course

d(P,Q) = sqrt[(x-x0)² + (y-y0)²].

(Here, "sqrt" means "square root", of course; that's hard to draw in plain text.)

The teacher went through the problem on the board in the most direct way (I'll explain it later), but it was taking him a while and I was a little bored, so I idly started working the problem myself while he talked. I just leapt right in and set grad(d(x,y)) = 0, so

<x-x0, y-y0> / sqrt[(x-x0)² + (y-y0)²] = <0,0>,

and thus x = x0 and y = y0. My mistake here is obvious, so I won't blame you for having a laugh at my expense: I forgot to impose the constraint that <x,y> be on the line! (In my defense, I wasn't really focusing on what I was doing, since I was listening to the lecture at the same time.) I felt a little silly, but I didn't think much more about it.

Happily, we learned about Lagrange multipliers the very next week, and I immediately saw that my mistake had been a perfect introduction to the technique. We write the equation of the line as g(x,y) = y - m x - b = 0, so grad(g(x,y)) = <-m,1>. So we just set the two gradients equal (up to the usual factor of λ), giving

<x-x0, y-y0> / sqrt[(x-x0)² + (y-y0)²] = λ<-m,1>.

The second component of this equation is just an equation for λ, so we can substitute it into the first equation. The denominators are the same and cancel, leaving just (x-x0) = -m(y-y0). Finally, we substitute y = m x+b, giving x-x0 = -m² x - m b + m y0, so we come to the final answer: x = (x0 + m y0 - m b) / (m² + 1). (And thus y = (m x0 + m² y0 + b)/(m² + 1).)
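The final formula is easy to sanity-check. This sketch uses a hypothetical line y = 2x + 1 and point Q = (3, 0), for which the formula gives P = (0.2, 1.4). It confirms that P lies on the line, that PQ is perpendicular to the line (the condition (x-x0) = -m(y-y0) above), and that a brute-force search along the line agrees.

```python
import math

def closest_point_on_line(m, b, x0, y0):
    """Closest point on y = m*x + b to (x0, y0), using the formula derived above."""
    x = (x0 + m * y0 - m * b) / (m * m + 1)
    y = (m * x0 + m * m * y0 + b) / (m * m + 1)
    return x, y

# Hypothetical example: the line y = 2x + 1 and the point Q = (3, 0).
m, b, x0, y0 = 2.0, 1.0, 3.0, 0.0
x, y = closest_point_on_line(m, b, x0, y0)
print(x, y)

# P should lie on the line ...
print(abs(y - (m * x + b)))
# ... and PQ should be perpendicular to the line's direction <1, m>,
# which is the condition (x - x0) = -m (y - y0).
print((x - x0) + m * (y - y0))

# Brute-force check: scan x along the line and keep the closest point.
ts = [i / 1000 for i in range(-5000, 5001)]
t_best = min(ts, key=lambda t: math.hypot(t - x0, m * t + b - y0))
print(t_best)
```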

So what did my teacher actually do? He used the equation of the line to substitute m x + b for y in d(P,Q), which left us with an "easy" single-variable function to deal with... but a rather complicated one:

d(P,Q) = sqrt[(x - x0)² + (m x + b - y0)²]

To solve the problem from this point, you take the derivative and set it equal to zero as usual. It's a bit of a pain, since the function is a mess, but the answer is x = (x0 + m y0 - m b)/(m² + 1). That's exactly what we got earlier, so both methods seem to work. In this case, the direct substitution method may be a little faster (though I didn't show all of the work), but in more complicated problems Lagrange multipliers are often much easier than the direct approach.
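For anyone who wants to see the work, one way to tame the mess (not necessarily the route taken in that lecture) is to note that the square root is an increasing function, so minimizing d(P,Q) is the same as minimizing d(P,Q)²:

```latex
d(P,Q)^2 = (x - x_0)^2 + (m x + b - y_0)^2

\frac{d}{dx}\, d(P,Q)^2 = 2(x - x_0) + 2m\,(m x + b - y_0) = 0

\Longrightarrow\quad (1 + m^2)\, x = x_0 + m y_0 - m b
\quad\Longrightarrow\quad x = \frac{x_0 + m y_0 - m b}{m^2 + 1}
```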


Lagrange multipliers in the calculus of variations (often in physics)

This section will be brief, in part because most readers have probably never heard of the calculus of variations. Many people first see this idea in advanced physics classes that cover Lagrangian mechanics, and that will be the perspective taken here (in particular, I will use variable names inspired by physics). If you don't already know the basics of this subject (specifically, the Euler-Lagrange equations), you'll probably want to just skip this section.

The calculus of variations is essentially an extension of calculus to the case where the basic variables are not simple numbers x_i (which can be thought of as a position) but functions x_i(t) (which in physics corresponds to a position that changes in time). Rather than seeking the numbers x_i that extremize a function f(x_i), we seek the functions x_i(t) that extremize the integral over t of a function L[x_i(t), x_i'(t), t], where x_i'(t) are the derivatives of x_i(t). (The reason we have to integrate first is to get an ordinary number out: we know what "maximum" and "minimum" mean for numbers, but there could be any number of definitions of those concepts for functions.) In most cases, we integrate between fixed values t0 and t1, and we hold the values x_i(t0) and x_i(t1) fixed. (In physics, that means that the initial and final positions are held constant, and we're interested in finding the "best" path to get between them; L defines what we mean by "best".)

The solutions to this problem can be shown to satisfy the Euler-Lagrange equations (I have suppressed the "(t)" in the functions x_i(t)):

dL/dx_i - d/dt ( dL/dx_i' ) = 0.

(I apologize for being unclear above, but I have restricted myself to characters available in simple HTML on this page. The derivative d/dt is a total derivative, while the derivatives with respect to x_i and x_i' are "partials", at least formally.)

Imposing constraints on this process is often essential. In physics, it is common for an object to be constrained on some track or surface, or for various coordinates to be related (like position and angle when a wheel rolls without slipping). To do this, we follow a simple generalization of the procedure we used in ordinary calculus. First, we write the constraint as a function set equal to zero: g(x_i, t) = 0. (Constraints that necessarily involve the derivatives of x_i often cannot be solved.) And second, we add a term to the function L that is multiplied by a new function λ(t): L_λ[x_i, x_i', λ, t] = L[x_i, x_i', t] + λ(t) g(x_i, t).

From here, we proceed exactly as you would expect: λ(t) is treated as another coordinate function, just as λ was treated as an additional coordinate in ordinary calculus. The Euler-Lagrange equations are then written as

dL/dx_i - d/dt (dL/dx_i') + λ(t) (dg/dx_i) = 0.

This can be generalized to the case of multiple constraints precisely as before, by introducing additional Lagrange multiplier functions like λ.

As mentioned in the calculus section, the meaning of the Lagrange multiplier function in this case is surprisingly well-defined and can even be useful. It turns out that Q_i = λ(t) (dg/dx_i) is precisely the force required to impose the constraint g(x_i, t) = 0 (in the "direction" of x_i). Thus, for example, Lagrange multipliers can be used to calculate the force you would feel while riding a roller coaster, or the force of friction required to keep a wheel rolling without slipping. If you want this information, Lagrange multipliers are one of the best ways to get it.
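To make this concrete, here is a minimal sketch for a hypothetical case not taken from the text: a particle of mass m_p moving at constant angular speed w around a circle of radius R, with no gravity. Using L = (m_p/2)(x'² + y'²) and g = x² + y² - R², the Euler-Lagrange equation for x with the multiplier term reduces to -m_p x'' + 2λ(t) x = 0. The code estimates x'' by finite differences, recovers λ(t), and checks that the resulting force Q_i = λ(t)(dg/dx_i) has the familiar centripetal magnitude m_p w² R.

```python
import math

# Hypothetical parameters: mass, circle radius, angular speed.
m_p, R, w = 2.0, 1.5, 3.0

def x_of(t):
    return R * math.cos(w * t)

def y_of(t):
    return R * math.sin(w * t)

t, h = 0.3, 1e-4
# Central finite difference for x''(t).
x_dd = (x_of(t + h) - 2 * x_of(t) + x_of(t - h)) / h**2

# Euler-Lagrange for x: dL/dx = 0, d/dt(dL/dx') = m_p * x'', dg/dx = 2x,
# so 0 - m_p * x'' + lam * 2x = 0, i.e. lam = m_p * x'' / (2x).
lam = m_p * x_dd / (2 * x_of(t))

# Constraint force Q_i = lam * dg/dx_i, with dg/dx = 2x and dg/dy = 2y.
Qx, Qy = lam * 2 * x_of(t), lam * 2 * y_of(t)
print(math.hypot(Qx, Qy), m_p * w**2 * R)  # both about 27: the centripetal force
```

The multiplier comes out negative (about -m_p w²/2), which makes Q point toward the center of the circle, as a centripetal force should.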


And now you're on your own

This page certainly isn't a complete explanation of Lagrange multipliers, but I hope that it has at least clarified the basic idea a little bit. I'm always glad to hear constructive criticism and positive feedback, so feel free to write to me with your comments. (My thanks to the many people whose comments have already helped me to improve this presentation.) I hope that I have helped to make this extremely useful technique make more sense. Best wishes using Lagrange multipliers in the future!



Any questions or comments? Write to me: sjensen@jsd.claremont.edu
Copyright © 2004-7 by Steuard Jensen.
Thanks to Adam Marshall, Stan Brown, and many others for helpful suggestions!
