Wednesday, July 28, 2021

Intuition for Lagrange multipliers with multiple constraints

This is an attempt to explain the method of Lagrange multipliers without resorting to arguments about tangency of level curves, which I don't find particularly helpful and which also don't work in the case of multiple constraints.

Task: maximise $f : \mathbb R^n \to \mathbb R$ subject to a number of constraints $g_i = 0$ for each $g_i : \mathbb R^n \to \mathbb R$.

For intuition, take $n=2, i=1$ so that we are maximising $f(x,y)$ subject to $g(x,y) = 0$. The first observation is that $g(x,y)=0$ describes a curve $S$ (specifically a level curve) on the XY-plane. We are walking along this curve and looking for the maximum value of $f$. By the fact of being a level curve of $g(x,y)$, the gradient $\nabla g$ is orthogonal to $S$ at all points on $S$. This is obvious: if the gradient wasn't orthogonal, then we could nudge ourselves along $S$ in the direction of the gradient and get to a higher value of $g$ than where we were before; but the value of $g$ does not change along $S$ by construction.

Now, consider the gradient $\nabla f$ along $S$. By the reasoning we just used, if $\nabla f$ is not orthogonal to $S$, then we can nudge ourselves along $S$ in the direction of the gradient (by "in the direction of the gradient", I mean we could project $\nabla f$ onto the tangent line at our current position on $S$, giving us a vector on that tangent line, then take a step along $S$ in that direction) and reach a higher value of $f$. At the point on $S$ where $f$ is maximal, there is no direction along $S$ where we can move in order to increase $f$, and so $\nabla f$ must be orthogonal to $S$.

At this point, both $\nabla f$ and $\nabla g$ are orthogonal to $S$. We can't say that $\nabla f = \nabla g$ because the gradients might have different magnitudes, or one might point in the opposite direction; but we don't care about the exact direction or the magnitudes of the gradients, we just care about the orthogonality condition. So the sought point $(x_0,y_0)$ satisfies $\nabla f(x_0,y_0) = \lambda \nabla g(x_0,y_0)$ for some constant $\lambda$, and it also satisfies $g(x_0,y_0) = 0$.

Now we will do a trick: we will artificially construct a function $L(x,y,\lambda)$ which, when we equate its gradient to 0, gives us the two conditions above. The first condition is $\lambda f - \lambda g = 0$, so one function that spits out this condition when its gradient is equated to 0 is $L(x,y, \lambda) = f(x,y) - \lambda g(x,y)$. And as a freebie, the third element of the $\nabla L$ vector, when equated to 0, gives the third condition $g(x,y) = 0$ – magic!

So the maximal point of $f(x,y)$ along $g(x,y)=0$ is one of the points where $\nabla L(x,y, \lambda) = 0$. There might be several such points, and some will be local minimums and some will be local maximums, but we can easily evaluate $f$ at each one and pick the maximum.

Multiple constraints


Now suppose we are maximising $f(x,y,z)$ subject to $g_1(x,y,z) = g_2(x,y,z) = 0$.

Things are more interesting now. For one, instead of one level curve we now have two level surfaces for $g_1,g_2$ respectively. Their gradients are still vectors, but they won't generally coincide. The maximum of $f$ doesn't necessarily lie at a point where the gradient of either $g_1$ or $g_2$ is 0.

It's difficult to reason about this setup in the abstract, so let's make this concrete. Picture the level surfaces $g_1=0, g_2=0$ as spheres in space, and imagine the two spheres intersect at the circle $g_1=g_2=0$, like two overlapping soap bubbles. Label the circle $S$.

Consider the gradients of $g_1,g_2$ at a point $p$ on $S$. The gradient of $g_1$ is a vector based at $p$ orthogonal to the surface of the first sphere, and the gradient of $g_2$ is a vector based at $p$ orthogonal to the second sphere. Both gradients are orthogonal to $S$ at $p$, and moreover, the hyperplane that passes through $p$ and intersects both vectors is orthogonal to $S$ at $p$. In visual terms, if we are an ant walking along $S$, the hyperplane looks like as a flat wall at $p$ that impedes our progress. Think hard about this before proceeding.

Since the hyperplane is orthogonal to $S$ at $p$, any vector $v$ based at $p$ lying on this hyperplane is also orthogonal to $S$ at $p$ (but note that it probably won't be orthogonal to either sphere!). The hyperplane is precisely the span of the two vectors, so $v$ is a linear combination of $\nabla g_1, \nabla g_2$.

Now what can we say about the maximality condition on $f$? If we are at a point on $S$ where the gradient of $f$ is not orthogonal to $S$, then once again we can nudge ourselves in the direction of the gradient to increase $f$. But if we are at a point $p$ where $f$ is maximal, then the gradient of $f$ will be orthogonal to $S$ at $p$. In other words, the gradient of $f$ will be some linear combination of $\nabla g_1, \nabla g_2$, since these two vectors span the hyperplane of orthogonal vectors to $S$ at $p$. Concretely, $\nabla f = \sum_i \lambda_i \nabla g_i$ for constants $\lambda_i$

Constructing our artificial function $L$ as before, but satisfying the new conditions we easily see that the right form is $L(x,y,z, \lambda_1, \lambda_2) = f(x,y,z) - \sum_i \lambda_i g_i(x,y,z)$. Equating the gradient $\nabla L$ to 0 gives us exactly the conditions we want: $\nabla f = \sum_i \lambda_i \nabla g_i$, and $g_i=0$ for all $i$.

No comments:

Post a Comment