This is an attempt to explain the method of Lagrange multipliers without resorting to arguments about tangency of level curves, which I don't find particularly helpful and which also don't work in the case of multiple constraints.
For intuition, take $n=2, i=1$, i.e. two variables and a single constraint, so that we are maximising $f(x,y)$ subject to $g(x,y) = 0$. The first observation is that $g(x,y)=0$ describes a curve $S$ (specifically a level curve of $g$) in the $xy$-plane. We are walking along this curve and looking for the maximum value of $f$. Because $S$ is a level curve of $g$, the gradient $\nabla g$ is orthogonal to $S$ at all points on $S$. This is obvious: if the gradient weren't orthogonal, then we could nudge ourselves along $S$ in the direction of the gradient (more precisely, of its component along $S$) and get to a higher value of $g$ than where we were before; but the value of $g$ does not change along $S$ by construction.
Now, consider the gradient $\nabla f$ along $S$. By the reasoning we just used, if $\nabla f$ is not orthogonal to $S$, then we can nudge ourselves along $S$ in the direction of the gradient (by "in the direction of the gradient", I mean we could project $\nabla f$ onto the tangent line at our current position on $S$, giving us a vector on that tangent line, then take a step along $S$ in that direction) and reach a higher value of $f$. At the point on $S$ where $f$ is maximal, there is no direction along $S$ where we can move in order to increase $f$, and so $\nabla f$ must be orthogonal to $S$.
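To make both observations concrete (with $f$ and $g$ chosen purely for illustration), take $g(x,y) = x^2 + y^2 - 1$, so that $S$ is the unit circle, and $f(x,y) = x + y$. Then $\nabla g = (2x, 2y)$ points radially outward, which is indeed orthogonal to the circle at every point of $S$. At the point $(1, 0)$ we have $\nabla f = (1,1)$, which has a nonzero projection onto the tangent direction $(0,1)$, so we can nudge ourselves upward along the circle and increase $f$; at $(1/\sqrt{2}, 1/\sqrt{2})$ the gradient $(1,1)$ points radially, so it is orthogonal to $S$, and that is exactly where the constrained maximum sits.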
At this point, both $\nabla f$ and $\nabla g$ are orthogonal to $S$. We can't say that $\nabla f = \nabla g$ because the gradients might have different magnitudes, or one might point in the opposite direction; but we don't care about the exact direction or the magnitudes. The normal direction to $S$ at a point is one-dimensional, so both gradients lie on the same line, and (assuming $\nabla g \neq 0$ there) one is a scalar multiple of the other. So the sought point $(x_0,y_0)$ satisfies $\nabla f(x_0,y_0) = \lambda \nabla g(x_0,y_0)$ for some constant $\lambda$, and it also satisfies $g(x_0,y_0) = 0$.
Now we will do a trick: we will artificially construct a function $L(x,y,\lambda)$ which, when we equate its gradient to 0, gives us the two conditions above. The first condition is $\nabla f - \lambda \nabla g = 0$, so one function that spits out this condition when its partial derivatives with respect to $x$ and $y$ are equated to 0 is $L(x,y, \lambda) = f(x,y) - \lambda g(x,y)$. And as a freebie, the third element of the $\nabla L$ vector, the partial derivative with respect to $\lambda$, when equated to 0, gives the second condition $g(x,y) = 0$ – magic!
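In case it helps to see this written out: taking the gradient of $L$ with respect to $x$, $y$ and $\lambda$ gives $\nabla L = (\partial f/\partial x - \lambda\,\partial g/\partial x,\ \partial f/\partial y - \lambda\,\partial g/\partial y,\ -g(x,y))$, so setting the first two components to 0 says exactly $\nabla f = \lambda \nabla g$, and setting the last component to 0 says exactly $g(x,y) = 0$.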
So the maximal point of $f(x,y)$ along $g(x,y)=0$ is one of the points where $\nabla L(x,y, \lambda) = 0$. There might be several such points, and some will be local minima and some will be local maxima, but we can easily evaluate $f$ at each one and pick the maximum.
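As a sanity check, here is a minimal sketch of the whole recipe in sympy (just one convenient symbolic solver; the example functions are the same illustrative $f$ and $g$ as above, not anything forced by the method):

```python
# A minimal sketch of the recipe above using sympy, with the illustrative
# choice f(x, y) = x + y and g(x, y) = x^2 + y^2 - 1 (the unit circle).
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 1

# The artificial function L(x, y, lam) = f - lam * g.
L = f - lam * g

# Equate every component of the gradient of L to 0 and solve.
candidates = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)

# Evaluate f at each candidate point and pick the maximum.
best = max(candidates, key=lambda c: float(f.subs(c)))
print(best, f.subs(best))  # x = y = 1/sqrt(2), f = sqrt(2)
```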
Multiple constraints
Now suppose we are maximising $f(x,y,z)$ subject to $g_1(x,y,z) = g_2(x,y,z) = 0$.
Things are more interesting now. Instead of one level curve we now have two level surfaces, one for $g_1$ and one for $g_2$. Their gradients are still well-defined vectors, but they won't generally point in the same direction, and the maximum of $f$ doesn't necessarily lie at a point where $\nabla f$ is a multiple of either one of them on its own.
It's difficult to reason about this setup in the abstract, so let's make it concrete. Picture the level surfaces $g_1=0$ and $g_2=0$ as spheres in space, and imagine the two spheres intersecting in a circle, the set where $g_1=g_2=0$, like two overlapping soap bubbles. Label the circle $S$.
Consider the gradients of $g_1$ and $g_2$ at a point $p$ on $S$. The gradient of $g_1$ is a vector based at $p$ orthogonal to the surface of the first sphere, and the gradient of $g_2$ is a vector based at $p$ orthogonal to the second sphere. Both gradients are orthogonal to $S$ at $p$, and moreover, the hyperplane that passes through $p$ and contains both vectors is orthogonal to $S$ at $p$. In visual terms, if we are an ant walking along $S$, the hyperplane looks like a flat wall at $p$ that impedes our progress. Think hard about this before proceeding.
Since the hyperplane is orthogonal to $S$ at $p$, any vector $v$ based at $p$ lying on this hyperplane is also orthogonal to $S$ at $p$ (but note that it probably won't be orthogonal to either sphere!). The hyperplane is precisely the span of the two vectors, so $v$ is a linear combination of $\nabla g_1, \nabla g_2$.
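To pin this down with a concrete choice of spheres (used only for illustration), take $g_1 = x^2 + y^2 + z^2 - 1$ and $g_2 = x^2 + y^2 + (z-1)^2 - 1$, the unit spheres centred at the origin and at $(0,0,1)$. They intersect in the circle $z = 1/2$, $x^2 + y^2 = 3/4$. At the point $p = (\sqrt{3}/2, 0, 1/2)$ on this circle, $\nabla g_1 = (\sqrt{3}, 0, 1)$ and $\nabla g_2 = (\sqrt{3}, 0, -1)$, while the tangent direction of the circle at $p$ is $(0, 1, 0)$. Both gradients are orthogonal to the tangent, and together they span the $xz$-plane through $p$, which is exactly the "flat wall" blocking the ant's path.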
Now what can we say about the maximality condition on $f$? If we are at a point on $S$ where the gradient of $f$ is not orthogonal to $S$, then once again we can nudge ourselves in the direction of the gradient to increase $f$. But if we are at a point $p$ where $f$ is maximal, then the gradient of $f$ must be orthogonal to $S$ at $p$. In other words, the gradient of $f$ will be some linear combination of $\nabla g_1$ and $\nabla g_2$, since these two vectors span the hyperplane of vectors orthogonal to $S$ at $p$. Concretely, $\nabla f = \sum_i \lambda_i \nabla g_i$ for constants $\lambda_i$.
Constructing our artificial function $L$ as before, but now satisfying the new conditions, we easily see that the right form is $L(x,y,z, \lambda_1, \lambda_2) = f(x,y,z) - \sum_i \lambda_i g_i(x,y,z)$. Equating the gradient $\nabla L$ (taken with respect to $x$, $y$, $z$, $\lambda_1$ and $\lambda_2$) to 0 gives us exactly the conditions we want: $\nabla f = \sum_i \lambda_i \nabla g_i$, and $g_i=0$ for all $i$.
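The same sympy sketch carries over directly, reusing the two illustrative spheres from above and picking $f(x,y,z) = x + y$ as the objective (again just an example, not part of the method):

```python
# A minimal sketch of the two-constraint case using sympy, reusing the two
# illustrative spheres from above with f(x, y, z) = x + y as the objective.
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z l1 l2', real=True)
f = x + y
g1 = x**2 + y**2 + z**2 - 1            # unit sphere centred at the origin
g2 = x**2 + y**2 + (z - 1)**2 - 1      # unit sphere centred at (0, 0, 1)

L = f - l1 * g1 - l2 * g2
unknowns = [x, y, z, l1, l2]

# Equate the gradient of L to 0, solve, and pick the candidate maximising f.
candidates = sp.solve([sp.diff(L, v) for v in unknowns], unknowns, dict=True)
best = max(candidates, key=lambda c: float(f.subs(c)))
print(best, f.subs(best))  # x = y = sqrt(6)/4, z = 1/2, f = sqrt(6)/2
```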