Playing by Rules: Lagrange Multipliers
[!NOTE] This module explores the core principles of Playing by Rules: Lagrange Multipliers, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. Introduction: The Fence
In unconstrained optimization, we just want to find the lowest point $x$ where $\nabla f(x) = 0$. But what if we are not allowed to go everywhere? What if we must stay on a path, or inside a specific region?
- Example: Minimize cost ($f$), but protein intake must be $\ge 50$g ($g$).
- Example: Maximize Entropy ($f$), but probabilities must sum to 1 ($g$).
- Example: Train a Neural Network, but weights must be sparse ($L_1$ regularization).
This is Constrained Optimization.
2. The Intuition: Tangency
Imagine you are hiking up a hill $f(x,y)$. You want to reach the highest point possible, but you are forced to walk along a specific trail defined by $g(x,y)=0$.
- You walk along the trail.
- As long as the trail cuts across the contour lines of the hill, you can keep moving higher.
- If you cross a contour line, it means you are moving from a lower elevation to a higher elevation.
- You stop when the trail runs parallel to the contour lines.
- If you move forward or backward along the trail now, you don’t go up or down. You are at a local maximum on the path.
At this optimal point, the gradient of the hill $\nabla f$ and the gradient of the trail $\nabla g$ point in the same (or opposite) direction! They are parallel.
3. The Lagrange Multiplier ($\lambda$)
Since the gradients are parallel, one must be a scalar multiple of the other:
Here, $\lambda$ is the Lagrange Multiplier.
- It represents the “shadow price” of the constraint.
- If $\lambda$ is large, the constraint is pulling you hard away from the true unconstrained maximum.
- If $\lambda = 0$, the constraint is irrelevant (the peak is already on the path).
The Lagrangian Function
We package this into a single function to minimize:
Finding where $\nabla L = 0$ gives us the optimal point satisfying the constraint.
4. KKT Conditions & Inequality Constraints
For inequality constraints ($g(x) \le 0$), things get slightly more complex. We use the Karush-Kuhn-Tucker (KKT) conditions.
Basically, if the unconstrained minimum is inside the allowed region, the constraint doesn’t matter ($\lambda = 0$). If it’s outside, the solution lies on the boundary ($\lambda > 0$).
Projected Gradient Descent
In Deep Learning (e.g., Adversarial Attacks), we often enforce constraints by:
- Taking a gradient step.
- Projecting the result back into the valid set (e.g., clipping values to [0, 1]).
5. Python: Constrained Optimization with Scipy
We don’t usually write Lagrangian solvers by hand. We use scipy.optimize.
import numpy as np
from scipy.optimize import minimize
def optimization_example():
# Objective: Minimize x^2 + y^2 (Parabola)
fun = lambda x: x[0]**2 + x[1]**2
# Constraint: x + y = 1
# Rewrite as: x + y - 1 = 0
cons = ({'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1})
# Initial guess
x0 = [2.0, 2.0]
res = minimize(fun, x0, constraints=cons)
print(f"Optimal Solution: {res.x}")
# Expected: [0.5, 0.5] (Closest point to origin on line x+y=1)
if __name__ == "__main__":
optimization_example()
6. Interactive Visualizer: The Constrained Climber
Find the optimal point on the Blue Constraint Circle.
- Contour Lines: Represent the hill height (Objective Function $f(x,y) = x+y$). Brighter is higher.
- Red Arrow: Gradient of Objective ($\nabla f$). Always points uphill (North-East).
- Green Arrow: Gradient of Constraint ($\nabla g$). Always points perpendicular to the circle surface.
- Goal: Drag the yellow climber until the Red and Green arrows align perfectly.
7. Summary
- Constraint: A rule that restricts the valid search space.
- Tangency Condition: The optimal point occurs where the constraint boundary runs parallel to the objective’s contour lines.
- Lagrange Multipliers: $\nabla f = \lambda \nabla g$. This equation mathematically captures the tangency condition.
- Shadow Price: $\lambda$ tells us how much the objective $f$ would improve if we relaxed the constraint $g$ by a tiny bit.