Playing by Rules: Lagrange Multipliers

[!NOTE] This module explores the core principles of Playing by Rules: Lagrange Multipliers, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. Introduction: The Fence

In unconstrained optimization, we just want to find the lowest point $x$ where $\nabla f(x) = 0$. But what if we are not allowed to go everywhere? What if we must stay on a path, or inside a specific region?

Example: Minimize cost ($f$), but protein intake must be $\ge 50$g ($g$).
Example: Maximize Entropy ($f$), but probabilities must sum to 1 ($g$).
Example: Train a Neural Network, but weights must be sparse ($L_1$ regularization).

This is Constrained Optimization.

2. The Intuition: Tangency

Imagine you are hiking up a hill $f(x,y)$. You want to reach the highest point possible, but you are forced to walk along a specific trail defined by $g(x,y)=0$.

You walk along the trail.
As long as the trail cuts across the contour lines of the hill, you can keep moving higher.
- If you cross a contour line, it means you are moving from a lower elevation to a higher elevation.
You stop when the trail runs parallel to the contour lines.
- If you move forward or backward along the trail now, you don’t go up or down. You are at a local maximum on the path.

At this optimal point, the gradient of the hill $\nabla f$ and the gradient of the trail $\nabla g$ point in the same (or opposite) direction! They are parallel.

3. The Lagrange Multiplier ($\lambda$)

Since the gradients are parallel, one must be a scalar multiple of the other:

$$ \nabla f = \lambda \nabla g $$

Here, $\lambda$ is the Lagrange Multiplier.

It represents the “shadow price” of the constraint.
If $\lambda$ is large, the constraint is pulling you hard away from the true unconstrained maximum.
If $\lambda = 0$, the constraint is irrelevant (the peak is already on the path).

The Lagrangian Function

We package this into a single function to minimize:

$$ L(x, y, \lambda) = f(x, y) - \lambda(g(x, y) - c) $$

Finding where $\nabla L = 0$ gives us the optimal point satisfying the constraint.

4. KKT Conditions & Inequality Constraints

For inequality constraints ($g(x) \le 0$), things get slightly more complex. We use the Karush-Kuhn-Tucker (KKT) conditions.

Basically, if the unconstrained minimum is inside the allowed region, the constraint doesn’t matter ($\lambda = 0$). If it’s outside, the solution lies on the boundary ($\lambda > 0$).

Projected Gradient Descent

In Deep Learning (e.g., Adversarial Attacks), we often enforce constraints by:

Taking a gradient step.
Projecting the result back into the valid set (e.g., clipping values to [0, 1]).

5. Python: Constrained Optimization with Scipy

We don’t usually write Lagrangian solvers by hand. We use scipy.optimize.

import numpy as np
from scipy.optimize import minimize

def optimization_example():
    # Objective: Minimize x^2 + y^2 (Parabola)
    fun = lambda x: x[0]**2 + x[1]**2

    # Constraint: x + y = 1
    # Rewrite as: x + y - 1 = 0
    cons = ({'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1})

    # Initial guess
    x0 = [2.0, 2.0]

    res = minimize(fun, x0, constraints=cons)

    print(f"Optimal Solution: {res.x}")
    # Expected: [0.5, 0.5] (Closest point to origin on line x+y=1)

if __name__ == "__main__":
    optimization_example()

6. Interactive Visualizer: The Constrained Climber

Find the optimal point on the Blue Constraint Circle.

Contour Lines: Represent the hill height (Objective Function $f(x,y) = x+y$). Brighter is higher.
Red Arrow: Gradient of Objective ($\nabla f$). Always points uphill (North-East).
Green Arrow: Gradient of Constraint ($\nabla g$). Always points perpendicular to the circle surface.
Goal: Drag the yellow climber until the Red and Green arrows align perfectly.

Optimal Alignment! (∇f || ∇g)

Click or Drag along the Blue Circle to move the climber.

7. Summary

Constraint: A rule that restricts the valid search space.
Tangency Condition: The optimal point occurs where the constraint boundary runs parallel to the objective’s contour lines.
Lagrange Multipliers: $\nabla f = \lambda \nabla g$. This equation mathematically captures the tangency condition.
Shadow Price: $\lambda$ tells us how much the objective $f$ would improve if we relaxed the constraint $g$ by a tiny bit.

Playing by Rules: Lagrange Multipliers

Playing by Rules: Lagrange Multipliers

1. Introduction: The Fence

2. The Intuition: Tangency

3. The Lagrange Multiplier ($\lambda$)

The Lagrangian Function

4. KKT Conditions & Inequality Constraints

Projected Gradient Descent

5. Python: Constrained Optimization with Scipy

6. Interactive Visualizer: The Constrained Climber

7. Summary

Found this lesson helpful?