Multivariable Calculus: The Gradient Vector
1. Introduction: Beyond y = f(x)
Real-world problems rarely depend on a single variable.
- House Price: Depends on (Size, Rooms, Location, Age).
- Neural Network Loss: Depends on millions of weights (w1, w2, …, wn).
We need calculus for functions with vector inputs: f(x).
2. Partial Derivatives
If z = f(x, y) = x² + y², how does z change? It depends on which direction you move!
- Partial with respect to x (∂f / ∂x): Treat y as a constant (like slicing the mountain along the East-West axis), then differentiate with respect to x.
- Partial with respect to y (∂f / ∂y): Treat x as a constant (slicing North-South), then differentiate with respect to y.
Example: f(x, y) = 3x²y
- ∂f / ∂x = 6xy (Treat y as a constant, like 5).
- ∂f / ∂y = 3x² (Treat x as a constant, like 5).
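To sanity-check results like these, a finite-difference approximation works: nudge one input while holding the other fixed. Below is a minimal sketch in plain Python (the step size h and the test point are arbitrary choices for illustration):

```python
def f(x, y):
    return 3 * x**2 * y

def partial_x(f, x, y, h=1e-6):
    # Nudge x only; y is held constant, exactly like the East-West slice.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # Nudge y only; x is held constant.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x, y = 2.0, 5.0
print(partial_x(f, x, y), 6 * x * y)   # ≈ 60.0 vs analytic 60.0
print(partial_y(f, x, y), 3 * x**2)    # ≈ 12.0 vs analytic 12.0
```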
3. The Gradient Vector (∇f)
If we collect all partial derivatives into a vector, we get the Gradient:
∇f(x, y) = [ ∂f / ∂x, ∂f / ∂y ]
Properties of the Gradient
- It is a Vector (has direction and magnitude).
- It points in the Direction of Steepest Ascent (Uphill).
- Its magnitude ‖∇f‖ tells you how steep the slope is.
[!TIP] Gradient Descent: To find the minimum (bottom of the valley), we go in the opposite direction of the gradient: -∇f.
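As a minimal illustration of that tip (not any particular library's optimizer), here is plain gradient descent on the bowl f(x, y) = x² + y² from Section 2; the learning rate and step count are arbitrary choices:

```python
def grad_f(x, y):
    # Analytic gradient of f(x, y) = x^2 + y^2.
    return (2 * x, 2 * y)

x, y = 3.0, -4.0     # start somewhere on the side of the bowl
lr = 0.1             # learning rate (step size)
for _ in range(50):
    gx, gy = grad_f(x, y)
    x -= lr * gx     # step *against* the gradient...
    y -= lr * gy     # ...i.e. downhill
print(x, y)          # both values shrink towards 0, the bottom of the valley
```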
4. Matrix Derivatives
In Deep Learning, we deal with layers of neurons, so we need matrices.
4.1 The Jacobian Matrix (J) - The Slope Map
If we have a function mapping a vector to a vector (f: ℝⁿ → ℝᵐ), the first derivative is an m × n matrix called the Jacobian.
- Shape: (Output Dim) × (Input Dim).
- Usage: Used for Backpropagating errors through a layer. It tells us how every output changes with respect to every input.
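A concrete toy case: for a linear layer f(x) = Wx (no activation), the Jacobian is just W. The sketch below is a generic finite-difference check, not any framework's API; W and the test point are made up for illustration:

```python
import numpy as np

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3): output dim x input dim

def f(x):
    return W @ x                     # f: R^3 -> R^2

def jacobian(f, x, h=1e-6):
    m, n = f(x).shape[0], x.shape[0]
    J = np.zeros((m, n))
    for j in range(n):               # perturb one input at a time
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

x = np.array([1.0, -1.0, 2.0])
print(jacobian(f, x))                # recovers W, shape (2, 3)
```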
4.2 The Hessian Matrix (H) - The Curvature Map
If we have a scalar function (L: ℝⁿ → ℝ), the second derivative is an n × n symmetric matrix called the Hessian.
- Hᵢⱼ = ∂²L / ∂wᵢ∂wⱼ
- Shape: (Input Dim) × (Input Dim).
- Usage: Determines Curvature (Bowl vs Saddle).
- Positive Definite H: Valley (Minimum). We want to be here.
- Indefinite H: Saddle Point. We want to escape this.
- Newton’s Method uses the inverse of the Hessian to jump to the minimum, but it’s too expensive (O(n³)) for Deep Learning.
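To make the curvature cases concrete, here is a small sketch using two toy quadratics whose Hessians are known in closed form (a bowl w₁² + w₂² and a saddle w₁² − w₂²). It classifies them by the signs of the eigenvalues and takes one Newton step on the bowl:

```python
import numpy as np

# Constant Hessians of two toy quadratics.
H_bowl   = np.array([[2.0, 0.0],
                     [0.0, 2.0]])    # L = w1^2 + w2^2  (valley)
H_saddle = np.array([[2.0, 0.0],
                     [0.0, -2.0]])   # L = w1^2 - w2^2  (saddle)

def classify(H):
    eig = np.linalg.eigvalsh(H)      # symmetric => real eigenvalues
    if np.all(eig > 0):
        return "positive definite: minimum (valley)"
    if np.any(eig > 0) and np.any(eig < 0):
        return "indefinite: saddle point"
    return "other (maximum or degenerate)"

print(classify(H_bowl))              # minimum
print(classify(H_saddle))            # saddle point

# One Newton step on the bowl jumps straight to its minimum:
w = np.array([3.0, -4.0])
grad = 2 * w                                # gradient of w1^2 + w2^2
print(w - np.linalg.solve(H_bowl, grad))    # [0. 0.]
```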
5. Interactive Visualizer: The Gradient Compass
The background shows a “Hill” function: z = 4 − (x² + y²).
- Bright Center: Peak (High Z).
- Dark Edges: Valley (Low Z).
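For this hill, the gradient is ∇z = (∂z/∂x, ∂z/∂y) = (−2x, −2y), which always points from the current position back toward the origin, i.e. uphill toward the bright center.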
Interaction:
- Move Mouse: The Red Arrow represents the Gradient ∇f. It always points Uphill (towards the center peak).
- Click: Spawn a “Ball” that rolls Downhill (opposite to the gradient).
- Unlike simple Gradient Descent, these balls have Momentum (Mass). They accelerate down the slope (v += a), oscillate, and eventually settle due to friction; see the sketch right after this list.
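The visualizer's own code isn't shown here, but the same momentum idea can be sketched in a few lines of plain Python on the bowl f(x, y) = x² + y² (the timestep, friction factor, and step count are made-up constants for this sketch): the negative gradient supplies the acceleration, velocity accumulates it, and friction damps the oscillation.

```python
def grad(x, y):
    return (2 * x, 2 * y)        # gradient of the bowl f(x, y) = x^2 + y^2

x, y = 3.0, -4.0                 # where the "ball" is dropped
vx, vy = 0.0, 0.0                # it starts at rest
dt, friction = 0.1, 0.95         # made-up constants for this sketch

for _ in range(500):
    gx, gy = grad(x, y)
    vx = (vx - gx * dt) * friction   # v += a (a = -gradient), then damp
    vy = (vy - gy * dt) * friction
    x += vx * dt                     # the ball overshoots, oscillates,
    y += vy * dt                     # and friction lets it settle
print(x, y)                          # both end up very close to 0
```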
6. Summary
- Partial Derivative: Slope along one axis (Sensitivity to one input).
- Gradient Vector: Combined direction of steepest ascent. We move against it to learn.
- Jacobian Matrix: The derivatives of a vector output (The Slope Map).
- Hessian Matrix: The derivatives of derivatives (The Curvature Map).