Jacobian & Hessian Matrices

1. Introduction: The Landscape of Loss

Training a Neural Network is like hiking down a mountain in the dark. You want to reach the lowest point (the global minimum of the loss). To do this, you need to know:

  1. Which way is down? (Gradient).
  2. Is the ground curving? (Hessian).

2. The Gradient & Jacobian (First Derivative)

The Gradient (∇f) tells you the direction of steepest ascent. To minimize the loss, you step the opposite way, along −∇f.
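A quick numerical check of this claim (the toy loss surface and step size below are illustrative, not from any library):

```python
import numpy as np

f    = lambda p: p[0]**2 + 3 * p[1]**2           # a toy loss surface
grad = lambda p: np.array([2 * p[0], 6 * p[1]])  # its gradient, written by hand

p, lr = np.array([1.0, 1.0]), 0.1
print(f(p))                  # 4.0
print(f(p - lr * grad(p)))   # 1.12 -> stepping against the gradient descends
print(f(p + lr * grad(p)))   # 9.12 -> stepping with the gradient climbs
```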

If you have a function with multiple outputs (like a layer in a neural net), the derivatives form a matrix called the Jacobian (J).

$J_{ij} = \frac{\partial y_i}{\partial x_j}$
  • Meaning: How much does Output i change when I wiggle Input j?
  • Deep Learning: Used in Backpropagation to pass errors backward.
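To make the definition concrete, here is a minimal sketch that approximates a Jacobian with finite differences (the function `f` and the helper `numerical_jacobian` are illustrative, not part of any framework):

```python
import numpy as np

def f(x):
    """A toy 'layer': 2 outputs from 3 inputs."""
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0]])

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate J[i, j] = dy_i / dx_j with central differences."""
    m = f(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

x = np.array([1.0, 2.0, 0.5])
print(numerical_jacobian(f, x))
# Row i, column j answers: how much does output i move when input j wiggles?
```

In practice, autodiff frameworks build this matrix (or, more often, its products with vectors) for you during backpropagation rather than forming it column by column.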

3. The Hessian (Second Derivative)

The Hessian (H) is a matrix of second derivatives. It describes the curvature of the landscape.

$H_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}$

The Eigenvalues of the Hessian tell us the shape of the terrain:

  • All Positive: Bowl (Convex). Local Minimum.
  • All Negative: Hill (Concave). Local Maximum.
  • Mixed Signs: Saddle Point. (Up in one direction, down in another).
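A minimal sketch of this classification, using a finite-difference Hessian (the helper and the two test surfaces are illustrative):

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Approximate H[i, j] = d^2 f / (dx_i dx_j) with central differences."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = eps, eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

bowl   = lambda p: p[0]**2 + p[1]**2   # convex
saddle = lambda p: p[0]**2 - p[1]**2   # saddle

for name, func in [("bowl", bowl), ("saddle", saddle)]:
    eigs = np.linalg.eigvalsh(numerical_hessian(func, np.zeros(2)))
    print(name, np.round(eigs, 3))   # bowl -> [2, 2] (minimum), saddle -> [-2, 2] (saddle)
```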

Newton’s Method (The Smart Jump)

Gradient Descent takes tiny steps. Newton’s Method uses the curvature (the Hessian) to take one big, well-aimed leap: on a perfectly quadratic bowl, a single step lands exactly at the bottom.

$x_{new} = x_{old} - H^{-1} \nabla f$

Why don’t we always use it? For a billion-parameter network, even storing the Hessian needs $O(N^2)$ memory, and inverting it ($H^{-1}$) costs $O(N^3)$ time. That is impossibly expensive.
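To see the “smart jump” on a case where it works perfectly, here is a sketch on a hand-picked quadratic bowl (the matrix A, learning rate, and step count are illustrative assumptions):

```python
import numpy as np

# Quadratic bowl f(x) = 0.5 * x^T A x, with its minimum at the origin.
A = np.array([[3.0, 0.0],
              [0.0, 0.5]])           # steep in x, flat in y
grad = lambda x: A @ x               # gradient of f
hess = lambda x: A                   # Hessian is constant for a quadratic

x0 = np.array([2.0, 2.0])

# Gradient Descent: many small, safe steps; crawls along the flat direction.
x, lr = x0.copy(), 0.1
for _ in range(50):
    x = x - lr * grad(x)
print("gradient descent, 50 steps:", x)          # still ~0.15 away in y

# Newton's Method: x_new = x_old - H^{-1} grad f. One step hits the minimum
# exactly, because a quadratic bowl is precisely the model Newton's Method assumes.
x_newton = x0 - np.linalg.solve(hess(x0), grad(x0))
print("Newton's method, 1 step:   ", x_newton)   # [0. 0.]
```

Note that the sketch solves the linear system $H \, \Delta x = \nabla f$ instead of forming $H^{-1}$ explicitly, which is the standard way to apply the update.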


4. Interactive Visualizer: The Landscape Explorer v3.0

Explore different optimization landscapes.

  • Gradient Step: Takes a small step downhill. Safe but slow.
  • Newton Step: Uses curvature to jump. Fast, but can fail if the Hessian is not positive definite (e.g., Saddle Points).

Task: Try to reach the center (0, 0) from a random spot. Compare Gradient vs. Newton steps on a Saddle Point. Notice how Newton’s Method can shoot you in the wrong direction when the curvature is negative (Hill/Saddle); the sketch below reproduces this behavior in code.

The visualizer readout shows the current position (x, y), the Hessian eigenvalues λ1 and λ2, and the resulting classification; for the default bowl, λ1 = λ2 = 2.0, i.e. CONVEX (Bowl). Color encodes height (blue = low, red = high).
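If the widget is not available, the same experiment can be reproduced in a few lines (the saddle surface f(x, y) = x² − y² and the starting point are illustrative):

```python
import numpy as np

# Saddle: f(x, y) = x^2 - y^2. The origin is a stationary point, NOT a minimum.
grad = lambda p: np.array([2 * p[0], -2 * p[1]])
H    = np.array([[2.0,  0.0],
                 [0.0, -2.0]])       # eigenvalues +2 and -2 -> indefinite

p = np.array([0.5, 0.1])             # a "random" starting spot

newton_step   = p - np.linalg.solve(H, grad(p))
gradient_step = p - 0.1 * grad(p)

print("Newton jumps to:  ", newton_step)    # [0. 0.]     -> straight INTO the saddle point
print("Gradient moves to:", gradient_step)  # [0.4  0.12] -> x shrinks, y grows: it escapes downhill
```

Because the Hessian is indefinite, the Newton update treats the saddle as if it were a bowl and heads straight for the stationary point, while the gradient step keeps moving downhill along the −y² direction.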

5. Summary

  • Gradient: Direction of steepest climb. (Use negative gradient to descend).
  • Jacobian: Matrix of all first-order derivatives. Measures sensitivity.
  • Hessian: Matrix of second-order derivatives. Measures curvature.
  • Optimization: We want to find points where Gradient is zero and Hessian is Positive Definite (Bowl).

Next: Application - Neural Network Layers →