Jacobian & Hessian Matrices

1. Introduction: The Landscape of Loss

Training a Neural Network is like hiking down a mountain in the dark. You want to reach the lowest point (the global minimum of the loss). To do this, you need to know:

  1. Which way is down? (Gradient).
  2. Is the ground curving? (Hessian).

2. The Gradient & Jacobian (First Derivative)

The Gradient (∇f) tells you the direction of steepest ascent. To minimize the loss, you step the opposite way, along −∇f.
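A quick numerical check of this claim (the toy loss surface and step size below are illustrative, not from any library):

```python
import numpy as np

f    = lambda p: p[0]**2 + 3 * p[1]**2           # a toy loss surface
grad = lambda p: np.array([2 * p[0], 6 * p[1]])  # its gradient, written by hand

p, lr = np.array([1.0, 1.0]), 0.1
print(f(p))                  # 4.0
print(f(p - lr * grad(p)))   # 1.12 -> stepping against the gradient descends
print(f(p + lr * grad(p)))   # 9.12 -> stepping with the gradient climbs
```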

If you have a function with multiple outputs (like a layer in a neural net), the derivatives form a matrix called the Jacobian (J).

$J_{ij} = \frac{\partial y_i}{\partial x_j}$
  • Meaning: How much does Output i change when I wiggle Input j?
  • Deep Learning: Used in Backpropagation to pass errors backward.
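To make the definition concrete, here is a minimal sketch that approximates a Jacobian with finite differences (the function `f` and the helper `numerical_jacobian` are illustrative, not part of any framework):

```python
import numpy as np

def f(x):
    """A toy 'layer': 2 outputs from 3 inputs."""
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0]])

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate J[i, j] = dy_i / dx_j with central differences."""
    m = f(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

x = np.array([1.0, 2.0, 0.5])
print(numerical_jacobian(f, x))
# Row i, column j answers: how much does output i move when input j wiggles?
```

In practice, autodiff frameworks build this matrix (or, more often, its products with vectors) for you during backpropagation rather than forming it column by column.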

3. The Hessian (Second Derivative)

The Hessian (H) is a matrix of second derivatives. It describes the curvature of the landscape.

$H_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}$

The Eigenvalues of the Hessian tell us the shape of the terrain:

  • All Positive: Bowl (Convex). Local Minimum.
  • All Negative: Hill (Concave). Local Maximum.
  • Mixed Signs: Saddle Point. (Up in one direction, down in another).
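A minimal sketch of this classification, using a finite-difference Hessian (the helper and the two test surfaces are illustrative):

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Approximate H[i, j] = d^2 f / (dx_i dx_j) with central differences."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = eps, eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

bowl   = lambda p: p[0]**2 + p[1]**2   # convex
saddle = lambda p: p[0]**2 - p[1]**2   # saddle

for name, func in [("bowl", bowl), ("saddle", saddle)]:
    eigs = np.linalg.eigvalsh(numerical_hessian(func, np.zeros(2)))
    print(name, np.round(eigs, 3))   # bowl -> [2, 2] (minimum), saddle -> [-2, 2] (saddle)
```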

Newton’s Method (The Smart Jump)

Gradient Descent takes tiny steps. Newton’s Method uses the curvature (the Hessian) to take one big, well-aimed leap: on a perfectly quadratic bowl, a single step lands exactly at the bottom.

$x_{new} = x_{old} - H^{-1} \nabla f$

Why don’t we always use it? For a billion-parameter network, even storing the Hessian needs $O(N^2)$ memory, and inverting it ($H^{-1}$) costs $O(N^3)$ time. That is impossibly expensive.
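To see the “smart jump” on a case where it works perfectly, here is a sketch on a hand-picked quadratic bowl (the matrix A, learning rate, and step count are illustrative assumptions):

```python
import numpy as np

# Quadratic bowl f(x) = 0.5 * x^T A x, with its minimum at the origin.
A = np.array([[3.0, 0.0],
              [0.0, 0.5]])           # steep in x, flat in y
grad = lambda x: A @ x               # gradient of f
hess = lambda x: A                   # Hessian is constant for a quadratic

x0 = np.array([2.0, 2.0])

# Gradient Descent: many small, safe steps; crawls along the flat direction.
x, lr = x0.copy(), 0.1
for _ in range(50):
    x = x - lr * grad(x)
print("gradient descent, 50 steps:", x)          # still ~0.15 away in y

# Newton's Method: x_new = x_old - H^{-1} grad f. One step hits the minimum
# exactly, because a quadratic bowl is precisely the model Newton's Method assumes.
x_newton = x0 - np.linalg.solve(hess(x0), grad(x0))
print("Newton's method, 1 step:   ", x_newton)   # [0. 0.]
```

Note that the sketch solves the linear system $H \, \Delta x = \nabla f$ instead of forming $H^{-1}$ explicitly, which is the standard way to apply the update.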


4. Interactive Visualizer: The Landscape Explorer v3.0

Explore different optimization landscapes.

  • Gradient Step: Takes a small step downhill. Safe but slow.
  • Newton Step: Uses curvature to jump. Fast, but can fail if the Hessian is not positive definite (e.g., Saddle Points).

Task: Try to reach the center (0, 0) from a random spot. Compare Gradient vs. Newton steps on a Saddle Point. Notice how Newton’s Method can shoot you in the wrong direction when the curvature is negative (Hill/Saddle); the sketch below reproduces this behavior in code.

The visualizer readout shows the current position (x, y), the Hessian eigenvalues λ1 and λ2, and the resulting classification; for the default bowl, λ1 = λ2 = 2.0, i.e. CONVEX (Bowl). Color encodes height (blue = low, red = high).
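If the widget is not available, the same experiment can be reproduced in a few lines (the saddle surface f(x, y) = x² − y² and the starting point are illustrative):

```python
import numpy as np

# Saddle: f(x, y) = x^2 - y^2. The origin is a stationary point, NOT a minimum.
grad = lambda p: np.array([2 * p[0], -2 * p[1]])
H    = np.array([[2.0,  0.0],
                 [0.0, -2.0]])       # eigenvalues +2 and -2 -> indefinite

p = np.array([0.5, 0.1])             # a "random" starting spot

newton_step   = p - np.linalg.solve(H, grad(p))
gradient_step = p - 0.1 * grad(p)

print("Newton jumps to:  ", newton_step)    # [0. 0.]     -> straight INTO the saddle point
print("Gradient moves to:", gradient_step)  # [0.4  0.12] -> x shrinks, y grows: it escapes downhill
```

Because the Hessian is indefinite, the Newton update treats the saddle as if it were a bowl and heads straight for the stationary point, while the gradient step keeps moving downhill along the −y² direction.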

5. Summary

  • Gradient: Direction of steepest climb. (Use negative gradient to descend).
  • Jacobian: Matrix of all first-order derivatives. Measures sensitivity.
  • Hessian: Matrix of second-order derivatives. Measures curvature.
  • Optimization: We want to find points where Gradient is zero and Hessian is Positive Definite (Bowl).

Next: Application - Neural Network Layers →