Case Study: Backpropagation from Scratch

1. Introduction: The Algorithm That Runs the World

GPT-4, Stable Diffusion, AlphaGo: they all train with the same algorithm, backpropagation. It is simply the chain rule applied to the computational graph of a neural network.


2. The Network

Consider a tiny network with 1 input, 1 hidden neuron, and 1 output.

  • Input x.
  • Hidden h = σ(z1), where z1 = w1x + b1.
  • Output y = w2h + b2.
  • Loss L = (y - t)².

We want to find ∂L/∂w1.
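
To make this concrete, here is a minimal forward-pass sketch in Python. The parameter values, input, and target below are illustrative assumptions, not values given in the text.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# Illustrative values -- assumptions for demonstration only
w1, b1 = 0.5, 0.1    # hidden weight and bias
w2, b2 = -0.3, 0.2   # output weight and bias
x, t = 1.0, 1.0      # input and target

# Forward pass
z1 = w1 * x + b1     # hidden pre-activation
h = sigmoid(z1)      # hidden activation
y = w2 * h + b2      # output
L = (y - t) ** 2     # squared-error loss
print(f"h = {h:.4f}, y = {y:.4f}, L = {L:.4f}")
```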


3. The Derivation (Chain Rule)

∂L/∂w1 = ∂L/∂y · ∂y/∂h · ∂h/∂z1 · ∂z1/∂w1
  1. Loss Gradient: ∂L/∂y = 2(y - t).
  2. Output Weight: ∂y/∂h = w2.
  3. Activation: ∂h/∂z1 = σ′(z1) = h(1 - h) for the sigmoid.
  4. Input Weight: ∂z1/∂w1 = x.

Multiply them all together, and you have the gradient!
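
In code, each factor of the derivation is a single line. A minimal sketch, reusing the illustrative values from the forward-pass example above:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# Same illustrative values as before (assumptions)
w1, b1, w2, b2 = 0.5, 0.1, -0.3, 0.2
x, t = 1.0, 1.0

# Forward pass (to obtain z1, h, y)
z1 = w1 * x + b1
h = sigmoid(z1)
y = w2 * h + b2

# Backward pass: one line per chain-rule factor
dL_dy = 2 * (y - t)        # 1. loss gradient
dy_dh = w2                 # 2. output weight
dh_dz1 = h * (1 - h)       # 3. sigmoid derivative σ'(z1)
dz1_dw1 = x                # 4. input weight

dL_dw1 = dL_dy * dy_dh * dh_dz1 * dz1_dw1
print(f"dL/dw1 = {dL_dw1:.4f}")
```

A quick sanity check is to compare dL_dw1 against a finite-difference estimate, (L(w1 + ε) - L(w1 - ε)) / 2ε for a small ε; the two should agree to several decimal places.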


4. Interactive Visualizer: Neural Flow

A visualization of data flowing forward (blue) and gradients flowing backward (red).

  • Green edges: positive weights.
  • Red edges: negative weights.
  • Thickness: weight magnitude.

5. Summary

  • Forward: Compute prediction.
  • Backward: Compute gradients (Chain Rule).
  • Update: Adjust weights.
  • Repeat: Until loss is low.
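
Putting the four steps together gives a complete training loop for this tiny network. A minimal sketch; the initial weights, learning rate, and iteration count are assumptions chosen only for illustration:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# Assumed starting values (illustrative only)
w1, b1, w2, b2 = 0.5, 0.1, -0.3, 0.2
x, t = 1.0, 1.0
lr = 0.1  # learning rate

for step in range(200):
    # Forward: compute prediction
    z1 = w1 * x + b1
    h = sigmoid(z1)
    y = w2 * h + b2
    L = (y - t) ** 2

    # Backward: compute gradients (chain rule)
    dL_dy = 2 * (y - t)
    dL_dw2 = dL_dy * h
    dL_db2 = dL_dy
    dL_dz1 = dL_dy * w2 * h * (1 - h)
    dL_dw1 = dL_dz1 * x
    dL_db1 = dL_dz1

    # Update: adjust weights against the gradient
    w1 -= lr * dL_dw1
    b1 -= lr * dL_db1
    w2 -= lr * dL_dw2
    b2 -= lr * dL_db2

print(f"final loss: {L:.6f}")
```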
