Case Study: Backpropagation from Scratch

1. Introduction: The Algorithm That Runs the World

GPT-4, Stable Diffusion, AlphaGo: they all train with the same algorithm, backpropagation. It is simply the chain rule applied to the computational graph of a neural network.


2. The Network

Consider a tiny network with 1 input, 1 hidden neuron, and 1 output.

  • Input x.
  • Hidden h = σ(z1), where z1 = w1x + b1.
  • Output y = w2h + b2.
  • Loss L = (y - t)².

We want to find ∂L/∂w1.
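
To make this concrete, here is a minimal forward-pass sketch in Python. The parameter values, input, and target below are illustrative assumptions, not values given in the text.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# Illustrative values -- assumptions for demonstration only
w1, b1 = 0.5, 0.1    # hidden weight and bias
w2, b2 = -0.3, 0.2   # output weight and bias
x, t = 1.0, 1.0      # input and target

# Forward pass
z1 = w1 * x + b1     # hidden pre-activation
h = sigmoid(z1)      # hidden activation
y = w2 * h + b2      # output
L = (y - t) ** 2     # squared-error loss
print(f"h = {h:.4f}, y = {y:.4f}, L = {L:.4f}")
```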


3. The Derivation (Chain Rule)

∂L/∂w1 = ∂L/∂y · ∂y/∂h · ∂h/∂z1 · ∂z1/∂w1
  1. Loss Gradient: ∂L/∂y = 2(y - t).
  2. Output Weight: ∂y/∂h = w2.
  3. Activation: ∂h/∂z1 = σ′(z1) = h(1 - h) for the sigmoid.
  4. Input Weight: ∂z1/∂w1 = x.

Multiply them all together, and you have the gradient!
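
In code, each factor of the derivation is a single line. A minimal sketch, reusing the illustrative values from the forward-pass example above:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# Same illustrative values as before (assumptions)
w1, b1, w2, b2 = 0.5, 0.1, -0.3, 0.2
x, t = 1.0, 1.0

# Forward pass (to obtain z1, h, y)
z1 = w1 * x + b1
h = sigmoid(z1)
y = w2 * h + b2

# Backward pass: one line per chain-rule factor
dL_dy = 2 * (y - t)        # 1. loss gradient
dy_dh = w2                 # 2. output weight
dh_dz1 = h * (1 - h)       # 3. sigmoid derivative σ'(z1)
dz1_dw1 = x                # 4. input weight

dL_dw1 = dL_dy * dy_dh * dh_dz1 * dz1_dw1
print(f"dL/dw1 = {dL_dw1:.4f}")
```

A quick sanity check is to compare dL_dw1 against a finite-difference estimate, (L(w1 + ε) - L(w1 - ε)) / 2ε for a small ε; the two should agree to several decimal places.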


4. Interactive Visualizer: Neural Flow

A visualization of data flowing forward (blue) and gradients flowing backward (red).

  • Green edges: positive weights.
  • Red edges: negative weights.
  • Thickness: weight magnitude.

5. Summary

  • Forward: Compute prediction.
  • Backward: Compute gradients (Chain Rule).
  • Update: Adjust weights.
  • Repeat: Until loss is low.
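
Putting the four steps together gives a complete training loop for this tiny network. A minimal sketch; the initial weights, learning rate, and iteration count are assumptions chosen only for illustration:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# Assumed starting values (illustrative only)
w1, b1, w2, b2 = 0.5, 0.1, -0.3, 0.2
x, t = 1.0, 1.0
lr = 0.1  # learning rate

for step in range(200):
    # Forward: compute prediction
    z1 = w1 * x + b1
    h = sigmoid(z1)
    y = w2 * h + b2
    L = (y - t) ** 2

    # Backward: compute gradients (chain rule)
    dL_dy = 2 * (y - t)
    dL_dw2 = dL_dy * h
    dL_db2 = dL_dy
    dL_dz1 = dL_dy * w2 * h * (1 - h)
    dL_dw1 = dL_dz1 * x
    dL_db1 = dL_dz1

    # Update: adjust weights against the gradient
    w1 -= lr * dL_dw1
    b1 -= lr * dL_db1
    w2 -= lr * dL_dw2
    b2 -= lr * dL_db2

print(f"final loss: {L:.6f}")
```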
