Automatic Differentiation: The Magic of PyTorch

1. Introduction: Who computes the gradients?

In calculus class, you calculated derivatives by hand. In early AI (the 1980s), researchers derived gradients on paper and hard-coded them. In modern frameworks (PyTorch, TensorFlow), you write the Forward Pass, and the framework computes the Backward Pass (the gradients) automatically. This is Automatic Differentiation (AutoDiff).
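
A minimal sketch of this workflow in PyTorch (assuming a standard PyTorch install; the function below is just an illustrative example):

  import torch

  # Forward pass: we only write the computation itself.
  x = torch.tensor(2.0, requires_grad=True)
  y = x ** 2 + torch.sin(x)

  # Backward pass: PyTorch derives dy/dx for us.
  y.backward()
  print(x.grad)   # 2*x + cos(x) at x=2, roughly 3.58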


2. The Computational Graph

Every calculation in your code adds to a graph. Nodes = Operations (+, -, *, sin). Edges = Data Flow (Tensors).

Example: y = (x + 2) * 3

  1. Input x.
  2. Add 2 → a.
  3. Multiply 3 → y.
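
The same graph, recorded by PyTorch (a sketch; the grad_fn names in the comments are typical of current PyTorch versions and are only illustrative):

  import torch

  x = torch.tensor(2.0, requires_grad=True)

  a = x + 2        # node: Add      -> a = 4
  y = a * 3        # node: Multiply -> y = 12

  # Each result remembers the operation that produced it;
  # this chain of grad_fn objects is the computational graph.
  print(a.grad_fn)   # e.g. <AddBackward0 ...>
  print(y.grad_fn)   # e.g. <MulBackward0 ...>

  y.backward()
  print(x.grad)      # dy/dx = 3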

3. Forward vs Backward Mode

  • Forward Mode: computes the value (y) and the derivative (dy/dx) in a single forward sweep. Efficient when there are few inputs and many outputs.
  • Reverse (Backward) Mode: computes the values first, then traverses the graph in reverse, accumulating gradients. Efficient when there are many inputs and few outputs, as in neural nets (millions of parameters, one scalar loss). This is what backpropagation uses (sketch below).
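
A sketch of both modes through torch.autograd.functional (the jvp and vjp helpers exist in recent PyTorch releases; treat the exact call signatures and availability as an assumption):

  import torch
  from torch.autograd.functional import jvp, vjp

  def f(x):
      return (x + 2) * 3

  x = torch.tensor(2.0)
  v = torch.tensor(1.0)   # direction / seed vector

  # Forward mode: value and directional derivative in one sweep.
  value_f, dydx_forward = jvp(f, x, v)

  # Reverse mode: value first, then pull the gradient back from the output.
  value_r, dydx_reverse = vjp(f, x, v)

  print(value_f, dydx_forward)   # 12.0, 3.0
  print(value_r, dydx_reverse)   # 12.0, 3.0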

4. Worked Example: Tracing the Graph

Trace the computational graph for y = (x + w) * b.

  1. Forward Pass: values flow from the inputs to the output.
  2. Backward Pass: gradients flow from the output back to the inputs.

Input: x=2, w=1, b=3.

  • a = x + w = 3.
  • y = a * b = 9.
  • dy/dy = 1 (the seed at the output).
  • dy/da = b = 3.
  • dy/dx = dy/da · da/dx = 3 · 1 = 3.
  • dy/dw = dy/da · da/dw = 3 · 1 = 3.
  • dy/db = a = 3.
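
The same numbers, checked with autograd (a minimal sketch; the variables mirror the example above):

  import torch

  x = torch.tensor(2.0, requires_grad=True)
  w = torch.tensor(1.0, requires_grad=True)
  b = torch.tensor(3.0, requires_grad=True)

  a = x + w        # forward: a = 3
  y = a * b        # forward: y = 9

  y.backward()     # reverse pass, seeded with dy/dy = 1
  print(x.grad)    # dy/dx = b = 3
  print(w.grad)    # dy/dw = b = 3
  print(b.grad)    # dy/db = a = 3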

5. Summary

  • Computational Graph: Represents the math as a directed acyclic graph (DAG) of operations.
  • AutoDiff: Applies the Chain Rule automatically over that graph.
  • Reverse (Backward) Mode: Efficient for functions with many inputs and few outputs (like a scalar loss).

Next: Backpropagation from Scratch →