The Landscape of Learning: Convexity & Loss
1. Introduction: The Terrain
Training a machine learning model is like hiking in the dark. You are at some location (current weights) and you want to reach the lowest valley (minimum loss).
The shape of this terrain is defined by the Loss Function $L(\theta)$.
- Simple Models (e.g., linear regression with squared-error loss): The terrain is a single smooth bowl. You just walk downhill, as the sketch after this list shows.
- Deep Learning: The terrain is a rugged mountain range full of deceptive valleys (local minima), saddle points, and flat plateaus.
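To make the "walking down" concrete, here is a minimal sketch of gradient descent on the convex mean-squared-error loss of 1-D linear regression. The data, learning rate, and step count are illustrative choices, not canonical values.

```python
import numpy as np

# Illustrative sketch: gradient descent on the convex MSE loss of
# 1-D linear regression. Data and hyperparameters are arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3.0 * X + 1.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0          # start somewhere on the bowl
lr = 0.1                 # step size

for step in range(200):
    err = (w * X + b) - y
    # Gradients of L(w, b) = mean((w*x + b - y)^2)
    grad_w = 2 * np.mean(err * X)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w ~= {w:.2f}, b ~= {b:.2f}")  # lands near (3, 1), the bowl's bottom
```

Because the loss surface is a bowl, any starting point and any reasonable step size lead to the same bottom.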
2. Convexity: The Happy Path
A function is Convex if the line segment connecting any two points on its graph lies on or above the graph: $f(tx + (1-t)y) \le t\,f(x) + (1-t)\,f(y)$ for all $x, y$ and all $t \in [0, 1]$.
Why we love Convexity:
- Any Local Minimum is also the Global Minimum.
- Gradient descent can’t get stuck in a suboptimal valley, because there are none.
- Example: $f(x) = x^2$; its only stationary point, $x = 0$, is the global minimum (see the numerical check below).
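The defining inequality is easy to spot-check numerically. This sketch samples random points and verifies that the graph of $f(x) = x^2$ never rises above the chord; it is an illustration, not a proof.

```python
import numpy as np

# Spot-check the convexity inequality
#   f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)
# for f(x) = x^2 at randomly sampled points.
f = lambda x: x ** 2
rng = np.random.default_rng(0)

for _ in range(5):
    x, y = rng.uniform(-10, 10, size=2)
    t = rng.uniform(0, 1)
    chord = t * f(x) + (1 - t) * f(y)   # point on the line segment
    curve = f(t * x + (1 - t) * y)      # point on the graph
    assert curve <= chord + 1e-12       # graph never rises above the chord
print("Convexity inequality held at all sampled points.")
```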
3. Non-Convexity: The Reality of Deep Learning
The loss surfaces of Neural Networks are highly Non-Convex. The hazards:
- Local Minima: Valleys that beat their immediate surroundings but sit above the global minimum.
- Saddle Points: Points where the gradient is zero, but which are a minimum along one direction and a maximum along another (like a horse saddle).
- Plateaus: Nearly flat regions where gradients vanish and progress stalls.
Despite this, Stochastic Gradient Descent (SGD) works surprisingly well in practice: its gradient noise helps it roll off saddle points, and in high-dimensional loss surfaces most zero-gradient points tend to be saddles rather than bad local minima. The sketch below shows the escape effect.
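A toy saddle makes this concrete. This sketch runs gradient descent on $f(x, y) = x^2 - y^2$ (a classic saddle, unbounded below, so it only illustrates local behavior): started exactly on the $x$-axis, plain descent slides into the saddle at the origin and stalls, while a little gradient noise, standing in for SGD's minibatch noise, knocks the iterate off the axis and lets it escape. The noise scale and step counts are arbitrary choices.

```python
import numpy as np

# f(x, y) = x^2 - y^2: gradient is (2x, -2y); the origin is a saddle.
def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])

rng = np.random.default_rng(0)
lr = 0.1

for noise in (0.0, 0.01):
    p = np.array([1.0, 0.0])                 # start on the unstable axis
    for step in range(500):
        p = p - lr * (grad(p) + noise * rng.normal(size=2))
        if abs(p[1]) > 1.0:                  # |y| grew: left the saddle
            break
    print(f"noise={noise}: stopped at step {step}, point {np.round(p, 3)}")
# noise=0.0 stays pinned near the saddle (0, 0) for all 500 steps;
# noise=0.01 gets bumped off the axis and escapes along the -y^2 direction.
```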
4. Interactive Visualizer: The Terrain Explorer
Visualize the difference between a Convex “Bowl” and a Non-Convex “Wobbly” surface. Rotate the camera to see the hidden valleys.
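If the interactive explorer is unavailable, a static stand-in is easy to render with matplotlib. The "wobbly" formula below is an illustrative choice (a bowl plus sinusoidal ripples), not necessarily the surface the widget uses.

```python
import numpy as np
import matplotlib.pyplot as plt

# Static stand-in for the terrain explorer: a convex bowl next to a
# non-convex "wobbly" surface with many local valleys.
x = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, x)

bowl = X**2 + Y**2                                      # convex: one valley
wobbly = X**2 + Y**2 + 3 * np.sin(2 * X) * np.cos(2 * Y)  # ripples add local minima

fig = plt.figure(figsize=(10, 4))
for i, (Z, title) in enumerate([(bowl, "Convex bowl"),
                                (wobbly, "Non-convex terrain")]):
    ax = fig.add_subplot(1, 2, i + 1, projection="3d")
    ax.plot_surface(X, Y, Z, cmap="viridis")
    ax.set_title(title)
plt.show()
```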
5. Summary
- Loss Function: Defines the terrain we traverse.
- Convex: Easy, guaranteed global minimum.
- Non-Convex: Hard, full of traps (local minima, saddle points, plateaus); requires smart optimizers.