# Review & Cheat Sheet
> [!NOTE]
> This module is a review: it consolidates the core concepts covered so far, from eigenvectors and the SVD to tensors, gradients, and second-order optimization.
## 1. Cheat Sheet: The Big Picture
We’ve moved from basic matrix operations to the core machinery of Machine Learning.
| Concept | The “One-Liner” | Key Equation / Code | Application |
|---|---|---|---|
| Eigenvector | An axis that doesn’t rotate, only stretches. | `Av = λv` | PageRank, Stability Analysis. |
| Eigenvalue | The stretch factor along the eigenvector. | `np.linalg.eig(A)` | Variance in PCA, Curvature in Hessian. |
| SVD | Factoring any matrix into Rotation–Stretch–Rotation. | `A = U Σ Vᵀ` | Compression, Denoising, Recommenders. |
| PCA | Finding the best axes to project data onto. | Eig of `Σ = (1/n) XᵀX` | Dimensionality Reduction. |
| Tensor | A multi-dimensional grid of numbers. | `torch.rand(3, 256, 256)` | Deep Learning Data Structure. |
| Broadcasting | Stretching smaller tensors to match larger ones. | `(4,1) + (4,4) → (4,4)` | Efficient Coding. |
| Jacobian | First derivatives of a vector function. | `Jᵢⱼ = ∂yᵢ / ∂xⱼ` | Sensitivity, Backpropagation. |
| Hessian | Second derivatives (Curvature). | `torch.autograd.functional.hessian` | Optimization Landscape (Bowl vs Saddle). |
| Newton’s Method | Jumping to the minimum using curvature. | `x_new = x_old − H⁻¹ ∇f` | Fast Optimization (Second Order). |
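The first row of the table can be checked numerically. A minimal NumPy sketch (the matrix `A` is an arbitrary symmetric example, not from the module):

```python
import numpy as np

# A symmetric matrix, so its eigenvalues are real.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns eigenvalues and the matching column eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)

# Check A v = lambda v for each pair: the eigenvector's direction is
# unchanged by A; it is only stretched by its eigenvalue.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)

print(np.sort(eigvals))  # [1. 3.]
```

The same pattern (transform, compare against a pure stretch) works for any square matrix, though non-symmetric matrices can produce complex eigenvalues.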
## 2. Interactive Flashcards
Test your recall.

> [!TIP]
> Try to answer each question from memory before reading the answer below it.
**What is the geometric meaning of an Eigenvector?**
A vector that does not change direction after a linear transformation is applied (it only scales).
**What does a Zero Gradient and Mixed Hessian Eigenvalues imply?**
A Saddle Point. It's a minimum in one direction and a maximum in another.
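This is easy to verify on the classic saddle f(x, y) = x² − y². The sketch below uses a hand-rolled finite-difference Hessian in NumPy (`hessian_fd` is an illustrative helper; `torch.autograd.functional.hessian` computes the same matrix exactly via autodiff):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 - y**2  # classic saddle: bowl along x, dome along y

def hessian_fd(f, p, h=1e-3):
    """Central finite-difference Hessian (numerical sketch only)."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                       - f(p - ei + ej) + f(p - ei - ej)) / (4 * h**2)
    return H

# At the origin the gradient is zero, but the curvature is mixed.
H = hessian_fd(f, np.array([0.0, 0.0]))
eigs = np.linalg.eigvalsh(H)  # ascending order
print(eigs)  # one negative, one positive -> saddle, not a minimum
```

Mixed-sign eigenvalues are exactly why gradient descent can stall near saddles: the zero gradient looks like convergence, but one eigen-direction still points downhill.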
**Why is PCA sensitive to Outliers?**
Because it maximizes Variance (Squared Error). A distant point has a massive squared error, pulling the axis towards it.
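The pull of a single outlier can be seen directly. A minimal NumPy sketch (the `first_pc` helper and the synthetic data are illustrative, not from the module):

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean data: 100 points spread along the x-axis, nearly flat in y.
X = np.column_stack([rng.normal(0, 5, 100), rng.normal(0, 0.5, 100)])

def first_pc(X):
    """Leading principal axis via eigendecomposition of the covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)          # sample covariance (1/n) X^T X
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]              # eigenvector of the largest eigenvalue

pc_clean = first_pc(X)

# Add ONE distant point far off the main axis.
X_out = np.vstack([X, [0.0, 100.0]])
pc_out = first_pc(X_out)

# The leading axis swings toward the outlier because its squared
# distance dominates the variance.
print(abs(pc_clean[1]), abs(pc_out[1]))
```

Robust variants (e.g. robust PCA) replace the squared error with penalties that down-weight distant points for exactly this reason.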
**What is Broadcasting?**
The implicit rule that stretches a smaller tensor (e.g., a vector) to match the shape of a larger one during operations.
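The `(4,1) + (4,4)` case from the cheat sheet, in NumPy:

```python
import numpy as np

col = np.arange(4).reshape(4, 1)   # shape (4, 1): a column vector
M = np.ones((4, 4))                # shape (4, 4)

# The (4,1) column is implicitly "stretched" across 4 columns to
# match (4,4); no copy is actually materialized in memory.
out = col + M
print(out.shape)  # (4, 4)
print(out[:, 0])  # [1. 2. 3. 4.]
```

The same rule is why adding a bias vector to a batch of activations needs no explicit loop.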
**Why do we use ReLU in Neural Networks?**
To introduce Non-Linearity. It folds the space, allowing the network to learn complex boundaries.
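A quick NumPy sketch of why the non-linearity matters: without it, stacked linear layers collapse into a single matrix (the weights here are random placeholders):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # folds the negative half-space to zero

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
x = rng.normal(size=3)

# Two stacked linear layers collapse into ONE linear map...
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# ...but a ReLU between them folds the space, so the composition can
# no longer be written as a single matrix product.
y = W2 @ relu(W1 @ x)
print(y)
```

Any fixed non-linearity breaks the collapse; ReLU is popular because it is cheap and keeps gradients from vanishing on its active side.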
**SVD decomposes a matrix into which 3 components?**
U (Left Singular Vectors), Σ (Singular Values), Vᵀ (Right Singular Vectors).
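Reconstructing a matrix from its three factors, as a NumPy sketch (the matrix `A` is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# U: left singular vectors, s: singular values (descending),
# Vt: right singular vectors, already transposed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rotation–Stretch–Rotation rebuilds A exactly.
A_rebuilt = U @ np.diag(s) @ Vt
assert np.allclose(A, A_rebuilt)
print(s)  # singular values, sorted descending
```

Truncating `s` to its largest entries before rebuilding is the basis of the compression and denoising applications in the table above.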
**What is the Gradient vector?**
A vector pointing in the direction of steepest ascent (greatest increase of the function).
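A numerical sketch in NumPy (`grad_fd` is an illustrative finite-difference helper; frameworks like PyTorch compute the same vector via autodiff):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3 * y**2

def grad_fd(f, p, h=1e-6):
    """Central-difference approximation of the gradient."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        e = np.zeros_like(p); e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)
    return g

p = np.array([1.0, 1.0])
g = grad_fd(f, p)
print(g)  # ~[2. 6.], matching the analytic gradient (2x, 6y)

# Stepping along +g increases f: the steepest-ascent direction.
step = 1e-3
assert f(p + step * g / np.linalg.norm(g)) > f(p)
```

Gradient descent simply steps along −g instead, the direction of steepest descent.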
**What is the Rank of a Color Image Tensor?**
Rank 3: three axes. Note the ordering depends on convention: `torch.rand(3, 256, 256)` is channels-first (Channels, Height, Width), while NumPy/PIL images are typically (Height, Width, Channels).
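A NumPy stand-in for the `torch.rand(3, 256, 256)` example from the cheat sheet:

```python
import numpy as np

# Channels-first layout, mirroring torch.rand(3, 256, 256).
img = np.random.rand(3, 256, 256)

# "Rank" here means the number of axes (ndim), not matrix rank.
print(img.ndim, img.shape)  # 3 (3, 256, 256)
```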
**What is the Manifold Hypothesis?**
The idea that high-dimensional real-world data lies on a lower-dimensional "surface" (manifold) embedded within that space.
## 3. What’s Next?
You have mastered the algebra of transformations. Next, we move to Discrete Math & Information Theory, where we learn about Graphs, Entropy, and how to measure information itself.