DL App: Neural Network Layers
1. Introduction: The Building Block
A Neural Network is just a chain of Linear Algebra operations interspersed with non-linear functions. The core component is the Dense Layer (or Fully Connected Layer).
Mathematically, a layer transforms an input vector x into an output vector y = σ(Wx + b), where:
- x: Input Vector (Shape: N × 1).
- W: Weight Matrix (Shape: M × N). This rotates and stretches the input space.
- b: Bias Vector (Shape: M × 1). This shifts the space (translation).
- σ: Activation Function (e.g., ReLU). This bends or folds the space.
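To make the shapes concrete, here is a minimal NumPy sketch of one dense layer's forward pass. The function name dense_forward and the example sizes (N = 3 inputs, M = 2 neurons) are ours, not part of the app.

```python
import numpy as np

def dense_forward(x, W, b, activation):
    """Compute y = activation(W @ x + b) for one dense layer."""
    z = W @ x + b          # linear step: rotate/stretch, then shift
    return activation(z)   # non-linear step: bend/fold the space

# Example with N = 3 inputs and M = 2 neurons (shapes match the list above).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))   # input vector, shape (N, 1)
W = rng.normal(size=(2, 3))   # weight matrix, shape (M, N)
b = rng.normal(size=(2, 1))   # bias vector, shape (M, 1)

relu = lambda z: np.maximum(z, 0.0)
y = dense_forward(x, W, b, relu)
print(y.shape)  # (2, 1) -> the output lives in M-dimensional space
```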
The Manifold Hypothesis
Why does this work? Real-world data (like images of cats) lies on a low-dimensional “manifold” (a crumpled sheet) inside a high-dimensional space. The goal of the neural network is to uncrumple this sheet so that the classes (cats vs. dogs) can be separated by a simple linear boundary.
2. The Activation Function (Non-Linear)
Without σ, a deep network would collapse into a single linear transformation, since W₂(W₁x) = (W₂W₁)x, i.e., one matrix W_new = W₂W₁. The activation function introduces the non-linearity that prevents this collapse.
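A quick numerical check of this collapse, as a small NumPy sketch (the shapes are arbitrary examples): two stacked linear layers give exactly the same output as a single layer with weights W₂W₁.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # first "layer"
W2 = rng.normal(size=(2, 4))   # second "layer"
x  = rng.normal(size=(3, 1))

two_layers = W2 @ (W1 @ x)     # "deep" network without activations
one_layer  = (W2 @ W1) @ x     # equivalent single linear map W_new = W2 @ W1

print(np.allclose(two_layers, one_layer))  # True
```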
A. ReLU (Rectified Linear Unit)
- Effect: Folds the space along the axes. Any negative coordinate is clipped to zero, so points with all-negative coordinates collapse onto the origin.
- Pros: Cheap to compute; mitigates the Vanishing Gradient problem.
- Cons: “Dead ReLU” (if a neuron's pre-activation is always negative, its gradient is zero and it stops learning).
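A minimal sketch of ReLU and its gradient (helper names are ours), showing why a neuron stuck in the negative region stops learning:

```python
import numpy as np

def relu(z):
    """max(0, z): negative coordinates are folded onto zero."""
    return np.maximum(z, 0.0)

def relu_grad(z):
    """Derivative of ReLU: 1 where z > 0, 0 elsewhere."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.] -> zero gradient for negative inputs ("Dead ReLU")
```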
B. Leaky ReLU
- Effect: Similar to ReLU, but negative values keep a small non-zero slope (the “leak”) instead of being zeroed out.
- Pros: Fixes the “Dead ReLU” problem.
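For comparison, a Leaky ReLU sketch; α = 0.01 is used only as an example leak value:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but negative values keep a small slope alpha instead of being zeroed."""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    """Gradient is 1 for positive inputs and alpha (not 0) for negative ones."""
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(z))       # [-0.02  -0.005  0.5    2.   ]
print(leaky_relu_grad(z))  # [0.01  0.01  1.    1.  ] -> never exactly zero, so the neuron can recover
```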
C. Sigmoid / Tanh
- Effect: Squashes space into a bounded range [0, 1] or [-1, 1].
- Pros: Smooth, probability-like.
- Cons: Vanishing Gradient. Notice in the visualizer how large inputs get squashed into a tiny region where the slope is almost zero? That kills learning.
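A small sketch of why Sigmoid saturates: its derivative σ(z)(1 − σ(z)) never exceeds 0.25 and collapses toward zero for large |z|.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # maximum value 0.25, reached at z = 0

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:5.1f}  sigmoid = {sigmoid(z):.4f}  slope = {sigmoid_grad(z):.6f}")
# At z = 10 the slope is ~0.000045: almost no gradient flows back through the layer.
```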
3. Interactive Visualizer: The Neural Fold v3.0
Below, we visualize a single layer with 2 inputs and 2 neurons. We start with a grid of points (Blue).
- Linear Step: Apply Wx. (Shear/Rotate).
- Activation Step: Apply σ(z) element-wise, where z is the result of the linear step.
Task: Switch between ReLU, Leaky ReLU, and Sigmoid. Observe how ReLU folds the space like a piece of paper, Leaky ReLU bends it slightly, and Sigmoid squashes the whole grid into a bounded square.
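If you want to reproduce the visualizer's two steps offline, here is a hedged NumPy/Matplotlib sketch. It builds a grid of 2D points, applies a linear step, then ReLU; the particular W, b, and plotting choices are illustrative and not taken from the app.

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid of 2D points (the blue starting grid).
xs, ys = np.meshgrid(np.linspace(-1, 1, 15), np.linspace(-1, 1, 15))
points = np.stack([xs.ravel(), ys.ravel()])          # shape (2, 225)

W = np.array([[1.0, 0.5],                            # example weights: a shear plus a stretch
              [0.2, 1.0]])
b = np.array([[0.3], [-0.2]])                        # example bias: shifts the grid

z = W @ points + b                                   # Linear Step
y = np.maximum(z, 0.0)                               # Activation Step (ReLU folds the grid onto the axes)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, data, title in zip(axes, [points, z, y], ["Input grid", "After Wx + b", "After ReLU"]):
    ax.scatter(data[0], data[1], s=5)
    ax.set_title(title)
    ax.set_aspect("equal")
plt.show()
```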
4. Summary
- W (Weights): Linearly transforms the space (Rotate/Scale/Shear).
- b (Bias): Translates the space.
- Activation: Non-linearly warps the space.
- ReLU: Folds space. Good for Deep Learning.
- Sigmoid: Squashes space. Good for probability output, bad for deep layers (Vanishing Gradient).