DL App: Neural Network Layers

1. Introduction: The Building Block

A Neural Network is just a chain of Linear Algebra operations interspersed with non-linear functions. The core component is the Dense Layer (or Fully Connected Layer).

Mathematically, a layer transforms an input vector x into an output vector y:

y = σ(Wx + b)
  • x: Input Vector (Shape: N × 1).
  • W: Weight Matrix (Shape: M × N). This rotates and stretches the input space.
  • b: Bias Vector (Shape: M × 1). This shifts the space (translation).
  • σ: Activation Function (e.g., ReLU). This bends or folds the space.
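
As a concrete reference, here is a minimal NumPy sketch of this forward pass. The function name dense_forward, the ReLU choice, and the example shapes (M = 3, N = 2) are illustrative, not taken from the app:

  import numpy as np

  def dense_forward(W, b, x, activation):
      # One dense layer: y = activation(W @ x + b)
      z = W @ x + b              # linear step: rotate/stretch (W), then translate (b)
      return activation(z)       # non-linear step: bend/fold the space

  relu = lambda z: np.maximum(0.0, z)

  # Illustrative example with N = 2 inputs and M = 3 neurons
  W = np.random.randn(3, 2)          # weight matrix, shape M x N
  b = np.random.randn(3, 1)          # bias vector, shape M x 1
  x = np.array([[1.0], [2.0]])       # input vector, shape N x 1

  y = dense_forward(W, b, x, relu)   # output vector, shape M x 1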

The Manifold Hypothesis

Why does this work? Real-world data (like images of cats) lies on a low-dimensional “manifold” (a crumpled sheet) inside a high-dimensional space. The goal of the neural network is to uncrumple this sheet so that the classes (cats vs dogs) can be separated by a simple line.


2. The Activation Function (Non-Linear)

Without σ, a deep network would collapse into a single linear map, since W2(W1x) = (W2W1)x = Wnew x. The activation function introduces the non-linearity that makes stacking layers worthwhile.
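
A quick numerical check of this collapse, as a sketch with arbitrary example matrices (biases omitted for brevity):

  import numpy as np

  W1 = np.random.randn(4, 3)
  W2 = np.random.randn(2, 4)
  x  = np.random.randn(3, 1)

  # Two stacked linear layers...
  two_layers = W2 @ (W1 @ x)
  # ...are exactly one linear layer with Wnew = W2 @ W1
  one_layer = (W2 @ W1) @ x

  assert np.allclose(two_layers, one_layer)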

A. ReLU (Rectified Linear Unit)

ReLU(z) = max(0, z)
  • Effect: Folds the space along the axes. Negative coordinates are clamped to zero, so points in the all-negative quadrant collapse onto the origin.
  • Pros: Cheap to compute; its gradient does not saturate for positive inputs, which mitigates the Vanishing Gradient problem.
  • Cons: “Dead ReLU” (if a neuron’s pre-activation is always negative, its gradient is zero and it stops learning).
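
A minimal sketch; the toy input values are illustrative:

  import numpy as np

  def relu(z):
      return np.maximum(0.0, z)          # negative entries are clamped to zero

  z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
  print(relu(z))                         # -> 0.0, 0.0, 0.0, 0.5, 2.0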

B. Leaky ReLU

LReLU(z) = max(0.01z, z)
  • Effect: Similar to ReLU, but allows a tiny “leak” for negative values.
  • Pros: Fixes the “Dead ReLU” problem.
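
The same sketch with the leak; 0.01 matches the slope in the formula above, though other leak factors are common:

  import numpy as np

  def leaky_relu(z, alpha=0.01):
      return np.maximum(alpha * z, z)    # negative entries keep a small slope

  z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
  print(leaky_relu(z))                   # -> -0.02, -0.005, 0.0, 0.5, 2.0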

C. Sigmoid / Tanh

  • Effect: Squashes space into a bounded range [0, 1] or [-1, 1].
  • Pros: Smooth, probability-like.
  • Cons: Vanishing Gradient. Notice in the visualizer how large inputs get squashed into a tiny region where the slope is almost zero? That kills learning.
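
A sketch that makes the saturation concrete; the sample points 0, 2, 5, 10 are arbitrary:

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def sigmoid_grad(z):
      s = sigmoid(z)
      return s * (1.0 - s)               # slope peaks at 0.25 (at z = 0) and decays fast

  for z in [0.0, 2.0, 5.0, 10.0]:
      print(z, sigmoid(z), sigmoid_grad(z))
  # At z = 10 the output is ~0.99995 and the slope is ~4.5e-05:
  # almost no gradient flows back through a saturated sigmoid.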

3. Interactive Visualizer: The Neural Fold v3.0

Below, we visualize a single layer with 2 inputs and 2 neurons. We start with a grid of points (Blue).

  1. Linear Step: Apply Wx. (Shear/Rotate).
  2. Activation Step: Apply σ(z).

Task: Switch between ReLU, Leaky ReLU, and Sigmoid. Observe how ReLU folds the space like a piece of paper, while Leaky ReLU bends it slightly.
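
A minimal NumPy sketch of what the visualizer computes, assuming a 2x2 weight matrix applied to a grid of 2-D points; the specific W below is an arbitrary example, not the app's default:

  import numpy as np

  # Grid of 2-D input points (the blue dots), one point per column
  xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
  grid = np.stack([xs.ravel(), ys.ravel()])     # shape (2, 25)

  W = np.array([[1.0, 0.5],                     # example 2x2 weight matrix (a shear);
                [0.0, 1.0]])                    # not the app's default values

  z = W @ grid                                  # 1. Linear Step
  transformed = np.maximum(0.0, z)              # 2. Activation Step (ReLU fold)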

Controls: Weights (W), Activation
Legend: Blue dots = input grid; Green lines = transformed grid

4. Summary

  • W (Weights): Linearly transforms the space (Rotate/Scale/Shear).
  • b (Bias): Translates the space.
  • Activation: Non-linearly warps the space.
    • ReLU: Folds space. Good for Deep Learning.
    • Sigmoid: Squashes space. Good for probability output, bad for deep layers (Vanishing Gradient).

Next: Module Review →