Scaling & Production

[!NOTE] This module covers the core principles of scaling services and running them in production, reasoning from first principles and hardware constraints.

1. The Real-World Hook: The Black Friday Crash

Imagine your e-commerce application is running on a single server. It’s Black Friday, and traffic suddenly spikes by 10,000%. What happens?

  • CPU Exhaustion: The CPU hits 100% processing checkouts, and requests start to queue up.
  • RAM Saturation: Memory fills up with concurrent active user sessions.
  • Connection Exhaustion: The server runs out of file descriptors and its listen backlog fills, so new TCP connections are refused or time out.

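Eventually, the server crashes. Upgrading that one machine with more CPU and RAM (Vertical Scaling) only raises the ceiling, and a single box eventually hits its hardware limits. To absorb spikes like this, we need Horizontal Scaling: adding more machines (or containers) and spreading the load across them.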
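The arithmetic behind horizontal scaling is simple: if one replica can sustain a given request rate, you need enough replicas to cover peak traffic plus some spare capacity. A minimal sketch; the baseline of 50 requests/second, the 200 requests/second per replica, and the 30% headroom are hypothetical placeholders, not measurements.

```python
import math


def peak_after_spike(baseline_rps: float, spike_percent: float) -> float:
    # A spike "by 10,000%" adds 100x the baseline on top of it: 50 rps -> 5,050 rps.
    return baseline_rps * (1 + spike_percent / 100)


def replicas_needed(peak_rps: float, rps_per_replica: float, headroom: float = 0.3) -> int:
    # Plan for peak plus headroom (0.3 = 30% spare capacity), rounded up to whole replicas.
    return math.ceil(peak_rps * (1 + headroom) / rps_per_replica)


if __name__ == "__main__":
    peak = peak_after_spike(baseline_rps=50, spike_percent=10_000)  # hypothetical baseline
    print(peak, replicas_needed(peak, rps_per_replica=200))  # -> 5050.0 33
```

Rounding up matters: 32.8 replicas is not a deployable number, and rounding down would leave the fleet short at peak.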

2. Horizontal Scaling in Docker

One of Docker Compose’s most useful features is the ability to start multiple copies (replicas) of a service with a single command:

# Start 3 copies of the 'worker' service
docker compose up --scale worker=3 -d

Running several replicas locally makes it possible to test:

  • Race conditions in your database when multiple instances write simultaneously.
  • Session stickiness issues (if a user logs into instance A, what happens when their next request hits instance B?).
  • Queue consumer throughput when draining messages from a broker like RabbitMQ.

Hardware Reality: True horizontal scaling requires your application to be stateless. If a replica keeps local files or in-memory session data, a later request routed to a different replica will not find that state (for example, the user suddenly appears logged out). Externalize state to a database (e.g., PostgreSQL) or a cache (e.g., Redis).
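A minimal sketch of why in-memory sessions break behind multiple replicas. The `Replica` class is hypothetical, and a plain dict stands in for an external store such as Redis; a real deployment would talk to that store through a client library instead.

```python
from typing import Dict, Optional


class Replica:
    """A hypothetical app replica that keeps sessions in the dict it is given."""

    def __init__(self, name: str, sessions: Dict[str, str]) -> None:
        self.name = name
        self.sessions = sessions  # either private to this replica or shared by all

    def login(self, session_id: str, user: str) -> None:
        self.sessions[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self.sessions.get(session_id)


def demo(shared: bool) -> Optional[str]:
    store: Dict[str, str] = {}  # stands in for an external store such as Redis
    a = Replica("A", store if shared else {})
    b = Replica("B", store if shared else {})
    a.login("sess-123", "alice")  # the login request lands on replica A
    return b.whoami("sess-123")  # the next request lands on replica B


if __name__ == "__main__":
    print("in-memory sessions:", demo(shared=False))  # None: user looks logged out
    print("shared session store:", demo(shared=True))  # alice
```

Sticky sessions (always routing a user to the same replica) can hide the problem, but the session still disappears when that replica restarts or is scaled away.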


3. The Port Conflict Problem

If your docker-compose.yaml binds a specific host port:

services:
  api:
    ports:
      - "8080:80" # Host:Container

You cannot scale this service. Why? Only one container can bind host port 8080 at a time. If you try to launch a second replica, it fails to start with an error such as “Bind for 0.0.0.0:8080 failed: port is already allocated”.
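The underlying constraint is an operating-system rule, not something specific to Docker: a given host address and port can be bound by only one socket at a time (absent options such as SO_REUSEPORT). A minimal sketch with Python's standard socket module that reproduces the conflict on a free local port:

```python
import socket


def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Listen on a free port, then try to bind a second socket to that same port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind((host, 0))  # port 0: let the OS pick a free port
    first.listen()
    port = first.getsockname()[1]
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind((host, port))  # same host:port as the listening socket
        return False
    except OSError as exc:
        print(f"port {port}: {exc}")  # e.g. "[Errno 98] Address already in use" on Linux
        return True
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    print("second bind failed:", second_bind_fails())
```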

The Solution: Use a Load Balancer

  1. Remove the ports binding from the app service.
  2. Add a proxy service (Nginx/Traefik) that listens on Port 80.
  3. Configure the proxy to distribute traffic to the app containers.

Anatomy of a Load Balancer Request

Let’s break down how a request flows through a load balancer to a scaled service:

  1. Client initiates a TCP connection to the Load Balancer (Nginx) on port 80.
  2. Nginx terminates the connection and examines the HTTP request.
  3. Nginx consults its upstream pool (e.g., App 1, App 2, App 3) and applies a routing algorithm (like Round Robin).
  4. Nginx opens a TCP connection to the selected container over Docker’s internal network (or reuses an idle one if upstream keepalive is configured).
  5. App Container processes the request and sends the response back to Nginx, which forwards it to the Client.
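Step 3 is the only real decision the load balancer makes. A minimal sketch of plain round robin using `itertools.cycle`; the replica addresses are hypothetical, and Nginx's actual default is weighted round robin (which also accounts for weights and failed servers) that reduces to this rotation when all servers are equal.

```python
from itertools import cycle
from typing import Iterable, Iterator, List


def round_robin(upstreams: Iterable[str]) -> Iterator[str]:
    """Yield upstream servers in turn, wrapping around after the last one."""
    return cycle(list(upstreams))


def route(requests: int, upstreams: List[str]) -> List[str]:
    # Pick one upstream per incoming request.
    picker = round_robin(upstreams)
    return [next(picker) for _ in range(requests)]


if __name__ == "__main__":
    pool = ["app1:8080", "app2:8080", "app3:8080"]  # hypothetical replica addresses
    for i, target in enumerate(route(5, pool), start=1):
        print(f"request {i} -> {target}")
```

Requests 4 and 5 wrap back to `app1` and `app2`, so over time each replica receives roughly a third of the traffic.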

Load Balancer Simulator

See how traffic is distributed across replicas:

User → Nginx (port 80) → App 1 | App 2 | App 3

4. Code Example: Nginx Load Balancing

Here is how to build the setup above with real configuration files.

docker-compose.yaml:

services:
  # The Application (scaled)
  app:
    image: my-app
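    # No 'ports' section: reachable only from other containers on the Compose network.
    # Assumes my-app listens on port 8080 inside the container (see nginx.conf).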

  # The Load Balancer
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app

nginx.conf:

events {}
http {
    upstream myapp {
        # Docker's embedded DNS resolves 'app' to one IP per replica.
        # Nginx resolves the name when it loads this config and
        # round-robins across every address it got back.
        server app:8080;
    }

    server {
        listen 80;
        location / {
            # Forward the client's original Host header instead of "myapp".
            proxy_set_header Host $host;
            proxy_pass http://myapp;
        }
    }
}
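The upstream block relies on Docker's embedded DNS returning one address per replica. A minimal sketch for checking that yourself: run it inside a container on the same Compose network (this assumes that container has Python installed, which `nginx:alpine` does not; the service name `app` and port 8080 come from the config above).

```python
import socket
import sys
from typing import List


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct IPv4 address the resolver returns for host, sorted."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "app"
    print(resolve_all(host, 8080))
```

With `--scale app=3`, it should print three addresses. Outside Docker the name `app` will not resolve, so pass a name such as `localhost` to try it.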

Now run:

docker compose up --scale app=3 -d

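Nginx will now distribute traffic across all 3 containers. One caveat: Nginx resolves `app` only when it loads its configuration, so if you change the replica count later, reload it (for example, `docker compose exec nginx nginx -s reload`) so it picks up the new set of IPs.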
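To watch the distribution, the hypothetical `my-app` image could run a tiny server that reports which replica answered. A minimal sketch using only Python's standard library (an assumption: the original does not say what `my-app` contains), listening on port 8080 to match `nginx.conf`:

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


class WhoAmIHandler(BaseHTTPRequestHandler):
    """Reply with the container's hostname so each replica is identifiable."""

    def do_GET(self) -> None:
        body = f"served by {socket.gethostname()}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    # Bind on all interfaces so Nginx can reach it over the Compose network.
    return HTTPServer(("0.0.0.0", port), WhoAmIHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Built into the `my-app` image, repeated `curl http://localhost/` requests should show the hostname changing between replicas, since Docker sets each container's hostname to its short container ID by default.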