Scaling & Production
[!NOTE] This module explores the core principles of Scaling & Production, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Real-World Hook: The Black Friday Crash
Imagine your e-commerce application is running on a single server. It’s Black Friday, and traffic suddenly spikes by 10,000%. What happens?
- CPU Exhaustion: The server CPU hits 100% trying to compute checkouts.
- RAM Saturation: Memory is filled with concurrent active user sessions.
- Connection Exhaustion: The operating system runs out of file descriptors or its listen backlog fills up, so new TCP connections are refused or dropped.
Eventually, the server crashes. To handle this, we need Horizontal Scaling—adding more machines (or containers) rather than upgrading a single machine (Vertical Scaling).
2. Horizontal Scaling in Docker
One of Docker’s superpowers is the ability to spin up multiple copies (replicas) of a service instantly.
# Start 3 copies of the 'worker' service
docker compose up --scale worker=3 -d
This is crucial for testing:
- Race conditions in your database when multiple instances write simultaneously.
- Session stickiness issues (if a user logs into instance A, what happens when their next request hits instance B?).
- Queue consumer throughput when draining messages from a broker like RabbitMQ.
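The first bullet can be made concrete with a deterministic sketch of a lost update. The `FakeDatabase` class and both purchase functions are hypothetical stand-ins, not part of any real database driver. The interleaving is written out by hand so the outcome does not depend on thread timing.

```python
class FakeDatabase:
    """Hypothetical stand-in for one row in a shared database."""

    def __init__(self, stock):
        self.stock = stock


def racy_purchases(db):
    """Two replicas each sell one item using read-modify-write."""
    seen_by_a = db.stock      # replica A reads the stock
    seen_by_b = db.stock      # replica B reads before A has written
    db.stock = seen_by_a - 1  # A writes its result
    db.stock = seen_by_b - 1  # B overwrites A's write with a stale value
    return db.stock


def atomic_purchases(db):
    """Same two sales, but each decrement happens as one step,
    similar in spirit to UPDATE ... SET stock = stock - 1."""
    db.stock -= 1  # replica A
    db.stock -= 1  # replica B
    return db.stock


racy_result = racy_purchases(FakeDatabase(stock=10))
atomic_result = atomic_purchases(FakeDatabase(stock=10))
print(racy_result, atomic_result)
```

Two items are sold in both runs, but the racy version records only one of them. Running several replicas locally is a cheap way to surface this class of bug before production.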
Hardware Reality: True horizontal scaling requires your application to be Stateless. If your application stores files on local disk or keeps session data in memory, a later request routed to a different replica will not find that state, and may fail or behave as if the user were logged out. State must be externalized to a database (e.g., PostgreSQL) or cache (e.g., Redis).
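As a rough illustration of why in-memory state breaks under scaling, the sketch below models two replicas as plain Python objects. The `Replica` class and the shared dictionary are assumptions made for this example; the dictionary only stands in for an external store such as Redis and is not a real client.

```python
class Replica:
    """Hypothetical app instance that tracks logged-in users by session ID."""

    def __init__(self, name, session_store=None):
        self.name = name
        # Without a shared store, each replica keeps sessions in its own memory.
        self.sessions = session_store if session_store is not None else {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        # Returns None when this replica has never seen the session.
        return self.sessions.get(session_id)


# Stateful: each replica has a private session dict.
a, b = Replica("A"), Replica("B")
a.login("s1", "alice")
stateful_lookup = b.whoami("s1")

# Stateless: both replicas delegate to one external store
# (a shared dict here, standing in for Redis or a database).
shared_store = {}
a2, b2 = Replica("A", shared_store), Replica("B", shared_store)
a2.login("s1", "alice")
externalized_lookup = b2.whoami("s1")

print(stateful_lookup, externalized_lookup)
```

In the first case replica B has no record of the session created on replica A. In the second, any replica can serve any request because none of them owns the state.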
3. The Port Conflict Problem
If your docker-compose.yaml binds a specific host port:
services:
  api:
    ports:
      - "8080:80" # Host:Container
You cannot scale this service. Why? Because only one process can listen on host port 8080 at a time. If you launch a second replica, it fails to start with an error such as “port is already allocated”.
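The same operating-system rule can be reproduced without Docker. This minimal sketch uses only Python's standard `socket` module. The function name `second_bind_fails` is made up for the example, and the port is chosen by the OS rather than hard-coded to 8080, so it does not clash with anything already running.

```python
import socket


def second_bind_fails():
    """Bind two TCP sockets to the same address and port.

    Returns True if the second bind is rejected by the OS.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # same port as the first socket
        except OSError:
            return True  # typically "Address already in use"
        return False
    finally:
        first.close()
        second.close()


print(second_bind_fails())
```

Docker hits the same limit when a second replica tries to publish a host port that the first replica already holds.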
The Solution: Use a Load Balancer
- Remove the ports binding from the app service.
- Add a proxy service (Nginx/Traefik) that listens on Port 80.
- Configure the proxy to distribute traffic to the app containers.
Anatomy of a Load Balancer Request
Let’s break down how a request flows through a load balancer to a scaled service:
- Client initiates a TCP connection to the Load Balancer (Nginx) on port 80.
- Nginx terminates the connection and examines the HTTP request.
- Nginx consults its upstream pool (e.g., App 1, App 2, App 3) and applies a routing algorithm (like Round Robin).
- Nginx opens a new TCP connection to the selected container over Docker’s internal network.
- App Container processes the request and sends the response back to Nginx, which forwards it to the Client.
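The routing step in the flow above can be sketched in a few lines. This is a simplified model of Round Robin selection, not Nginx's actual implementation; the backend names and the `make_round_robin` helper are hypothetical.

```python
from collections import Counter
from itertools import cycle


def make_round_robin(backends):
    """Return a function that picks backends in a repeating, fixed order."""
    pool = cycle(backends)

    def pick():
        return next(pool)

    return pick


# Hypothetical upstream pool of three app containers.
pick = make_round_robin(["app-1:8080", "app-2:8080", "app-3:8080"])

# Route six incoming requests and count how many each backend received.
assignments = [pick() for _ in range(6)]
load = Counter(assignments)
print(assignments)
print(load)
```

Each backend receives every third request, so the load is even as long as requests cost roughly the same. Real load balancers add weights, health checks, and alternative algorithms on top of this basic idea.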
4. Code Example: Nginx Load Balancing
The two files below implement the request flow described above.
docker-compose.yaml:
services:
  # The Application (scaled)
  app:
    image: my-app
    # NO PORTS section! Internal only.

  # The Load Balancer
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app
nginx.conf:
events {}

http {
    upstream myapp {
        # Docker's embedded DNS returns one IP per running 'app' replica.
        # Nginx resolves the name when it loads this config and
        # round-robins across the resulting IPs.
        server app:8080;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp;
        }
    }
}
Now run:
docker compose up --scale app=3 -d
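Nginx will distribute traffic across all 3 containers. Note that open-source Nginx resolves the app hostname only when it loads its configuration, so if you change the replica count later, reload it (for example, docker compose exec nginx nginx -s reload) to pick up the new container IPs.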