Load Balancers: The Traffic Controllers
The Problem: Success is Dangerous
Imagine you open a pizza shop. It becomes famous. Suddenly, 10,000 people want pizza at the same time. If you have only one chef (One Server), the kitchen catches fire. This is where Vertical Scaling hits its limit. You can only upgrade a single machine so much before it becomes prohibitively expensive or physically impossible.
Why not just use DNS?
You might think: “I’ll just add 10 servers and give their IPs to the DNS server. The DNS server will rotate them.” This is called DNS Round Robin, and it has a fatal flaw: DNS Caching.
- The browser (or ISP) caches the IP address of “pizzashop.com” for minutes or hours (TTL).
- If Server 1 crashes, the user’s browser keeps trying to talk to the dead IP until the cache expires.
- Result: 1/10th of your users see a “Connection Refused” error for an hour.
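The flaw is easy to see in a toy simulation. The sketch below (plain Python, hypothetical IPs) mimics an authoritative DNS server rotating through a pool while the client holds on to a cached answer:

```python
import itertools

# Hypothetical pool of server IPs behind "pizzashop.com".
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def dns_round_robin(pool):
    """Authoritative DNS handing out IPs in rotation (DNS Round Robin)."""
    rotation = itertools.cycle(pool)
    return lambda: next(rotation)

resolve = dns_round_robin(SERVERS)

# The client resolves once and caches the answer for the TTL.
cached_ip = resolve()          # "10.0.0.1"

# Server 10.0.0.1 now crashes...
dead = {"10.0.0.1"}

# ...but until the TTL expires, the client keeps using the cached IP.
print(cached_ip in dead)       # True -> "Connection Refused" for this user
```

The DNS server has no idea the cached copy is stale; only a component sitting in the live request path can react in real time.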
You need a smarter solution. You need a Manager at the counter who knows which chef is actually cooking.
[!TIP] In System Design, the Load Balancer (LB) is that Manager. It is the single entry point (VIP) that distributes incoming network traffic across multiple backend servers.
Why Do We Need Them?
It’s not just about splitting work. The Load Balancer provides three critical superpowers that transform a fragile application into a robust system:
1. Scalability (The “Elastic Waistband”)
You can add 100 more servers behind the scenes, and the user never knows. They just talk to the LB’s IP address. This decoupling allows you to scale up or down based on traffic demand without changing the client-side configuration. This directly increases your system’s total Throughput.
2. Availability (The “Pulse Check”)
If Server 3 crashes, the LB detects it via Health Checks (often called Heartbeats).
- Result: The user never sees a `500 Internal Server Error`. They are seamlessly routed to a healthy server.
3. Security (The “Bouncer”)
The LB can defend against DDoS attacks and hide the actual IP addresses of your backend servers. It can also handle TLS Termination, decrypting incoming requests so your web servers don’t have to spend CPU cycles on it.
Types of Load Balancers
1. DNS Load Balancing (GeoDNS)
While standard DNS Round Robin is flawed, sophisticated GeoDNS (like Amazon Route53) is powerful.
- Mechanism: It resolves the domain name to an IP address based on the user’s geographic location.
- Use Case: A user in London gets the IP of the UK Load Balancer; a user in Tokyo gets the IP of the Japan Load Balancer.
- Limit: It is still subject to caching issues, so it’s usually the first layer of defense, not the only one.
2. Hardware Load Balancers (F5 Big-IP, Citrix)
- Mechanism: Proprietary physical appliances in your data center.
- Pros: Extreme performance (ASICs), massive throughput.
- Cons: Expensive ($$$), hard to automate (API limitations), rigid capacity.
3. Software Load Balancers (Nginx, HAProxy, Envoy)
- Mechanism: Software running on commodity Linux servers.
- Pros: Cheap, programmable, cloud-native, easy to scale.
- Cons: Consumes CPU/Memory on the host.
[!NOTE] Modern cloud LBs (AWS ALB, Google Cloud LB) are essentially massive fleets of Software LBs managed for you.
Deep Dive: Health Checks
How does the LB actually know a server is dead?
1. Shallow Checks (L3/L4)
The LB pings the IP or tries to open a TCP socket.
- Check: “Is the machine ON?”
- Flaw: The machine might be ON, but the database connection is broken, so the app is returning 500 errors. The LB thinks it’s healthy.
2. Deep Checks (L7)
The LB sends an HTTP request to a specific endpoint: `GET /health`.
- Check: “Is the application functional?”
- Implementation: The `/health` endpoint checks DB connectivity, Cache status, and Disk space. If any fail, it returns `503 Service Unavailable`.
- Result: The LB stops sending traffic until the app recovers.
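A deep check is just an endpoint that aggregates its dependencies. Here is a minimal Python sketch, with `check_db`, `check_cache`, and `check_disk` as illustrative stand-ins for real connectivity probes:

```python
def check_db():    return True   # stand-in for a real DB ping
def check_cache(): return True   # stand-in for a real cache ping
def check_disk():  return False  # simulate a full disk

def health():
    """Deep (L7) health check: report 200 only if every dependency is up."""
    checks = {"db": check_db(), "cache": check_cache(), "disk": check_disk()}
    if all(checks.values()):
        return 200, {"status": "ok"}
    # Any failing dependency makes the whole instance unhealthy.
    failing = [name for name, ok in checks.items() if not ok]
    return 503, {"status": "unavailable", "failing": failing}

status, body = health()
print(status, body)   # 503 {'status': 'unavailable', 'failing': ['disk']}
```

The LB only looks at the status code; the body is for humans debugging the outage.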
[!WARNING] Flapping: A dangerous state where a server toggles rapidly between Healthy and Unhealthy. This usually happens when a server is overloaded; it passes a simple Health Check (idle) but fails actual traffic (load). Hysteresis (requiring X successes to be marked healthy again) solves this.
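Hysteresis can be as simple as a consecutive-result counter. A sketch (the thresholds are illustrative, not standard values):

```python
class HysteresisHealth:
    """Flip to Unhealthy after `fail_n` consecutive failures,
    back to Healthy only after `ok_n` consecutive successes."""
    def __init__(self, fail_n=3, ok_n=5):
        self.fail_n, self.ok_n = fail_n, ok_n
        self.healthy = True
        self.streak = 0  # consecutive results disagreeing with current state

    def record(self, check_passed):
        if check_passed == self.healthy:
            self.streak = 0            # result agrees with current state
            return self.healthy
        self.streak += 1
        threshold = self.ok_n if not self.healthy else self.fail_n
        if self.streak >= threshold:   # enough evidence: flip state
            self.healthy = check_passed
            self.streak = 0
        return self.healthy

h = HysteresisHealth(fail_n=2, ok_n=3)
h.record(False); h.record(False)   # two failures -> marked Unhealthy
print(h.healthy)                   # False
h.record(True); h.record(True)     # a brief flap is not enough...
print(h.healthy)                   # False (needs 3 consecutive successes)
h.record(True)
print(h.healthy)                   # True
```

The asymmetry (fewer failures to eject, more successes to readmit) is what stops an overloaded server from oscillating in and out of the pool.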
3. Connection Draining (Graceful Shutdown)
What if you want to take a server offline for maintenance? You can’t just kill it, or you’ll drop active users.
- Mechanism: You tell the LB to “Drain” Server A.
- Action: The LB stops sending new requests to Server A but allows existing connections to finish (until a timeout, e.g., 30s).
- Result: Zero downtime deployments.
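A sketch of the draining logic, assuming a simple Round Robin picker that skips draining backends (names and structure are illustrative):

```python
class Backend:
    def __init__(self, name):
        self.name, self.draining, self.active = name, False, 0

pool = [Backend("A"), Backend("B")]
_rr = 0  # Round Robin cursor

def route(pool):
    """Send NEW requests only to backends that are not draining."""
    global _rr
    eligible = [b for b in pool if not b.draining]
    backend = eligible[_rr % len(eligible)]
    _rr += 1
    backend.active += 1
    return backend.name

pool[0].active = 2        # A has two in-flight requests
pool[0].draining = True   # operator starts draining A for maintenance

print(route(pool), route(pool))   # B B  -- no new traffic hits A
print(pool[0].active)             # 2    -- A's existing connections finish
```

A real LB would also enforce the drain timeout: once it expires, the remaining connections on A are forcibly closed.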
System Walkthrough: A Request’s Journey
Let’s trace exactly what happens when you type www.example.com into your browser.
- DNS Resolution:
  - Browser asks DNS: “Where is example.com?”
  - DNS (GeoDNS) replies: “Go to IP `1.2.3.4` (the Load Balancer).”
- Connection Establishment:
  - Browser sends TCP SYN to `1.2.3.4`.
  - LB accepts the connection (completes the 3-way handshake).
- Load Balancing Decision:
  - LB looks at its pool of backend servers (`10.0.0.1`, `10.0.0.2`).
  - LB applies an Algorithm (e.g., Round Robin): “It’s `10.0.0.1`’s turn.”
- Forwarding:
  - LB opens a connection to `10.0.0.1` (or reuses a pooled connection).
  - LB forwards the HTTP request.
- Response:
  - Server `10.0.0.1` processes the request and sends HTML back to the LB.
  - LB forwards the HTML back to the Browser.
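The whole journey condenses into a few lines. The sketch below (hypothetical IPs, with TCP handling elided) mirrors the five steps:

```python
from itertools import cycle

LB_IP = "1.2.3.4"                          # the VIP clients actually see
BACKENDS = cycle(["10.0.0.1", "10.0.0.2"])

def resolve(domain):
    return LB_IP                           # step 1: DNS returns the LB's VIP

def handle(domain, path):
    vip = resolve(domain)                  # step 1: DNS resolution
    # step 2: browser opens a TCP connection to `vip` (elided here)
    backend = next(BACKENDS)               # step 3: Round Robin decision
    response = f"HTML from {backend}"      # steps 4-5: forward, then relay back
    return vip, backend, response

print(handle("www.example.com", "/"))
# ('1.2.3.4', '10.0.0.1', 'HTML from 10.0.0.1')
```

Note that the browser only ever learns `1.2.3.4`; the backend IPs never leave the LB.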
High Availability of the LB itself
“But wait… if the LB is the Manager, what if the Manager has a heart attack?” The Load Balancer itself is a SPOF.
To solve this, we use Redundancy:
Active-Passive (High Availability)
- Setup: Two LBs. One is Active, the other is Passive (standby).
- Mechanism: They talk using VRRP (Virtual Router Redundancy Protocol) and share a Floating IP (VIP).
- Failover: If Active stops sending heartbeats, Passive takes over the VIP.
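The failover decision itself is tiny. A sketch of the VRRP-style rule, with an illustrative 3-second timeout:

```python
FAILOVER_AFTER = 3.0  # seconds of heartbeat silence before takeover (illustrative)

def who_owns_vip(seconds_since_last_heartbeat):
    """VRRP-style decision: the Passive node claims the floating IP
    once the Active node has been silent for too long."""
    if seconds_since_last_heartbeat < FAILOVER_AFTER:
        return "active"
    return "passive"

print(who_owns_vip(1.0))   # active  -- heartbeats flowing, nothing to do
print(who_owns_vip(5.0))   # passive -- VIP moves; clients keep the same IP
```

Because clients only know the floating IP, the takeover is invisible to them apart from a brief blip during the switch.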
Deep Dive: Global Server Load Balancing (GSLB)
What if your users are in Tokyo, London, and New York? A single LB in Virginia is not enough. You need GSLB.
1. GeoDNS (The Phonebook Strategy)
- Mechanism: The DNS server looks at the User’s IP.
- Logic: “User is from Japan IP range? Return the IP of the Tokyo LB.”
- Pros: Simple, supported by Route53, Cloudflare.
- Cons: DNS Caching. If Tokyo goes down, users in Japan might still try to connect to the dead IP for 5 minutes (TTL).
2. Anycast (The Magic IP)
- Mechanism: BGP (Border Gateway Protocol) routing.
- Concept: You announce the SAME IP address (e.g., `1.1.1.1`) from multiple physical locations (Tokyo, London, NY).
- Routing: The internet’s routers naturally send the user’s packet to the physically closest data center (fewest hops).
- Pros: Instant Failover (BGP updates faster than DNS), no caching issues.
- Cons: Complex to set up (requires owning an ASN or using a provider like Cloudflare).
Observability: The RED Method
How do you know if your Load Balancer is healthy? We use the RED Method for monitoring microservices and LBs.
- R - Rate: The number of requests per second (RPS).
  - Metric: `http_requests_total`
  - Alert: Sudden drop (outage) or spike (DDoS).
- E - Errors: The number of requests failing.
  - Metric: `http_requests_5xx`
  - Alert: Error rate > 1%.
- D - Duration: How long requests take.
  - Metric: `http_request_duration_seconds`
  - Alert: P99 latency > 500ms.
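Computed from a raw request log, the three signals look like this (the sample data is invented; a real system would use Prometheus-style counters and histograms instead of an in-memory list):

```python
# Each tuple: (status_code, duration_seconds), collected over a 10-second window.
WINDOW_SECONDS = 10
requests = [(200, 0.05)] * 95 + [(500, 0.02)] * 3 + [(200, 0.80)] * 2

rate = len(requests) / WINDOW_SECONDS                    # R: requests/second
errors = sum(1 for code, _ in requests if code >= 500)   # E: failing requests
error_rate = errors / len(requests)
durations = sorted(d for _, d in requests)
p99 = durations[int(0.99 * len(durations)) - 1]          # D: naive P99 latency

print(rate, error_rate, p99)                  # 10.0 0.03 0.8
print(error_rate > 0.01, p99 > 0.5)           # True True -- both alerts fire
```

With 3% of requests failing and a P99 of 800ms, both the Errors and Duration alerts from the list above would trigger.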
Interactive Demo: The Traffic Controller
Test the resilience of a Load Balanced system.
- Start Traffic: Watch the LB distribute requests (Round Robin).
- Kill a Server: Click the “Power” button on a server to crash it.
- Burst Mode: Simulate a sudden traffic spike to see how the system behaves.
- Drain: Gracefully remove a server.
Summary
- Horizontal Scaling > Vertical Scaling.
- Deep Health Checks ensure the application is logically working, not just “on”.
- Connection Draining ensures smooth deployments.
- GSLB uses Anycast and GeoDNS to balance traffic globally.
Next, let’s look at the “Brains” of the operation: L4 vs L7 Load Balancing.