Module Review: Load Balancing
🧠 Flashcards
L4 Load Balancing
Transport Layer
Routes on IP address and port only. Fast but "dumb". Does not decrypt TLS (TCP passthrough). High-performance implementations can use eBPF/XDP for speed.
L7 Load Balancing
Application Layer
Routes on URL, headers, and cookies. Smart but CPU-heavy. Requires TLS termination (decryption) to read HTTPS traffic.
Maglev Hashing
Google's Consistent Hashing
Builds a fixed-size lookup table from per-backend permutations, giving O(1) lookups when distributing packets. Designed to spread load more evenly than ring hashing at scale.
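A minimal sketch of how such a lookup table could be populated, assuming a prime table size and SHA-256 as the hash. The function names (`build_maglev_table`, `maglev_lookup`) and the seed strings are illustrative choices, not taken from any production implementation.

```python
import hashlib


def _h(value: str, seed: str) -> int:
    """Seeded 64-bit hash (SHA-256 is an assumption made for this sketch)."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_maglev_table(backends, size=65537):
    """Fill a table of prime `size` by letting backends take turns claiming slots."""
    if not backends:
        raise ValueError("need at least one backend")
    # Each backend gets its own permutation of slots: offset + j * skip (mod size).
    offsets = [_h(b, "offset") % size for b in backends]
    skips = [_h(b, "skip") % (size - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)
    table = [None] * size
    filled = 0
    while True:
        for i, backend in enumerate(backends):
            # Advance along backend i's permutation until a free slot is found.
            slot = (offsets[i] + next_idx[i] * skips[i]) % size
            while table[slot] is not None:
                next_idx[i] += 1
                slot = (offsets[i] + next_idx[i] * skips[i]) % size
            table[slot] = backend
            next_idx[i] += 1
            filled += 1
            if filled == size:
                return table


def maglev_lookup(table, flow_key: str) -> str:
    """O(1) lookup: hash the flow key straight into the table."""
    return table[_h(flow_key, "flow") % len(table)]
```

Because backends claim slots in turns, their slot counts differ by at most one, which is where the even spread comes from.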
Active-Passive
High Availability
One LB handles all traffic while the other stands by. If the active node stops sending heartbeats, the passive node takes over the virtual IP (VRRP/Keepalived).
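The failover decision can be sketched as a heartbeat timeout on the passive node. This is a toy model, not VRRP itself: the class name `PassiveNode`, the 3-second timeout, and the explicit `now` timestamps are all assumptions, and preemption (handing the role back) is not modeled.

```python
class PassiveNode:
    """Standby LB that promotes itself when the active peer goes silent."""

    def __init__(self, timeout: float = 3.0):
        self.timeout = timeout      # seconds of silence tolerated
        self.last_seen = None       # time of the last heartbeat received
        self.role = "passive"

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat from the active peer."""
        self.last_seen = now

    def tick(self, now: float) -> str:
        """Periodic check: take over if the peer has been silent too long."""
        silent = self.last_seen is not None and now - self.last_seen > self.timeout
        if self.role == "passive" and silent:
            self.role = "active"    # a real setup would claim the virtual IP here
        return self.role
```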
Connection Pooling
Latency Optimization
The LB keeps connections to the backend open (Keep-Alive) to avoid paying the TCP 3-Way Handshake cost for every request.
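A rough sketch of the idea, assuming a caller-supplied `connect` factory that opens one backend connection. `ConnectionPool` and its `opened` counter are hypothetical names for illustration; a real pool would also validate connections before reuse.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keep up to `size` idle backend connections around for reuse."""

    def __init__(self, connect, size: int = 4):
        self._connect = connect                  # factory that opens one connection
        self._idle = queue.LifoQueue(maxsize=size)
        self.opened = 0                          # number of times the factory ran

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()       # reuse: no new handshake
        except queue.Empty:
            conn = self._connect()               # pool empty: pay the handshake
            self.opened += 1
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)      # hand it back for the next request
            except queue.Full:
                close = getattr(conn, "close", None)
                if close is not None:
                    close()                      # pool is full: drop the extra
```

With a real backend, the factory might be something like `lambda: socket.create_connection((host, port))`, where `host` and `port` are placeholders.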
Consistent Hashing
Scaling Strategy
Maps keys and servers onto a hash ring so that adding or removing a server remaps only about 1/N of the keys. Crucial for distributed caches.
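A small sketch of a hash ring with virtual nodes, assuming SHA-256 for placement and 100 virtual nodes per server. `HashRing` and its methods are illustrative names, not a specific library's API.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []    # sorted hash positions on the ring
        self._owner = {}     # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each server is placed at many positions to even out its share.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a server only reassigns the keys that now fall just before one of its positions; every other key keeps its owner.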
Sidecar Proxy
Service Mesh
A proxy deployed alongside every service instance (e.g., Envoy). Handles mTLS, retries, and observability.
Peak EWMA
Stability Metric
Exponentially Weighted Moving Average of latency that jumps to a new peak immediately and decays gradually. Used by Linkerd to steer traffic away from slow servers.
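One way to sketch the peak-sensitive average, assuming a decay time constant `tau` in seconds and caller-supplied timestamps. This is a simplification: the class name `PeakEwma` is illustrative, and real balancers typically also factor in the number of in-flight requests.

```python
import math


class PeakEwma:
    """Latency estimate that jumps to peaks and decays back over `tau` seconds."""

    def __init__(self, tau: float = 10.0):
        self.tau = tau
        self.cost = 0.0
        self._stamp = None   # time of the previous observation

    def observe(self, rtt: float, now: float) -> float:
        elapsed = 0.0 if self._stamp is None else max(now - self._stamp, 0.0)
        self._stamp = now
        if rtt > self.cost:
            # Peak sensitivity: a slow response raises the estimate at once.
            self.cost = rtt
        else:
            # Otherwise blend toward the sample; older history fades with time.
            weight = math.exp(-elapsed / self.tau)
            self.cost = self.cost * weight + rtt * (1.0 - weight)
        return self.cost
```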
QUIC (HTTP/3)
UDP Protocol
Modern transport running over UDP. Challenges L4 LBs because a connection is identified by its Connection ID (CID), which survives client IP or port changes, rather than by the IP/port tuple.
TLS Fingerprinting
Security (JA3)
Identifying clients (e.g., bots vs. browsers) by hashing the parameters of their TLS ClientHello: version, cipher suites, extensions, curves, and point formats.
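As a sketch, a JA3 hash can be computed from already-parsed ClientHello fields: the five fields are joined with commas, values within a field with dashes, and the string is MD5-hashed. Parsing the handshake bytes is out of scope here, and the function name and sample values are illustrative.

```python
import hashlib


def _is_grease(value: int) -> bool:
    """GREASE placeholders (0x0A0A, 0x1A1A, ... 0xFAFA) are ignored by JA3."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, md5_hex) for decimal ClientHello field values."""
    def join(values):
        return "-".join(str(v) for v in values if not _is_grease(v))

    ja3_string = ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode()).hexdigest()
```

Two clients that send the same User-Agent but offer different cipher suites or extensions produce different hashes, which is what makes the spoof detectable.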
SNI
Server Name Indication
A TLS extension that carries the requested hostname in the ClientHello, normally in cleartext, so an L4 load balancer can route on it without decrypting the session.
Thundering Herd
Concurrency Problem
When many processes wake up simultaneously to handle an event (or reconnect), overwhelming the system. Mitigated by adding jitter (randomized delays) to retries and timers.
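A minimal sketch of jitter applied to reconnects, using exponential backoff with a fully random delay. The `base` and `cap` defaults and the function name are assumptions chosen for illustration.

```python
import random


def backoff_with_jitter(attempt: int, base: float = 0.1, cap: float = 30.0,
                        rng=random) -> float:
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    # Randomizing the whole interval spreads clients out instead of
    # letting them all retry at the same instant.
    return rng.uniform(0.0, ceiling)
```

A client would sleep for `backoff_with_jitter(n)` seconds before its n-th retry, so clients that failed together do not reconnect together.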
GSLB
Global Server Load Balancing
Distributing traffic across data centers worldwide using DNS (GeoDNS) or Anycast (BGP) to reduce latency.
Bounded Load
Consistent Hashing Optimization
A technique to prevent hot shards by capping each node's load at a fixed factor above the average; requests that would push a node over the cap are passed to the next peer on the ring.
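The overflow rule might be sketched as follows, assuming a cap of `factor` times the average load (1.25 here) and SHA-256 ring placement. `BoundedLoadRing` and its fields are hypothetical names for illustration.

```python
import bisect
import hashlib
import math


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class BoundedLoadRing:
    def __init__(self, nodes, vnodes: int = 50, factor: float = 1.25):
        self.factor = factor                       # must be >= 1
        self.load = {n: 0 for n in nodes}          # in-flight requests per node
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def assign(self, key: str) -> str:
        # Cap = factor * average load, counting the request being placed.
        total = sum(self.load.values()) + 1
        cap = math.ceil(self.factor * total / len(self.load))
        start = bisect.bisect_right(self._points, _hash(key))
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if self.load[node] < cap:              # room left: accept here
                self.load[node] += 1
                return node
            # Otherwise overflow to the next position on the ring.
        raise RuntimeError("no node below capacity")

    def release(self, node: str) -> None:
        """Call when a request finishes so the node's load drops again."""
        self.load[node] -= 1
```

Even if every request carries the same hot key, no node ends up holding more than the cap; the surplus spills onto its ring neighbours.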
📝 Scenario Quiz
1. You are designing a video streaming service (Netflix). You need maximum throughput for video chunks. Which LB do you choose?
2. You have a Microservices architecture where `/api` goes to Service A and `/payment` goes to Service B. Which LB is required?
3. Your backend servers have varying hardware specs (some fast, some slow). Which algorithm is BEST?
4. You need to process 10M packets per second for a DDoS scrubber. The standard Linux Kernel is too slow. What technology do you use?
5. You want to detect if a client is a Bot or a real Chrome browser, even if they spoof the User-Agent. What technique helps?
📋 Cheat Sheet
L4 vs L7
| Feature | L4 Load Balancer | L7 Load Balancer |
|---|---|---|
| Layer | Transport (TCP/UDP) | Application (HTTP) |
| Visibility | IP & Port (Envelope) | URL, Headers, Body (Content) |
| Speed | Ultra High (eBPF) | Slower (CPU Intensive) |
| Decryption | No (Pass-through) | Yes (TLS Termination) |
| Caching | Impossible | Possible (Static Files) |
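To make the contrast concrete, here is a hedged sketch of the two routing decisions. The backend addresses, service names, and the `fallback` default are made up for illustration; the path prefixes mirror the `/api` and `/payment` example from the quiz.

```python
import hashlib

BACKENDS_L4 = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]           # hypothetical
ROUTES_L7 = {"/api": "service-a", "/payment": "service-b"}   # hypothetical


def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """L4: decide from the connection tuple alone; the payload is never read."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return BACKENDS_L4[int.from_bytes(digest[:8], "big") % len(BACKENDS_L4)]


def l7_pick(path: str, default: str = "fallback") -> str:
    """L7: decide from the decrypted request path (longest matching prefix)."""
    matches = [p for p in ROUTES_L7 if path == p or path.startswith(p + "/")]
    return ROUTES_L7[max(matches, key=len)] if matches else default
```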
Concepts
| Concept | Definition |
|---|---|
| SPOF | Single Point of Failure. If the LB dies, the site dies. |
| Sticky Session | Ensuring a user’s requests always go to the same server (via IP Hash or Cookie). |
| Maglev | Google’s Consistent Hashing algorithm for O(1) lookups. |
| Least Conn | Smart routing to the server with fewest active connections. |
| P2C | Power of Two Choices. Pick two servers at random and send the request to the less loaded one. O(1) per decision. |
| Peak EWMA | Peak Exponential Weighted Moving Average. Reacts quickly to latency spikes. |
| Active-Passive | High Availability setup where a backup LB takes over if the primary fails. |
| Sidecar Proxy | A helper proxy (Envoy) that runs alongside a service to handle network logic. |
| Connection Pooling | Reusing persistent TCP connections to avoid handshake overhead. |
| GSLB | Global Server Load Balancing. Using DNS or Anycast to route users to the closest datacenter. |
| Bounded Load | Consistent Hashing optimization to prevent hot shards. |
| JA3 | TLS Fingerprinting standard used to identify the client application (e.g., bot vs browser). |
| QUIC | New UDP-based protocol (HTTP/3) that improves performance but complicates L4 load balancing. |
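Power of Two Choices is short enough to sketch directly, here using active connection counts as the load signal. The `active` mapping and the function name are assumptions, and at least two servers are required.

```python
import random


def pick_p2c(active: dict, rng=random) -> str:
    """Sample two distinct servers; return the one with fewer active connections."""
    first, second = rng.sample(list(active), 2)
    return first if active[first] <= active[second] else second
```

The caller increments `active[server]` when it dispatches a request and decrements it on completion; the busiest server can never win a comparison, so it is never picked while it stays strictly busiest.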
Technology Choice
| Tool | Best For |
|---|---|
| Nginx | General purpose web server, Static files, Simple L7 LB. |
| HAProxy | High performance, pure LB. Best for massive scale TCP/HTTP. |
| Envoy | Service Mesh (Sidecar). Observability, Distributed Tracing. |
| Traefik | Kubernetes/Docker Ingress. Auto-discovery. |
| Katran | Facebook’s eBPF-based L4 Load Balancer. |
🏗️ Whiteboard Summary
1. The Problem
- Vertical Scaling fails (Kitchen Fire).
- DNS Round Robin fails (Caching).
- Need a Single VIP entry point.
2. Architecture
- L4: Fast, Encrypted, Dumb.
- L7: Smart, Decrypted, Slow.
- Active-Passive: For HA.
3. Algorithms
- Round Robin: Simple.
- Least Conn: Variable workloads.
- P2C: Hyperscale (O(1)).
- Maglev: Google Scale.
4. Optimization
- Health Checks: Deep vs Shallow.
- Conn Pooling: Reduce Handshakes.
- Draining: Zero Downtime Deploy.
- Security: WAF & JA3.