Load Balancing Algorithms: The Selection Logic

The Traffic Cop of the Internet

Imagine a busy intersection with 5 lanes (Servers). If the Traffic Cop (Load Balancer) blindly waves every car into Lane 1, a traffic jam forms instantly, even if Lanes 2-5 are empty.

The Algorithm is the decision logic the Traffic Cop uses. A smart algorithm ensures all lanes move smoothly, handling slow trucks (heavy requests) and Ferraris (light requests) efficiently.

[!TIP] Interview Tip: Don’t just list algorithms. Explain why you’d choose one. “I’d use Least Connections because our request times vary significantly” is a Senior Engineer answer.


1. Static Algorithms (The “Blind” Rotation)

These algorithms follow a fixed pattern and do not check the current health or load of the servers. They are fast but “dumb”.

A. Round Robin (RR)

The simplest method. Requests are distributed sequentially: S1 -> S2 -> S3 -> S1...

  • Pros: Extremely fast, no state required.
  • Cons: Assumes all servers are equal and all requests are equal.
  • Failure Mode: The “Slow Server Problem”. If Server 2 gets stuck processing a heavy video upload, RR keeps sending it new requests, causing a pile-up (Head-of-Line Blocking).

B. Weighted Round Robin

Assigns a “Weight” (Capacity) to each server.

  • Scenario: You have 1 powerful server (Weight 3) and 2 older servers (Weight 1).
  • Logic: The powerful server gets 3 requests for every 1 request the others get.
  • Use Case: Canary Deployments (send 10% traffic to V2) or mixed hardware clusters.
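One simple way to implement the weighting, sketched below, is to repeat each server in the rotation according to its weight (production balancers such as nginx use a smoother interleaved variant of the same idea):

```python
import itertools

def weighted_round_robin(weighted_servers):
    """Naive weighting: a server with weight w appears w times per cycle."""
    expanded = [server for server, weight in weighted_servers for _ in range(weight)]
    return itertools.cycle(expanded)

# 1 powerful server (weight 3) and 2 older servers (weight 1 each)
lb = weighted_round_robin([("big", 3), ("old1", 1), ("old2", 1)])
print([next(lb) for _ in range(10)])   # 'big' appears 3x as often as each old server
```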

C. IP Hash

Uses the client’s IP address to determine the server: Hash(Client_IP) % N.

  • Pros: Sticky Sessions. User A always goes to Server 1. Crucial for local caching.
  • Cons: Mass remapping. If a server is added or removed, Hash % N changes for nearly every client, invalidating huge chunks of cache at once and stampeding the backends (a “Thundering Herd”). (Solved by Consistent Hashing in Module 07).
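A sketch of the mapping, which also shows why resizing the pool reshuffles clients (the hash function and server names are arbitrary):

```python
import hashlib

def ip_hash_pick(client_ip: str, servers: list[str]) -> str:
    """Hash(Client_IP) % N: the same client always lands on the same server."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["s1", "s2", "s3"]
print(ip_hash_pick("192.168.1.50", servers))            # deterministic for this IP...
print(ip_hash_pick("192.168.1.50", servers + ["s4"]))   # ...but adding a server can remap it
```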

2. Dynamic Algorithms (The “Smart” Choice)

These algorithms inspect the current state (active connections, CPU, latency) before making a decision.

A. Least Connections

Sends the request to the server with the fewest active connections.

  • Logic: If S1 has 100 connections and S2 has 5, send to S2.
  • Pros: Handles requests of varying duration well. If S1 is stuck on a heavy job, its connection count stays high, so the LB naturally diverts traffic to S2.
  • Use Case: Long-lived connections (WebSockets, Database queries).
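A sketch of the selection step; the connection counts are state the LB must keep up to date as connections open and close:

```python
def least_connections(active_conns: dict[str, int]) -> str:
    """Pick the server with the fewest in-flight connections."""
    return min(active_conns, key=active_conns.get)

conns = {"s1": 100, "s2": 5, "s3": 42}
target = least_connections(conns)    # 's2'
conns[target] += 1                   # the LB must update this state on every open/close
```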

B. Least Response Time (TTFB)

Sends traffic to the server that is responding the fastest (lowest Time To First Byte, TTFB).

  • Logic: The LB tracks the response time of every request and maintains a rolling average.
  • Pros: Good for handling “Noisy Neighbors” (servers that are slow due to external factors).
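A sketch using a fixed window of recent latency samples per server (window size and names are arbitrary; the Peak EWMA variant below refines this idea):

```python
from collections import deque

class LeastResponseTime:
    """Route to the server with the lowest rolling-average latency."""
    def __init__(self, servers, window=20):
        self.samples = {s: deque(maxlen=window) for s in servers}

    def observe(self, server, rtt_ms):
        self.samples[server].append(rtt_ms)

    def pick(self):
        def avg(server):
            s = self.samples[server]
            return sum(s) / len(s) if s else 0.0
        return min(self.samples, key=avg)

lb = LeastResponseTime(["s1", "s2"])
lb.observe("s1", 120); lb.observe("s2", 35)
print(lb.pick())   # 's2'
```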

3. Advanced Strategies (Hyperscale)

At massive scale (e.g., Netflix, Google), simple counting isn’t enough.

A. Power of Two Choices (P2C)

Querying all servers to find the absolute “Least Connections” is expensive (O(N) per decision) when you have 10,000 servers, and constantly polling the fleet hammers the LB’s control plane. Worse, when several load balancers act on the same slightly stale counts, they all pick the same “least loaded” server and stampede it.

  • Strategy:
    1. Pick 2 servers at random.
    2. Check their load (connections).
    3. Send to the better one.
  • Why: Each decision is O(1), yet the resulting load distribution is nearly as good as a full O(N) scan. Comparing just two random candidates drastically reduces the worst-case imbalance versus picking one at random, without ever checking everyone.
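A sketch of the two-choice decision (the server pool and counts are synthetic):

```python
import random

def power_of_two_choices(active_conns: dict[str, int]) -> str:
    """Sample two servers at random and forward to the less loaded one."""
    a, b = random.sample(list(active_conns), 2)
    return a if active_conns[a] <= active_conns[b] else b

conns = {f"s{i}": random.randint(0, 50) for i in range(100)}
print(power_of_two_choices(conns))   # O(1) per decision, no scan of all 100 servers
```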

B. Peak EWMA (Exponential Weighted Moving Average)

Used by Linkerd and Envoy.

  • The Problem: “Least Response Time” is jittery. A single slow request might make a server look bad for too long. Conversely, a crashing server returning fast 500 errors looks “fast”!
  • The Solution: Use a decay formula.
    • New_Avg = (Weight * Current_RTT) + ((1 - Weight) * Old_Avg)
    • Peak: If the current RTT is higher than the average, update instantly (be paranoid). If it’s lower, decay slowly (be skeptical).
  • Result: The LB reacts instantly to lag spikes but is slow to trust a server again.
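A toy version of the peak-sensitive update described above; real balancers (e.g., Linkerd’s peak-EWMA) also decay the average over wall-clock time and combine it with in-flight request counts:

```python
def peak_ewma_update(old_avg: float, rtt_ms: float, weight: float = 0.1) -> float:
    """Jump immediately on a latency spike, decay slowly when latency improves."""
    if rtt_ms > old_avg:
        return rtt_ms                                  # be paranoid: take the spike at face value
    return weight * rtt_ms + (1 - weight) * old_avg    # be skeptical: earn trust back slowly

avg = 20.0
for sample in (22, 250, 25, 25, 25):
    avg = peak_ewma_update(avg, sample)
print(round(avg, 1))   # ~189.0: still far above 25, the spike is remembered for a while
```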

C. Maglev Hashing (Google)

Consistent Hashing uses a ring (O(log N) lookup). Google needed something faster (O(1)).

  • Maglev: Uses a massive static lookup table (e.g., 65537 slots).
  • Populating: Each backend server generates its own permutation of preferences (e.g., “I want slot 1, then 5, then 99…”). The table is filled by iterating through servers and letting them pick the next available slot in their preference list.
  • Lookup: Table[Hash(Packet) % M]. It’s a direct array access.
  • Result: Perfect consistency, minimal disruption, and constant time lookups.
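A simplified sketch of the population loop; the real Maglev derives each backend’s offset and skip from two independent hash functions and uses a prime table size, which the placeholder hashing below only approximates:

```python
import hashlib

def _h(value: str, seed: str) -> int:
    return int(hashlib.md5(f"{seed}:{value}".encode()).hexdigest(), 16)

def maglev_table(backends: list[str], m: int = 65537) -> list[str]:
    """Each backend walks its own preference permutation (offset + j*skip mod m);
    backends take turns claiming the next free slot until the table is full."""
    offset = {b: _h(b, "offset") % m for b in backends}
    skip = {b: _h(b, "skip") % (m - 1) + 1 for b in backends}   # m is prime, so any skip works
    next_j = {b: 0 for b in backends}

    table = [None] * m
    filled = 0
    while filled < m:
        for b in backends:
            # advance b's preference list until it hits an unclaimed slot
            while True:
                slot = (offset[b] + next_j[b] * skip[b]) % m
                next_j[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == m:
                break
    return table

def maglev_pick(table: list[str], packet_key: str) -> str:
    """Lookup is a single array access: Table[Hash(packet) % M]."""
    return table[_h(packet_key, "lookup") % len(table)]

table = maglev_table(["s1", "s2", "s3"])
print(maglev_pick(table, "192.168.1.50:52431->10.0.0.1:443"))
```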

D. Bounded Load Consistent Hashing (Google)

Consistent Hashing is great, but it has a flaw: Hot Shards. If one node gets popular keys, it melts.

  • The Idea: Combine Consistent Hashing with Least Connections.
  • Mechanism:
    1. Hash the key to find the “Home” server (Consistent Hashing).
    2. Check the load of that server.
    3. If Load > Average_Load * 1.25, reject it and try the next server on the ring.
  • Result: This simple check (used at Vimeo and Google) prevents any single node from taking more than ~25% above the average load, solving the hot shard problem while keeping good cache locality.
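A sketch of the ring walk with the 1.25 load cap (the hashing and ring construction are simplified, with one point per server rather than many virtual nodes):

```python
import bisect
import hashlib
import math

def _h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class BoundedLoadRing:
    """Consistent hashing with a load cap: walk the ring past the 'home' server
    until one under Average_Load * c is found."""
    def __init__(self, servers, c=1.25):
        self.c = c
        self.load = {s: 0 for s in servers}
        self.ring = sorted((_h(s), s) for s in servers)   # one point per server (no virtual nodes)
        self.points = [point for point, _ in self.ring]

    def pick(self, key: str) -> str:
        total = sum(self.load.values()) + 1               # count the incoming request too
        cap = math.ceil(self.c * total / len(self.load))  # ~25% above the average load
        i = bisect.bisect(self.points, _h(key)) % len(self.ring)
        for step in range(len(self.ring)):                # home server first, then its neighbours
            server = self.ring[(i + step) % len(self.ring)][1]
            if self.load[server] + 1 <= cap:
                self.load[server] += 1
                return server
        raise RuntimeError("every server is at capacity")

ring = BoundedLoadRing(["s1", "s2", "s3"])
print([ring.pick(f"user:{i}") for i in range(6)])   # hot keys spill over to the next server on the ring
```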

System Walkthrough: Deciding Where to Send a Packet

Let’s visualize the decision logic for the P2C algorithm (Power of Two Choices).

  1. Request Arrives: GET /api/user/123 from 192.168.1.50.
  2. Random Selection:
    • Total Servers: 100.
    • LB picks 2 random indices: Server #42 and Server #88.
  3. State Inspection:
    • Server #42: 15 active connections.
    • Server #88: 2 active connections.
  4. Decision:
    • 2 < 15, so choose Server #88.
  5. Forwarding:
    • LB forwards packet to Server #88.
  6. Outcome:
    • Server #88 handles it quickly.
    • If we had used Round Robin, we might have hit Server #42, which was busy.

🕹️ Interactive: The Algorithm Arena

Simulate a high-traffic scenario.

  • Scenario: 3 Servers.
    • S1 (Fast): Powerhouse (Processes 3 req/tick).
    • S2 (Variable): Jittery (Processes 0.5 - 2.5 req/tick).
    • S3 (Slow): Legacy (Processes ~0.5 req/tick).
  • Goal: Watch how Round Robin fails to account for S3’s slowness, building a huge queue. Switch to Least Connections or P2C to see the load balance out.
[Simulator panel: S1 FAST (Wt 3), S2 JITTER (Wt 1), S3 SLOW (Wt 1); all queues start empty and idle.]

Summary

| Algorithm | Best For | Pros | Cons |
| --- | --- | --- | --- |
| Round Robin | Simple, homogeneous clusters | Fast, stateless | Fails on slow servers (HOL Blocking) |
| Weighted RR | Mixed hardware | Uses capacity | Static, doesn’t react to load |
| Least Conn | Long-lived connections (DB, Chat) | Adapts to slow servers | Requires state (connection counts) |
| IP Hash | Caching, Shopping Carts | Session stickiness | Thundering Herd on scaling |
| P2C | Hyperscale (10k+ servers) | O(1) efficiency, avoids worst-case | Slightly more complex than RR |

[!WARNING] Don’t Over-Engineer: Start with Round Robin (or Weighted RR). Only move to Least Connections if you have data showing significant variance in request processing times. Complexity is the enemy of reliability.