Caching and Load Balancing

Welcome to the Caching and Load Balancing chapter. In distributed systems, performance and scale are intimately tied to how efficiently we can distribute traffic and retrieve data. Here we will dive deep into consistent hashing, bloom filters, and the architectural decisions required for planet-scale systems.

1. The Scaling Problem: Why Simple Load Balancing Fails

The Hook: Imagine you’re running a massive global e-commerce platform on Black Friday. Your database is under heavy read load, so you deploy a fleet of cache servers. How do you decide which cache server stores which user’s shopping cart?

The Setup: You have N cache servers. The Naive Approach: Simple load balancing maps a request to a server using the modulo operator: server_index = hash(key) % N.

The Disaster Scenario (Cache Stampede): If 1 server crashes (say, N drops from 5 to 4), the modulo base changes. hash(key) % 4 will yield a different result than hash(key) % 5 for almost every single key. Suddenly, 80% to 90% of your cached data is mapped to the wrong server. The cache servers report “cache miss,” and a massive flood of queries hits your database simultaneously. This is called a Cache Stampede, and it will melt your database.

2. Consistent Hashing (The Ring)

Solution: The Ring. Consistent hashing solves the rehashing problem by keeping the mapping space constant.

The Mechanism:

The Hash Space: Imagine a hash space from 0 to 2^32 - 1 arranged in a circle (a ring).
Server Mapping: Hash the IP or name of your cache servers and place them on this ring.
Key Mapping: Hash the incoming key (e.g., user ID) and place it on the same ring.
Assignment: To find the server for a key, move clockwise along the ring from the key’s position until you find the first server.

The Result: If server X crashes, only the keys that were mapped to server X will move to the next server clockwise. The rest of the keys stay exactly where they are.

Virtual Nodes (Solving Uneven Distribution)

A common issue with basic consistent hashing is uneven distribution (servers might bunch up on the ring, causing one server to take most of the load). The solution is Virtual Nodes. Instead of mapping each server to a single point, we hash it multiple times (e.g., server1_1, server1_2, etc.), creating many virtual nodes for each physical server. This evens out the distribution across the ring.

Interactive Visualization: Consistent Hashing Ring

3. Bloom Filters (The “Maybe” Set)

The Problem: Cache Penetration What if an attacker continuously requests data that does not exist (e.g., user_id = -1)? The cache won’t have it (cache miss), so the request goes to the database. The database also doesn’t have it, so nothing is cached. The next request for -1 again goes straight to the database. This bypasses the cache entirely and can take down your database.

Solution: Bloom Filter. A probabilistic data structure placed before the cache.

Query: “Is ‘user_id’ in the database?”
Answer:
1. No: Definitely not. (100% confidence - we block the request).
2. Yes: Maybe. (Small probability of False Positive - we allow the request to hit the cache/DB).

The Anatomy of a Bloom Filter:

Bit Array: An array of M bits, initially all set to 0.
Hash Functions: K independent hash functions.
To Add an Item: Hash the item K times. Each hash yields an index in the bit array. Set those K bits to 1.
To Check an Item: Hash the item K times. Check the K indices in the bit array.
- If any bit is 0, the item was never added.
- If all bits are 1, the item might have been added (because other items could have set those bits to 1 by coincidence).

Interactive Visualization: Bloom Filter

4. Mathematical Anchor: False Positive Rate

The probability of a false positive in a Bloom Filter is approximately: $P \approx (1 - e^{-kn/m})^k$

m: Bit array size (number of bits).
n: Number of items inserted.
k: Number of hash functions.

The Tradeoffs:

More memory (larger m) = Fewer false positives.
More hash functions (larger k) = Bits fill up faster, which can increase false positives if m isn’t large enough.
Optimal k: $k = (m/n) \times \ln(2)$.

5. Deep Dive Strategy Lab: Caching Architectures (PEDALS)

Intuition Through Analogy

Think of this chapter like running a high-traffic consumer app. The goal is not to memorize a fixed trick, but to repeatedly answer:

What is the bottleneck?
Which constraint dominates (time, memory, latency, correctness)?
Which representation makes the bottleneck easier to eliminate?

Common Caching Strategies (The “Where” and “When”)

When designing your system, you must choose how your application, cache, and database interact.

Cache-Aside (Lazy Loading):
- Flow: App asks Cache. If miss, App asks DB. App writes to Cache.
- Pros: Cache only contains requested data. Cache failure doesn’t break the app (just slows it down).
- Cons: 3 network hops on a cache miss. Data can become stale if updated in DB but not in Cache.
Read-Through:
- Flow: App asks Cache. If miss, Cache asks DB, stores it, and returns to App.
- Pros: Transparent to the App. Good for read-heavy workloads.
- Cons: The first request is always slow.
Write-Through:
- Flow: App writes to Cache. Cache synchronously writes to DB. Return success to App.
- Pros: Data is always consistent. No stale data.
- Cons: High write latency (requires two sequential writes).
Write-Back (Write-Behind):
- Flow: App writes to Cache. Cache returns success. Cache asynchronously flushes to DB later.
- Pros: Lightning-fast writes. Great for write-heavy workloads (like view counters).
- Cons: If the Cache crashes before flushing, data is permanently lost.

Caching LoadBalancing