Caching Strategies
In 2013, Justin Bieber posted a tweet and celebrity.twitter.com returned 503 errors for 20 minutes. The root cause: a single Redis key for his follower count expired. All 1.2 million followers simultaneously fetched the page, all got a cache miss, and all queried the database simultaneously — a Cache Stampede. Twitter engineers called this the “Celebrity Problem.” The fix required two lines of code: IF TTL < 5s: async_refresh(). Caching seems trivial until it fails at scale. Then it becomes the highest-leverage fix in your entire system.
[!IMPORTANT] In this lesson, you will master:
- The Memory Hierarchy: Why L1 is 200x faster than RAM and 100,000x faster than SSD.
- Temporal & Spatial Locality: The physical reason why “for r in rows: for c in cols” is faster than the reverse.
- The 3 Demons: Defeating Cache Penetration, Avalanche, and Stampede with Mutexes and Jitter.
1. The “Open Book Exam” Analogy
Imagine you are taking a difficult Open Book Exam.
- Scenario A (No Cache): For every question, you open the textbook, look up the index, find the page, and read the paragraph. This takes 5 minutes per question.
- Scenario B (With Cache): After finding an answer, you write it on a “Cheat Sheet” next to you. If the same question appears again, you glance at the cheat sheet. This takes 5 seconds.
Caching is your Cheat Sheet. It stores the result of an expensive operation (Textbook Search / DB Query) in a temporary, fast storage layer (Cheat Sheet / RAM) so you don’t have to repeat the work.
2. Why Caching Works: Latency Numbers
To understand why we cache, we must look at the “Latency Ladder”. The difference between Memory (RAM) and Disk/Network is astronomical.
[!TIP] Try it yourself: Click “Switch to Human Scale” to see how vast the difference really is.
The Latency Ladder
Takeaway: Reading from RAM (Cache) is orders of magnitude faster than reading from Disk (Database) or Network. A cache hit is the difference between a website feeling “snappy” and “sluggish”.
Effective Latency Calculation
Even a small improvement in Hit Ratio can massively drop effective latency.
[!TIP] Try it yourself: Adjust the sliders to see how Cache Hit Ratio impacts the total latency.
Effective Latency Calculator
Effective Latency = (HitRate × CacheLatency) + (MissRate × DBLatency)
Worked example: with a 90% hit ratio, a 1 ms cache, and a 100 ms database, effective latency is 0.9 × 1 ms + 0.1 × 100 ms ≈ 10.9 ms. Raise the hit ratio to 99% and it drops to roughly 2 ms.
3. The Hardware Reality: Cache Locality
Software engineers often forget that hardware matters. Caching isn’t just about key-value stores like Redis; it happens inside the CPU (L1/L2/L3).
CPU Cache relies on Locality of Reference:
- Temporal Locality: If you access data at address X, you will likely access X again soon. (Loop variables).
- Spatial Locality: If you access data at address X, you will likely access X+1 soon. (Arrays).
The 2D Array Trap
Traversing a matrix in the “wrong” direction can be 10x slower due to Cache Misses (breaking Spatial Locality).
import time
import numpy as np
# Create a large matrix (10k x 10k)
size = 10000
matrix = np.ones((size, size), dtype=np.int8)
# Case A: Row-Major (Good Spatial Locality)
# NumPy lays the matrix out row by row: [Row1][Row2][Row3]...
start = time.time()
sum_a = 0
for r in range(size):
    for c in range(size):
        sum_a += matrix[r][c]  # Access column c, then c+1: neighbours in memory
print(f"Row-Major Time: {time.time() - start:.4f}s")
# Case B: Column-Major (Bad Spatial Locality)
# Each inner step jumps a full row (10,000 bytes) ahead in memory
start = time.time()
sum_b = 0
for c in range(size):
    for r in range(size):
        sum_b += matrix[r][c]  # Jumping memory rows constantly!
print(f"Col-Major Time: {time.time() - start:.4f}s")
Result: Row-Major is significantly faster because it hits the CPU Cache line.
[!NOTE] Hardware-First Intuition: The CPU doesn’t move data byte by byte; it moves it in Cache Lines (typically 64 bytes). When you read matrix[0][0], the hardware pulls the surrounding 64-byte line into the L1 cache for “free”. If you access matrix[0][1] next, it’s an instant L1 Hit. But if you jump to matrix[1][0], you force the CPU to fetch a completely new 64-byte line from slow RAM.
4. Where to Cache? (Layers of Caching)
Caching is not just one thing. It happens at every layer of the request lifecycle.
- Browser Cache: Cache-Control headers tell Chrome to store images locally. (Zero Network). See the header sketch after this list.
- CDN (Content Delivery Network): Cloudflare/AWS CloudFront stores static assets close to the user. (Short Network).
- Load Balancer: Nginx/HAProxy can cache entire HTML pages. See Load Balancing.
- Application Cache: Redis/Memcached stores expensive query results. (Fast Network).
- Database Cache: Postgres has an internal Buffer Pool (Shared Buffers) to keep hot rows in RAM. See Database Basics.
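To make the Browser/CDN layers concrete, here is a minimal sketch of setting a Cache-Control header from application code. It assumes a Flask app (Flask and the /logo.png route are illustrative only, not part of this lesson); any web framework exposes the same header.
from flask import Flask, send_file, make_response
app = Flask(__name__)
@app.route("/logo.png")
def logo():
    resp = make_response(send_file("static/logo.png"))  # hypothetical static asset
    # "public" lets shared caches (CDNs) store the response; max-age tells the
    # browser to reuse its local copy for an hour without re-fetching
    resp.headers["Cache-Control"] = "public, max-age=3600"
    return resp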
5. Implementation: Cache-Aside vs Read-Through
A. Cache-Aside (Lazy Loading) - Most Common
The Application is responsible for orchestrating the flow. The cache is “dumb”.
- Read: App asks Cache.
- Miss: If Cache is empty, App asks Database.
- Set: App writes the DB result into Cache.
Pros: Resilient to cache failure (app just hits DB). Cons: Code complexity (logic in app), potential for stale data.
B. Read-Through
The Cache sits between the App and the DB. The App treats the Cache as the main data store.
- Read: App asks Cache.
- Miss: Cache (not App) fetches from DB, updates itself, and returns data to App.
Pros: Simpler app code. Cons: Requires a smarter cache (plugin/provider support), single point of failure.
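To make the difference in ownership concrete, here is a minimal in-process sketch of Read-Through. Everything in it is illustrative: a plain dict stands in for a real read-through provider, and loader is a hypothetical callback that knows how to fetch from the DB.
import time
class ReadThroughCache:
    def __init__(self, loader, ttl=60):
        self.loader = loader      # hypothetical DB-fetch callback, e.g. lambda key: db.query(key)
        self.store = {}           # in-process dict standing in for Redis/Memcached
        self.expires = {}
        self.ttl = ttl
    def get(self, key):
        now = time.time()
        if key in self.store and self.expires[key] > now:
            return self.store[key]       # Hit: the app never sees the DB
        value = self.loader(key)         # Miss: the CACHE (not the app) calls the DB
        self.store[key] = value
        self.expires[key] = now + self.ttl
        return value
The application only ever calls cache.get(key); on a miss the cache itself runs the loader. That inversion of responsibility is the whole difference from Cache-Aside.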
Python Decorator Example (Cache-Aside)
import redis
import json
import logging
from functools import wraps
# Connect to Redis
cache = redis.Redis(host='localhost', port=6379, db=0)
def cached(ttl_seconds=60):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Build the key from positional args (kwargs ignored for simplicity)
            key = f"{func.__name__}:{str(args)}"
            # 1. Check Cache (Read)
            try:
                result = cache.get(key)
                if result is not None:
                    return json.loads(result)
            except redis.RedisError as e:
                # FAIL OPEN: If cache is down, just hit the DB. Don't crash.
                logging.error(f"Cache Read Error: {e}")
            # 2. Compute/Fetch (DB Call)
            val = func(*args, **kwargs)
            # 3. Set Cache (Write)
            try:
                if val is not None:
                    cache.setex(key, ttl_seconds, json.dumps(val))
            except redis.RedisError as e:
                logging.error(f"Cache Write Error: {e}")
            return val
        return wrapper
    return decorator
@cached(ttl_seconds=10)
def get_user_profile(user_id):
    # Simulate DB Call
    print(f"Fetching User {user_id} from DB...")
    return {"id": user_id, "name": "Alice", "role": "Admin"}
6. Interactive Demo: Cache Hit vs Miss & DB Load
Experience the latency difference and the impact on your Database.
- Cache Hit (Green): Data found in Memory. Latency ~1ms. DB Load: 0%.
- Cache Miss (Red): Data not found. Must fetch from DB. Latency ~100ms. DB Load Increases.
- Capacity: Watch the bar fill up. When full, we need an eviction policy.
[!TIP] Try it yourself: Enter a key (e.g., “A”) and click FETCH. Do it again to see a Cache Hit.
7. The Three Cache Demons
When designing large-scale systems, watch out for these three failures:
A. Cache Penetration
- The Problem: A user keeps asking for a key that doesn’t exist in the DB (e.g., id=-1).
- The Impact: The cache misses, the DB misses. If 10,000 bots do this, your DB dies.
- The Fix (see the sketch after this list):
  - Cache Nulls: If the DB returns null, cache that null for a short time (e.g., 30s).
  - Bloom Filters: A fast probabilistic check to see if the id might exist before hitting the cache/DB.
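A minimal sketch of the Cache Nulls fix, reusing the Redis cache client from the decorator example. The db_find_user helper is hypothetical; the point is that a confirmed miss gets cached briefly, so repeated requests for fake keys never reach the DB.
import json
NULL_MARKER = json.dumps(None)  # lets us tell "cached nothing" apart from "not cached at all"
def get_user(user_id, ttl=60, null_ttl=30):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                # may legitimately be None (a cached miss)
    row = db_find_user(user_id)                  # hypothetical DB lookup
    if row is None:
        cache.setex(key, null_ttl, NULL_MARKER)  # cache the miss for a short time
        return None
    cache.setex(key, ttl, json.dumps(row))
    return row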
B. Cache Avalanche
- The Problem: Thousands of cache keys expire at the exact same time (e.g., you rebooted the server at midnight).
- The Impact: Suddenly, requests fall through to the DB simultaneously.
- The Fix: Add Jitter (Randomness) to TTL, as in the sketch after this list.
  - Bad: TTL = 3600s (Everyone expires at 1:00 PM).
  - Good: TTL = 3600s + random(0, 300s) (Expiry spread between 1:00 PM and 1:05 PM).
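A minimal sketch of TTL jitter, again reusing the Redis client from earlier:
import random
import json
def set_with_jitter(key, value, base_ttl=3600, max_jitter=300):
    # Spread expirations so keys written together don't all expire together
    ttl = base_ttl + random.randint(0, max_jitter)
    cache.setex(key, ttl, json.dumps(value))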
C. Thundering Herd vs Cache Stampede
These terms are often used interchangeably, but there is a nuance:
- Thundering Herd: Happens when many processes (clients) wake up simultaneously to compete for a resource (like a lock).
- Cache Stampede: Specifically refers to the event where a Hot Key expires, and thousands of parallel requests hit the Database to recompute it.
- The Impact: 10,000 users try to fetch “Justin Bieber’s Profile”. All get a “Miss”. All hit the DB.
- The Fix:
- Mutex Lock: Only allow ONE thread to rebuild the cache for that key. Others wait.
- Refresh-Ahead: Update the cache in the background before it expires.
Deep Dive: Mutex Lock with Redis SETNX
The problem: When 1000 threads all get a cache miss simultaneously, they all try to recompute the expensive value and set it in cache. This creates a stampede on your database.
Solution: Use a distributed lock so only ONE thread recomputes. Others wait.
import redis
import time
cache = redis.Redis()
def get_with_mutex(key, ttl=60, lock_timeout=10):
    # Try cache first
    value = cache.get(key)
    if value is not None:
        return value.decode()  # redis-py returns bytes; decode for consistency
    # Cache miss - try to acquire lock
    lock_key = f"lock:{key}"
    lock_acquired = cache.setnx(lock_key, "1")
    if lock_acquired:
        # WE won the race! Compute the value.
        cache.expire(lock_key, lock_timeout)  # Prevent deadlock if this thread crashes
        try:
            # Expensive computation
            result = expensive_db_query(key)
            cache.setex(key, ttl, result)
            return result
        finally:
            # Release lock
            cache.delete(lock_key)
    else:
        # Someone else is computing. Wait and retry.
        time.sleep(0.1)
        return get_with_mutex(key, ttl, lock_timeout)
def expensive_db_query(key):
    # Simulate slow DB
    time.sleep(2)
    return f"Data for {key}"
How SETNX works:
- SETNX = “SET if Not eXists”
- Returns 1 if the key didn’t exist (lock acquired)
- Returns 0 if the key already exists (someone else has the lock)
- Atomic operation: no race condition possible
Critical: Always set an expiration on the lock key (expire) to prevent deadlock if the thread crashes.
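Note that there is a small gap between setnx and expire above: if the process crashes in between, the lock never expires. redis-py can combine both into one atomic command; a sketch using the same cache, lock_key, and lock_timeout as above:
# Atomic "set if not exists" WITH a TTL, in a single Redis SET command
lock_acquired = cache.set(lock_key, "1", nx=True, ex=lock_timeout)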
Alternative: Refresh-Ahead Pattern
Instead of waiting for expiration, refresh the cache proactively.
import threading
def get_with_refresh_ahead(key, ttl=60, refresh_threshold=0.8):
    value = cache.get(key)
    ttl_remaining = cache.ttl(key)  # seconds left; negative if the key is missing
    if value is not None:
        # Check if we're in the "danger zone": less than 80% of the TTL remains
        if ttl_remaining < (ttl * refresh_threshold):
            # Trigger async refresh (non-blocking); a production version would
            # also take a lock so only one thread refreshes
            threading.Thread(
                target=async_refresh,
                args=(key, ttl)
            ).start()
        return value.decode()
    # Cache miss - compute synchronously
    result = expensive_db_query(key)
    cache.setex(key, ttl, result)
    return result
def async_refresh(key, ttl):
    result = expensive_db_query(key)
    cache.setex(key, ttl, result)
Trade-off:
- [+] Pros: No stampede, users never wait
- [!] Cons: Wasted computation if key is rarely used
Best Practice: Use refresh-ahead for hot keys (high traffic). Use mutex for normal keys.
[!WARNING] Pro Tip: Always set a TTL (Time To Live). A cache without expiration is just a memory leak waiting to happen.
Staff Engineer Tip: When caching, consider the Write Tax on SSDs. Frequent cache updates can contribute to Write Amplification, wearing out physical flash memory cells. If your cache has a very short TTL (e.g., 1 second) and high write volume, it might be better to keep it entirely in Volatile RAM (e.g., Redis with persistence disabled) rather than writing it through to a persistent block store.
Mnemonic for the 3 Demons — “PAS”: Penetration (fake keys → fix with Bloom Filter or cache nulls), Avalanche (mass expiry → fix with TTL Jitter), Stampede (hot key expires → fix with Mutex/Refresh-Ahead). If you can recite PAS fixes from memory, you’ve nailed the hardest caching question in any system design interview.