Write Strategies: Consistency vs Latency

[!TIP] Interview Insight: When designing a system, always ask: “Is this a read-heavy or write-heavy system?” Your choice of write strategy depends entirely on this answer.

1. The Write Problem

Reading from a cache is simple: check the cache; on a miss, read from the database and populate the cache. Writing is harder: you now have two copies of the data (cache and DB), so how do you keep them in sync?

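For reference, a minimal cache-aside read path (a sketch, assuming redis-py and a hypothetical load_from_db helper):

import json
import redis

cache = redis.Redis()

def read(key):
    """Cache-aside read: check the cache first, fall back to the DB on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                  # Hit: served from RAM
    value = load_from_db(key)                      # Miss: hit the database (hypothetical helper)
    cache.setex(key, 3600, json.dumps(value))      # Populate the cache with a 1hr TTL
    return value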

2. The Four Strategies

A. Write-Through (The “Safe” Way)

  • How it works: The application writes to the Cache AND the Database synchronously. The write is confirmed only when both succeed.
  • Pros:
    • Strong Consistency: Cache and DB are always identical.
    • Reliability: No data loss if the cache crashes.
  • Cons:
    • High Latency: You pay the penalty of the DB write for every single request.
    • Double Write: Every write hits the DB, so it doesn’t reduce write load.
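
A minimal write-through sketch (assumptions: redis-py, a hypothetical db handle with an execute method, and an items(key, value) table):

import json
import redis

cache = redis.Redis()

def write_through(key, value):
    """Synchronous double write: confirm only after BOTH the DB and the cache succeed."""
    payload = json.dumps(value)
    db.execute("UPDATE items SET value = %s WHERE key = %s", (payload, key))  # Durable write first
    cache.setex(key, 3600, payload)   # Then mirror into the cache (1hr TTL)

Writing the DB first means a cache failure can leave the cache stale but never leave the DB missing data; the reverse order risks acknowledging writes that were never persisted.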

B. Write-Back / Write-Behind (The “Fast” Way)

  • Terminology: These terms are often used interchangeably. Write-Back usually refers to the CPU/hardware cache policy, while Write-Behind refers to the equivalent software pattern (databases, Redis).
  • How it works: The application writes only to the Cache. The Cache returns “Success” immediately. The Cache asynchronously syncs data to the DB later (e.g., every 5 seconds, or when the item is evicted).
  • Pros:
    • Low Latency: Writes complete at memory speed (~100ns for a local in-process cache; sub-millisecond over the network for Redis), never waiting on the DB.
    • Write Coalescing: If you update a counter 100 times in 1 second, the DB only sees ONE write (the final value). This massively reduces DB pressure.
  • Cons:
    • Data Loss Risk: If the Cache crashes before syncing, that data is gone forever.
    • Complexity: Implementing this correctly is hard (requires a queue or WAL).
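
A minimal write-behind sketch (an in-process dirty set plus a periodic flusher; db is hypothetical). A production version would use a durable queue or WAL so a crash cannot lose the buffered writes:

import json
import threading
import time
import redis

cache = redis.Redis()
dirty = set()                   # Keys updated in the cache but not yet persisted
lock = threading.Lock()

def write_back(key, value):
    """ACK after the cache write alone; the DB catches up later."""
    cache.set(key, json.dumps(value))
    with lock:
        dirty.add(key)          # Coalescing: 100 updates to one key -> one flush

def flush_forever():
    """Every 5 seconds, persist only the LATEST value of each dirty key."""
    while True:
        time.sleep(5)
        with lock:
            batch = list(dirty)
            dirty.clear()
        for key in batch:
            val = cache.get(key)
            if val is not None:
                db.execute("UPDATE items SET value = %s WHERE key = %s", (val, key))

threading.Thread(target=flush_forever, daemon=True).start()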

C. Write-Around (The “Big Data” Way)

  • How it works: Write directly to the DB, bypassing the cache. The cache is only populated when the data is read (on a Miss).
  • Use Case: Writing massive data (e.g., Log files, Video uploads) that won’t be read immediately. Prevents the cache from being flooded with useless data (Cache Pollution).
  • Trade-off: Read latency for recently written data is high (Cache Miss).
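
A minimal write-around sketch (db hypothetical). Deleting any stale cached copy on write is a common addition, so readers re-fetch fresh data on the next miss:

import json
import redis

cache = redis.Redis()

def write_around(key, value):
    """Bypass the cache: only the database is written."""
    db.execute("UPDATE items SET value = %s WHERE key = %s", (json.dumps(value), key))
    cache.delete(key)   # Optional: evict any stale copy; the read path repopulates on a miss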

D. Refresh-Ahead

  • How it works: If a cached item is accessed and is close to expiring (e.g., within 10 seconds of TTL), the cache automatically refreshes the data from the DB in the background.
  • Benefit: The next user never sees a cache miss or latency spike. Excellent for preventing the Thundering Herd problem on hot keys.
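
A minimal refresh-ahead sketch on the read path (redis-py's ttl returns the remaining seconds; load_from_db is the hypothetical helper from earlier):

import json
import threading
import redis

cache = redis.Redis()
REFRESH_WINDOW = 10   # Refresh in the background when <10s of TTL remain

def _refresh(key):
    fresh = load_from_db(key)                    # Hypothetical DB loader
    cache.setex(key, 3600, json.dumps(fresh))    # Reset the 1hr TTL
    return fresh

def read_refresh_ahead(key):
    value = cache.get(key)
    if value is None:
        return _refresh(key)                     # Plain miss: load synchronously
    if 0 <= cache.ttl(key) < REFRESH_WINDOW:     # ttl() is -1/-2 for no-expiry/missing keys
        threading.Thread(target=_refresh, args=(key,), daemon=True).start()
    return json.loads(value)                     # Serve the still-valid cached value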

Visual Data Flow

  • Write-Through: App → Cache + DB (wait for BOTH to succeed)
  • Write-Back: App → Cache (immediate ACK), then DB (async); fast ACK, lazy sync
  • Write-Around: App → DB only, skipping the cache; good for logs

3. Decision Tree: Which one to choose?

  1. Is data loss acceptable? Can you afford to lose the last 5 seconds of data if the server crashes?
     • No → 🛡️ Write-Through: strong consistency, zero data loss. Slower writes, but safe for financial data.
     • Yes → go to question 2.
  2. Is the system write-heavy? Do you have thousands of writes per second?
     • Yes → Write-Back: maximum performance; uses write coalescing to reduce DB load. Risk of data loss on a crash.
     • No (data is written in bulk and rarely read back immediately) → 📦 Write-Around: avoids cache pollution; writes go directly to the DB. Good for large files or logs.

4. Interactive Demo: The Power of Write Coalescing

This demo visualizes the massive advantage of Write-Back: Coalescing.

  1. Select Write-Back.
  2. Rapidly click WRITE (+1). Notice the “Pending Updates” count increases in RAM.
  3. Observe that the Database is NOT touched for every click.
  4. Wait for the Async Flush to see one single big write to the DB.
  5. Try Write-Through to see the pain of slow DB writes.
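
The same effect in a short sketch (assuming Redis's atomic INCR/GETSET and the hypothetical db handle from earlier): 100 increments land in RAM, then one flush writes the combined delta to the DB.

import redis

cache = redis.Redis()

for _ in range(100):
    cache.incr("pending:views")   # 100 cheap RAM writes; the DB is never touched

def flush_counter():
    """Async flush: move the accumulated delta to the DB in ONE write."""
    delta = int(cache.getset("pending:views", 0) or 0)   # Atomic read-and-reset
    if delta:
        db.execute("UPDATE stats SET views = views + %s WHERE id = 1", (delta,))

flush_counter()   # One DB write covering all 100 increments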

5. Cache Warming: The Cold Start Problem

When you deploy a new cache server (or restart an existing one), the cache is empty. Every request is a MISS, and your database gets hammered. This is called a Cold Start.

Strategies

A. Lazy Loading (Default - Reactive)

  • How: The cache starts empty. Wait for users to request data, and populate the cache on each MISS.
  • Pros:
    • Simple: No special code needed.
  • Cons:
    • Slow Start: The first users experience high latency.
    • DB Spike: The database gets hit hard during the cold start.

B. Eager Loading (Proactive)

  • How: Preload the cache with predictable data before taking traffic.
  • Pros:
    • Fast Start: Users get instant cache hits.
  • Cons:
    • Complexity: Requires knowing what to preload.
    • Wasted Memory: May load data that never gets accessed.

Implementation Example

Bulk Warming Script (Python + Redis):

import redis
import json
from concurrent.futures import ThreadPoolExecutor

cache = redis.Redis()
db = get_database_connection()  # Placeholder: your DB handle; rows are assumed to be dict-like

def warm_user_profiles():
    """Load the top 10K recently active users into the cache."""
    users = db.execute("""
        SELECT id, name, avatar
        FROM users
        ORDER BY last_active DESC
        LIMIT 10000
    """)

    # Batch the inserts with a pipeline: one network round trip instead of 10,000
    pipe = cache.pipeline()
    for user in users:
        key = f"user:{user['id']}"
        pipe.setex(key, 3600, json.dumps(user))  # 1hr TTL: stale profiles age out
    pipe.execute()
    print("✅ Warmed 10K user profiles")

def warm_product_catalog():
    """Load the entire active product catalog (static data)."""
    products = db.execute("SELECT * FROM products WHERE active = true")

    pipe = cache.pipeline()
    for product in products:
        key = f"product:{product['id']}"
        pipe.set(key, json.dumps(product))  # No TTL: static data never expires
    pipe.execute()
    print("✅ Warmed product catalog")

# Run both warmers in parallel. Note: many DB drivers are not thread-safe,
# so in production give each worker its own connection.
with ThreadPoolExecutor(max_workers=5) as executor:
    executor.submit(warm_user_profiles)
    executor.submit(warm_product_catalog)

Production Deployment Pattern

Blue-Green Deployment with Cache Warming:

1. Deploy new server (Green)
2. Run warming script on Green (while it is still receiving no traffic)
3. Wait for cache to populate (monitor hit ratio)
4. Flip traffic from Blue → Green
5. Keep Blue alive for 5min (rollback safety)
6. Decommission Blue

Validation:

# Check cache hit ratio before going live
redis-cli INFO stats | grep -E 'keyspace_(hits|misses)'

# Hit ratio = keyspace_hits / (keyspace_hits + keyspace_misses)
# Target: >80% hit ratio before taking production traffic
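
The same check via redis-py (the keyspace_hits/keyspace_misses counters are standard Redis stats; the 80% threshold is this guide's target, not a Redis default):

import redis

stats = redis.Redis().info("stats")
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
ratio = hits / max(hits + misses, 1)   # Guard against divide-by-zero on a fresh node
print(f"Hit ratio: {ratio:.1%}")
assert ratio > 0.80, "Cache not warm enough for production traffic"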

What to Warm?

| Data Type        | Strategy                 | Example                               |
|------------------|--------------------------|---------------------------------------|
| User Sessions    | Don’t warm (short-lived) | Login tokens expire quickly           |
| Product Catalog  | Warm fully               | Static data, predictable access       |
| Top Users        | Warm top 1%              | Celebrities, power users (80/20 rule) |
| Homepage Content | Warm fully               | Everyone sees this                    |
| Old Articles     | Don’t warm               | Low traffic, unpredictable            |

Interview Insight: Facebook warms their edge caches with “trending posts” before routing traffic. This prevents cold-start slowness during global events.


6. Summary

| Strategy      | Write Latency | Data Safety | Best For…                                            |
|---------------|---------------|-------------|------------------------------------------------------|
| Write-Through | High (slow)   | High        | Financial data, user profiles (consistency-critical) |
| Write-Back    | Low (fast)    | Low (risk)  | Likes, views, analytics, heavy write loads           |
| Write-Around  | High (slow)   | High        | Archival data, large media uploads                   |