Case Study: The YouTube View Counter

In December 2014, “Gangnam Style” by Psy outgrew YouTube’s view counter. The video was closing in on 2,147,483,647 views, the maximum value of a 32-bit signed integer, and engineers had to hotfix the system while it was live, migrating the counter to a 64-bit integer without any downtime before it could overflow and wrap to a negative number. The problem wasn’t storage; it was architecture: a single database row with a view_count column that every view tried to UPDATE ... SET view_count = view_count + 1. At 50,000 views/second, every API call fights for the same row lock. This is the “Hot Key” problem, and the solution isn’t a bigger row lock. It is rethinking the entire counter architecture.

Important: In this lesson, you will master (PEDALS framework):

  1. The Hot Key Problem: Why a single Redis key or DB row can become the bottleneck for your entire system.
  2. CRDT Mechanisms: Using G-Counters and PN-Counters to achieve eventual consistency without central locks.
  3. Counter Sharding: Splitting one counter into N sub-counters to distribute writes across independent resources.
  4. Write-Behind Caching: Absorbing 300k RPS with local buffers and flushing to Redis every 5 seconds.

Note: This case study follows the PEDALS framework and is optimized for interview preparation.

1. Process (Clarification)

Problem: Design a Distributed Counter service that tracks Video Views at massive scale.
Real-World Example: “Despacito” on YouTube has 8 Billion views. During its peak, it was receiving > 50,000 views/second.
The Challenge: A standard database row cannot handle 50k updates/second due to lock contention.

2. Estimate (Capacity)

  • DAU: 1 Billion Users.
  • Average Views: 5 videos / user / day = 5 Billion views / day.
  • QPS (Average): 5 × 10⁹ / 86,400 ≈ 58,000 writes/sec.
  • Peak QPS: Assume 5x multiplier = 300,000 writes/sec.
  • Storage:
  • Video ID (8 bytes) + Count (8 bytes) = 16 bytes per video.
  • 1 Billion Videos = 10⁹ × 16 bytes ≈ 16 GB (Hot data fits in Memory).
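A quick back-of-envelope check in Python, using only the figures above (handy to rehearse for the interview):

daily_views = 1_000_000_000 * 5        # 1B DAU × 5 views/user/day
avg_qps = daily_views / 86_400         # seconds per day → ≈ 57,870 ≈ 58k
peak_qps = avg_qps * 5                 # 5x multiplier → ≈ 289k, call it 300k
storage_bytes = 1_000_000_000 * 16     # 1B videos × 16 bytes each
print(f"{avg_qps:,.0f} avg w/s, {peak_qps:,.0f} peak w/s, {storage_bytes / 1e9:.0f} GB")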

3. Design (High-Level)

The architecture splits the “Write Path” (High Throughput) from the “Read Path” (High Availability).

Client → Load Balancer → API Service → Redis Cluster (Sharded Counters)   [INCR, hot path]
Redis Cluster (Sharded Counters) → Flush Worker → SQL DB                  [async read, persist]
  1. Client sends increment request to Load Balancer.
  2. API Service receives request.
  3. Fast Path: API Service increments a counter in Redis Cluster.
  4. Slow Path: A background worker (the “Flusher”) reads from Redis and updates the SQL Database every N seconds (see the sketch below).
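A minimal sketch of the Flusher under a few assumptions: a redis-py style client with decoded responses, a DB-API cursor db, a Postgres-style upsert, a hypothetical video_stats table, and a dirty:videos set that the write path populates with each video ID it touches:

import time

FLUSH_INTERVAL = 5  # the "every N seconds" from step 4
N_SHARDS = 10       # must match the write path's shard count (see Deep Dive)

def flush_worker(redis, db):
    while True:
        time.sleep(FLUSH_INTERVAL)
        # Drain the set of recently-updated videos, one ID at a time
        while (vid := redis.spop("dirty:videos")) is not None:
            # Sum this video's sharded sub-counters
            total = sum(
                int(redis.get(f"video:{vid}:views:{i}") or 0)
                for i in range(N_SHARDS)
            )
            # Hypothetical schema: video_stats(video_id PRIMARY KEY, view_count)
            db.execute(
                "INSERT INTO video_stats (video_id, view_count) VALUES (%s, %s) "
                "ON CONFLICT (video_id) DO UPDATE SET view_count = EXCLUDED.view_count",
                (vid, total),
            )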

4. Articulate (Deep Dive)

The “Hot Key” Problem

The naive approach uses a simple Redis INCR:

INCR video:123:views

Problem: Redis executes commands on a single thread, and a single key lives on exactly one node of the cluster. One node can handle roughly 50k–100k RPS. If “Despacito” alone receives 300k RPS, that one node melts (CPU at 100%) while the rest of the cluster sits idle.

The “Elite” Pattern: CRDT counters

For multi-region deployments, sharding isn’t enough. You need CRDTs (Conflict-free Replicated Data Types).

  • G-Counter (Grow-only): Each node maintains its own counter. To merge, you take the MAX(local, remote) for each node’s entry and sum them. This handles concurrent updates across the globe without a central lock.
  • PN-Counter (Positive-Negative): Similar to G-Counter but allows decrements by using two internal G-Counters (increments and decrements).
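A minimal G-Counter sketch in Python to make the merge rule concrete (an illustration, not any particular CRDT library):

class GCounter:
    """Grow-only counter: one slot per node; merge = element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> that node's local count

    def increment(self, n=1):
        # A node only ever bumps its own slot, so no coordination is needed
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Commutative, associative, idempotent: replicas converge in any order
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        # The total is the sum of every node's slot
        return sum(self.counts.values())

us, eu = GCounter("us-east"), GCounter("eu-west")
us.increment(); eu.increment(); eu.increment()
us.merge(eu)  # e.g., via gossip between regions
assert us.value() == 3

A PN-Counter is then just two of these: value = increments.value() - decrements.value().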

Solution: Counter Sharding

We split the single counter into N sub-counters.

  • Key: video:123:views → Shards: video:123:views:0 … video:123:views:9
  • Write: Pick a random shard i and INCR.
  • Read: GET all shards and sum them up.
import random

N = 10  # number of sub-counters per video

def increment_views(redis, vid):
    # Write Path (O(1)): pick a random shard so writes spread across N keys
    shard_id = random.randint(0, N - 1)
    redis.incr(f"video:{vid}:views:{shard_id}")

def read_views(redis, vid):
    # Read Path (O(N)): fetch every shard and sum
    total = 0
    for i in range(N):
        total += int(redis.get(f"video:{vid}:views:{i}") or 0)
    return total
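On a viral video the O(N) read itself gets hot. As the closing mnemonic suggests, you can cache the sum in Redis with a short TTL; a sketch building on read_views above (the cache key name is illustrative):

CACHE_TTL = 60  # seconds of acceptable staleness for a view count

def read_views_cached(redis, vid):
    cache_key = f"video:{vid}:views:total"
    cached = redis.get(cache_key)
    if cached is not None:
        return int(cached)
    total = read_views(redis, vid)             # O(N) shard sum, at most ~once/min
    redis.set(cache_key, total, ex=CACHE_TTL)  # serve this sum for the next 60s
    return total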

Write-Behind (Buffering)

For 300k RPS, even a sharded Redis cluster gets expensive: every single view still costs a network round-trip. Strategy:

  1. Local Buffer: Each API Server keeps a local Map<VideoID, Count>.
  2. Batch: It accumulates writes for 5 seconds.
  3. Flush: Every 5 seconds, it sends one INCRBY command to Redis with the aggregated value (e.g., +50), as sketched below.
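A minimal per-server sketch, assuming a redis-py style client (class and method names are illustrative):

import threading
import time
from collections import defaultdict

FLUSH_INTERVAL = 5  # seconds

class WriteBehindCounter:
    def __init__(self, redis):
        self.redis = redis
        self.buffer = defaultdict(int)  # the local Map<VideoID, Count>
        self.lock = threading.Lock()

    def increment(self, vid):
        # Hot path: in-memory bump, zero network round-trips
        with self.lock:
            self.buffer[vid] += 1

    def flush_loop(self):
        # Background thread: drain the buffer every FLUSH_INTERVAL seconds
        while True:
            time.sleep(FLUSH_INTERVAL)
            with self.lock:
                pending, self.buffer = self.buffer, defaultdict(int)
            for vid, delta in pending.items():
                # One INCRBY (e.g., +50) replaces thousands of INCRs;
                # target a random shard here to combine both techniques
                self.redis.incrby(f"video:{vid}:views", delta)

counter = WriteBehindCounter(redis)  # assumes a connected client named redis
threading.Thread(target=counter.flush_loop, daemon=True).start()

The Speed vs Safety trade-off from the summary is visible here: if the server crashes, up to 5 seconds of buffered counts are lost.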

5. List (Trade-offs) & Interactive Decision Visualizer

Compare Global Lock vs Sharded Counters vs Write-Behind. Monitor the System Health (Error Rate & Latency).

[Interactive visualizer: choose a strategy (DB Row Lock, Sharded Counters, Write-Behind) and a traffic level; live readouts show processed requests, elapsed time, RPS, and errors.]

6. Scale (Summary)

CORE ARCHITECTURE

  • Client → LB → API
  • API → Redis (Sharded)
  • Worker → SQL (Async)

KEY TRADE-OFFS

  • Write-Behind: Speed vs Safety.
  • Sharding: Scale vs Complexity.
  • Redis: Latency vs Cost.

Staff Engineer Tip: The Atomic CPU Trick. On a single physical machine, for ultra-high-performance counters (like a Load Balancer’s internal metrics), don’t use locks at all. Use CPU atomic instructions (like LOCK XADD on x86). The increment happens directly in the CPU’s L1/L2 cache line, with no mutex and no trip through the kernel scheduler. At 50M ops/sec, the “Lock Contention” isn’t in your software; it’s in the Cache Coherency Protocol (MESI) on the CPU bus.

Mnemonic: “Count = Shard + Buffer + Flush.” Hot Key? Shard the counter (N random sub-keys). Write rate still too high? Buffer locally (Write-Behind, 5s flush). Too many shards to sum? Cache the sum in Redis with a 60s TTL (see the cached-read sketch above). Multi-region? Use G-Counter CRDTs to avoid global locks. Scalability ladder: Single Row → Redis INCR → Sharded Counter → Write-Behind Buffer → Distributed CRDT.