Case Study: The YouTube View Counter
In December 2014, “Gangnam Style” by Psy broke YouTube’s view counter. The video passed 2,147,483,647 views — the maximum value of a 32-bit signed integer — and YouTube had to migrate the counter to a 64-bit integer on a live system, with no downtime. The problem wasn’t storage — it was architecture: a single database row with a `view_count` column that every view tried to `UPDATE ... SET view_count = view_count + 1`. At 50,000 views/second, every API call fights for the same row lock. This is the “Hot Key” problem. The solution isn’t a bigger integer or a bigger row lock — it’s rethinking the entire counter architecture.
[!IMPORTANT] In this lesson, you will master (PEDALS framework):
- The Hot Key Problem: Why a single Redis key or DB row can become the bottleneck for your entire system.
- CRDT Mechanisms: Using G-Counters and PN-Counters to achieve eventual consistency without central locks.
- Counter Sharding: Splitting one counter into N sub-counters to distribute writes across independent resources.
- Write-Behind Caching: Absorbing 300k RPS with local buffers and flushing to Redis every 5 seconds.
Note: This case study follows the PEDALS framework and is optimized for interview preparation.
1. Process (Clarification)
Problem: Design a Distributed Counter service that tracks Video Views at massive scale. Real-World Example: “Despacito” on YouTube has 8 Billion views. During its peak, it was receiving > 50,000 views/second. The Challenge: A standard database row cannot handle 50k updates/second due to lock contention.
2. Estimate (Capacity)
- DAU: 1 Billion Users.
- Average Views: 5 videos / user / day = 5 Billion views / day.
- QPS (Average): 5 × 10⁹ / 86,400 ≈ 58,000 writes/sec.
- Peak QPS: Assume 5x multiplier = 300,000 writes/sec.
- Storage:
- Video ID (8 bytes) + Count (8 bytes) = 16 bytes per video.
- 1 Billion Videos = 10⁹ × 16 bytes ≈ 16 GB (hot data fits in memory).
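The back-of-envelope numbers above check out in a few lines of Python (the 5x peak multiplier and the 16-byte record are the assumptions stated in the estimate):

```python
DAU = 1_000_000_000        # 1 billion daily active users
views_per_user = 5         # average views per user per day

daily_views = DAU * views_per_user      # 5 billion views/day
avg_qps = daily_views / 86_400          # seconds per day; ≈ 57,870 ≈ 58k/sec
peak_qps = avg_qps * 5                  # assumed 5x multiplier; ≈ 290k ≈ 300k/sec

bytes_per_video = 8 + 8                 # 8-byte video ID + 8-byte count
storage_gb = 1_000_000_000 * bytes_per_video / 1e9   # 16 GB
```

The takeaway is that the write rate, not the storage, is the hard problem: 16 GB fits in a single machine’s RAM, but 300k writes/sec does not fit on a single row.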
3. Design (High-Level)
The architecture splits the “Write Path” (High Throughput) from the “Read Path” (High Availability).
- Client sends an `increment` request to the Load Balancer.
- API Service receives the request.
- Fast Path: API Service increments a counter in Redis Cluster.
- Slow Path: A background worker (“The flusher”) reads from Redis and updates the SQL Database every N seconds.
4. Articulate (Deep Dive)
The “Hot Key” Problem
The naive approach uses a simple Redis `INCR`:

```
INCR video:123:views
```
Problem: Redis is single-threaded. A single key lives on a single Redis node. That node can handle ~50k - 100k RPS. If “Despacito” gets 300k RPS, the single Redis node will melt (CPU 100%).
The “Elite” Pattern: CRDT counters
For multi-region deployments, sharding isn’t enough. You need CRDTs (Conflict-free Replicated Data Types).
- G-Counter (Grow-only): Each node maintains its own counter. To merge, you take `MAX(local, remote)` for each node’s entry and sum the entries. This handles concurrent updates across the globe without a central lock.
- PN-Counter (Positive-Negative): Similar to a G-Counter but allows decrements by using two internal G-Counters (one for increments, one for decrements).
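A minimal G-Counter sketch in pure Python (the region names are illustrative; production systems use CRDT implementations built into their datastore):

```python
class GCounter:
    """Grow-only counter CRDT: one entry per node, merge = element-wise MAX."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> that node's local count

    def increment(self, n=1):
        # Each node only ever writes its own entry, so no lock is needed.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise MAX is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or duplication.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two regions count independently, then exchange state in both directions.
us = GCounter("us-east")
eu = GCounter("eu-west")
us.increment(3)
eu.increment(2)
us.merge(eu)
eu.merge(us)
assert us.value() == eu.value() == 5
```

Note that merging the same state twice changes nothing — that idempotence is what lets gossip-style replication work without coordination.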
Solution: Counter Sharding
We split the single counter into N sub-counters.
- Key: `video:123:views` → Shards: `video:123:views:0` … `video:123:views:9`
- Write: Pick a random shard `i` and `INCR` it.
- Read: `GET` all shards and sum them up.
```python
import random

N = 10  # number of sub-counters (shards) per video

# Write Path: O(1). Increment one randomly chosen shard.
def record_view(redis, vid):
    shard_id = random.randint(0, N - 1)
    redis.incr(f"video:{vid}:views:{shard_id}")

# Read Path: O(N). GET every shard and sum; missing shards count as 0.
def get_views(redis, vid):
    total = 0
    for i in range(N):
        total += int(redis.get(f"video:{vid}:views:{i}") or 0)
    return total
```
Write-Behind (Buffering)
For 300k RPS, even sharding Redis might be expensive. Strategy:
- Local Buffer: Each API server keeps a local `Map<VideoID, Count>`.
- Batch: It accumulates increments for 5 seconds.
- Flush: Every 5 seconds, it sends one `INCRBY` command to Redis with the aggregated value (e.g., +50).
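The buffer-and-flush steps above can be sketched as follows (a simplified single-threaded model: a plain dict stands in for the Redis client, and a real service would flush on a timer and accept losing up to one flush interval of counts on a crash):

```python
from collections import defaultdict

FLUSH_INTERVAL = 5  # seconds (illustrative; a real service runs flush on a timer)


class BufferedCounter:
    """Accumulates increments locally, then flushes one aggregated write per video."""

    def __init__(self, redis_like):
        self.redis = redis_like          # stand-in for a Redis client
        self.buffer = defaultdict(int)   # video_id -> pending increments

    def record_view(self, vid):
        self.buffer[vid] += 1            # no network call on the hot path

    def flush(self):
        # One aggregated write per video, equivalent to INCRBY key <count>.
        for vid, count in self.buffer.items():
            key = f"video:{vid}:views"
            self.redis[key] = self.redis.get(key, 0) + count
        self.buffer.clear()


fake_redis = {}
counter = BufferedCounter(fake_redis)
for _ in range(50):
    counter.record_view(123)
counter.flush()  # one network round trip instead of 50
assert fake_redis["video:123:views"] == 50
```

The trade-off is exactly the one listed below under “Write-Behind: Speed vs Safety”: counts sitting in the local buffer are lost if the API server dies before the next flush.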
5. List (Trade-offs)
Compare Global Lock vs Sharded Counters vs Write-Behind under load, watching error rate and latency: each step up the ladder buys write throughput at the cost of read complexity or durability.
6. Scale (Summary)
CORE ARCHITECTURE
- Client → LB → API
- API → Redis (Sharded)
- Worker → SQL (Async)
KEY TRADE-OFFS
- **Write-Behind**: Speed vs Safety.
- **Sharding**: Scale vs Complexity.
- **Redis**: Latency vs Cost.
Staff Engineer Tip: The Atomic CPU Trick. On a single physical machine, for ultra-high-performance counters (like a Load Balancer’s internal metrics), don’t use locks at all. Use CPU atomic instructions (like `LOCK XADD` on x86). These perform the increment directly in the CPU’s L1/L2 cache, bypassing the kernel scheduler entirely. At the scale of 50M ops/sec, the “lock contention” isn’t in your software — it’s in the cache coherency protocol (MESI) on the CPU bus.
Mnemonic — “Count = Shard + Buffer + Flush”: Hot Key? Shard the counter (N random sub-keys). Too high write rate? Buffer locally (Write-Behind, 5s flush). Too many shards to sum? Cache the sum in Redis with 60s TTL. Multi-region? Use G-Counter CRDTs to avoid global locks. Scalability ladder: Single Row → Redis INCR → Sharded Counter → Write-Behind Buffer → Distributed CRDT.
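The “cache the sum with a 60s TTL” step from the mnemonic can be sketched as a read-through cache in front of the O(N) shard scan (the clock is injected so the TTL is testable; all names here are illustrative):

```python
import time

CACHE_TTL = 60  # seconds; readers tolerate up to 60s of staleness


class CachedTotal:
    """Caches the summed shard total so most reads skip the O(N) GET fan-out."""

    def __init__(self, read_shards, clock=time.monotonic):
        self.read_shards = read_shards  # callable: vid -> exact sum of all shards
        self.clock = clock
        self.cache = {}                 # vid -> (total, cached_at)

    def get_views(self, vid):
        entry = self.cache.get(vid)
        if entry and self.clock() - entry[1] < CACHE_TTL:
            return entry[0]             # cache hit: possibly stale, but cheap
        total = self.read_shards(vid)   # cache miss: sum every shard
        self.cache[vid] = (total, self.clock())
        return total


# Fake clock and shard reader for illustration.
now = [0.0]
calls = []
cache = CachedTotal(lambda vid: calls.append(vid) or 1000, clock=lambda: now[0])
cache.get_views(123)   # miss: scans shards
cache.get_views(123)   # hit: served from cache
now[0] = 61.0
cache.get_views(123)   # TTL expired: scans shards again
assert len(calls) == 2
```

This is the same speed-vs-freshness trade as write-behind, applied to the read path: view counts at this scale are approximate by design, so a 60-second-stale total is acceptable.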