Design High-Concurrency Flash Sales

[!IMPORTANT] In this lesson, you will master:

  1. The Reservation-Hold-Capture Pattern: Decoupling high-frequency stock checks from heavy financial transactions.
  2. Load Shedding: How to gracefully reject $90\%$ of traffic to protect the core banking database.
  3. Redis Atomic Decr: Using in-memory synchronization to solve the “Overselling” problem.

1. Scene Setting: The 10k Share Race

Wise launches a feature where users can buy shares in a fund. There are only $10,000$ shares available. $100,000$ users click “Buy” at the exact same second. This is the “Thundering Herd” problem in its purest financial form.

⚖️ Clarifying Questions (The “Senior” Probe)

Before drawing a single box, define the boundaries:

  • Interviewer: “Design a system to sell 10k shares to 100k concurrent users.”
  • Candidate: “Is it ‘First-come, first-served’ or a ‘Lottery’? (i.e., Do we care about absolute time order?)”
  • Interviewer: “First-come, first-served.”
  • Candidate: “Can a user buy multiple shares, or just one?”
  • Interviewer: “Max 50 shares per user.”
  • Candidate: “Does the user need an existing balance in Wise, or can they pay via credit card during the checkout?”
  • Interviewer: “They must have existing balance. We need to debit their Wise wallet.”
  • Candidate: “And the most important constraint: Zero Overselling. We cannot sell 10,001 shares under any circumstance.”

2. Requirements & Constraints

2.1 Functional Requirements

  1. Atomic Reservations: Users can reserve stock (up to the per-user cap of 50 shares). The reservation must be atomic to prevent overselling.
  2. Hold-Capture Lifecycle: Reservations must expire after 10 minutes if the payment is not completed.
  3. Fair Queueing: First-come, first-served (FCFS) logic must be maintained under heavy load.
  4. Bot Mitigation: Identify and block automated scripts attempting to scrape all inventory.
  5. Inventory Recovery: Once a reservation expires or a payment fails, stock must be returned to the pool immediately.

2.2 Non-Functional Requirements

  1. Scalability: Handle 100,000 requests/second (RPS) during the “Gold Rush” (first 60 seconds).
  2. Latency: P99 response time must be < 50ms to prevent UI sluggishness during high contention.
  3. Correctness: Zero Overselling. Selling 1,001 items when you have 1,000 is a manual reconciliation nightmare and a brand failure.

2.3 Compliance & Fairness

  • Anti-Scalping: Limit items per user account/IP address.
  • Audit Log: Every successful reservation and “Out of Stock” rejection must be logged for post-mortem analysis.

3. Capacity Planning & Estimation (The Gold Rush)

3.1 Throughput Analysis

  • Traffic: 1,000,000 users waiting for a “Drop” (sizing for 10× the scenario’s 100k buyers to leave headroom).
  • Execution Window: 90% of requests arrive in the first 5 seconds.
  • Peak Load: 1,000,000 × 0.9 / 5s = 180,000 RPS; round up to ~200,000 RPS for headroom.
  • The DB Challenge: A standard relational database (MySQL/Postgres) will hit lock contention at ~5,000 parallel writes on the same row (the inventory count).

3.2 Memory & IOPS

  • State Size: 1,000,000 users × 100 bytes (session/lock info) ≈ 100 MB.
  • Conclusion: The entire inventory state fits in Redis RAM.
  • Write IOPS: 200,000 RPS means we need a store that can handle 200k atomic decrements. Redis (single-threaded) caps out at ~100k-150k.
  • The Lead-Level Solution: Do not shard inventory. Sharding (e.g., placing 2,500 units of stock on each of 4 nodes) introduces massive complexity: when Node A sells out but Node B has 500 left, you must dynamically re-route users, destroying throughput. Instead, use Load Shedding at the API Gateway. Drop 80% of the traffic immediately with a 429 Too Many Requests, allowing the remaining ~40k RPS to hit a single Redis instance, comfortably under its ~100k RPS ceiling. Keep state unified.
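The gateway-side shedding above can be sketched in a few lines. This is a minimal illustration assuming a fixed probabilistic admit rate; the `should_admit` and `handle_request` names are hypothetical, and a real gateway would tune the rate off live Redis latency or queue depth rather than a constant:

```python
import random

ADMIT_RATE = 0.20  # let ~20% of requests through; shed the rest with HTTP 429

def should_admit(rng: random.Random = random) -> bool:
    """Probabilistic load shedding: admit roughly ADMIT_RATE of requests."""
    return rng.random() < ADMIT_RATE

def handle_request(rng: random.Random = random) -> int:
    """Return the HTTP status the gateway would send for one request."""
    if not should_admit(rng):
        return 429  # Too Many Requests: rejected before touching Redis
    return 200      # forwarded to the inventory service

# Rough check: at 200k RPS, ~40k/s survive, comfortably under Redis's ~100k cap.
rng = random.Random(42)
admitted = sum(1 for _ in range(100_000) if handle_request(rng) == 200)
```

The shed requests cost the gateway only a random-number draw and a constant response, which is what makes this cheaper than any queueing scheme.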

4. Architecture Comparison: Choosing the Pattern

Alternative 1: Direct SQL Locking (SELECT FOR UPDATE)

All 100k requests hit the DB. The first 10k get the lock, the rest wait.

  • Result: The DB connection pool is exhausted in <1s. Latency spikes to 30s. The system crashes. Verdict: FAILED.

Alternative 2: Distributed Lock (Redis/ZooKeeper)

Acquire a lock per share.

  • Result: Better, but still too much network overhead to acquire/release 10,000 individual locks at $10^{5}$ TPS. Verdict: SUBOPTIMAL.

Alternative 3: Redis Reservation + Kafka Settlement (Recommended)

Use Redis for the “Reservation” (fast) and Kafka for “Settlement” (durability).

| Feature | Alt 1 (SQL) | Alt 3 (Recommended) |
| --- | --- | --- |
| Throughput | ~1k TPS | ~100k TPS |
| Consistency | Perfect | Guaranteed (eventually) |
| User Experience | Frequent timeouts | Instant feedback |
```mermaid
flowchart TD
  U[User] -->|POST /buy| API[Purchase API]
  API -->|1. Lua Script| INV[(Redis Inventory)]
  INV -->|Success: -Qty| API
  API -->|2. Queue Req| Q[Kafka: SettlementQueue]
  Q --> WORKER[Settlement Worker]
  WORKER -->|3. ACID Debit| BAL[(Balance DB: Postgres)]
  WORKER -->|4. Update Status| BAL
```

5. Technical Depth: Atomic Operations & Bots

5.1 The “Atomic Decrement” Lua Script & The “Double Click” Problem

Under heavy load, users panic and click “Buy” 50 times. You cannot allow 1 user to drain 50 stock. To prevent “Read-Modify-Write” race conditions and guarantee idempotency, we use a Redis Lua script. Lua scripts are executed atomically within Redis.

```lua
-- KEYS[1]: stock_key, KEYS[2]: purchased_users_set
-- ARGV[1]: user_id, ARGV[2]: requested_qty

-- 1. Idempotency check (has this user already bought?)
if redis.call('SISMEMBER', KEYS[2], ARGV[1]) == 1 then
    return -1 -- Already purchased
end

-- 2. Stock check (treat a missing key as zero stock)
local stock = tonumber(redis.call('GET', KEYS[1])) or 0
local requested = tonumber(ARGV[2])

if stock >= requested then
    -- 3. Atomically decrement stock and record the user
    redis.call('DECRBY', KEYS[1], requested)
    redis.call('SADD', KEYS[2], ARGV[1])

    -- Create a temporary reservation hold (10 min, matching the Hold-Capture TTL)
    redis.call('SET', 'res:' .. ARGV[1], '1', 'EX', 600)
    return 1 -- Success
else
    return 0 -- Out of stock
end
```
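The script’s decision table (-1 already purchased, 1 success, 0 sold out) can be unit-tested before deployment with a plain-Python model. This sketch mirrors the Lua branches with a dict and a set; it is a test harness, not the production path:

```python
def reserve(stock: dict, purchased: set, user_id: str, qty: int) -> int:
    """Pure-Python mirror of the Lua script's branches; returns -1, 0, or 1."""
    if user_id in purchased:           # SISMEMBER purchased_users_set
        return -1                      # already purchased
    if stock.get("flash", 0) >= qty:   # GET stock_key + compare
        stock["flash"] -= qty          # DECRBY
        purchased.add(user_id)         # SADD
        return 1                       # success (reservation-hold key omitted here)
    return 0                           # out of stock

stock, purchased = {"flash": 2}, set()
results = [
    reserve(stock, purchased, "u1", 1),  # first buy succeeds
    reserve(stock, purchased, "u1", 1),  # duplicate click rejected
    reserve(stock, purchased, "u2", 5),  # asks for more than remaining stock
    reserve(stock, purchased, "u3", 1),  # takes the last unit
]
```

Because the real script runs atomically inside Redis, proving the branch logic correct in isolation is enough; there is no interleaving to test.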

5.2 Bot Mitigation: The “Virtual Waiting Room”

Before hitting the API, users pass through a WAF (Web Application Firewall) or a “Waiting Room” (like Cloudflare Waiting Room).

  • Proof of Work (PoW): Require the client to solve a small cryptographic puzzle in JS before submitting the request. This slows down bots significantly while remaining transparent to humans.
  • Rate Limiting by IP/User: Aggressive 429s for anyone hitting the “Check Stock” endpoint more than once every 2 seconds.
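A minimal proof-of-work sketch, assuming a “leading zero bits of SHA-256” puzzle. The 12-bit difficulty and the `challenge:nonce` encoding are illustrative; real deployments tune difficulty per client reputation and rotate challenges server-side:

```python
import hashlib

def solve_pow(challenge: str, difficulty: int = 12) -> int:
    """Brute-force a nonce so sha256(challenge:nonce) has `difficulty` leading zero bits."""
    target = 1 << (256 - difficulty)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify_pow(challenge: str, nonce: int, difficulty: int = 12) -> bool:
    """Server-side check: one hash, O(1), regardless of how long the client worked."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty))

nonce = solve_pow("drop-2024:user-123")
```

The asymmetry is the point: the client burns ~2^12 hashes per request, the server burns one, so a bot farm pays linearly for every attempt.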

5.3 Overload Protection: Load Shedding

When the system detects CPU/Memory hitting 90%, it must protect the “Checkout” flow by sacrificing non-essential features.

  • Technical Implementation: The API Gateway returns a static 200 OK (with “Feature Temporarily Disabled”) for the Reviews, Recommendations, and Search services, giving all resources to the Inventory and Payment services.

6. Logic: High-Concurrency Safeguards

Phase 1: The “Front Door” (Edge)

The API checks Redis. If the Lua script returns 0, the user is told “Sold Out” in <20ms. We have just protected our database from 90,000 useless requests.

Phase 2: The Settlement Queue

For the 10,000 “Winners,” we produce a message to Kafka.

  • Payload: { user_id: 123, qty: 5, quote_id: "q_456", instrument_id: "AAPL" }.
  • Partitioning for Strict Ordering: Using the instrument_id (the stock ticker) as the Kafka partition key guarantees that every buy request for “AAPL” lands in the same partition, which is consumed sequentially by a single worker thread, eliminating database row contention and deadlocks on the AAPL inventory row during settlement. The trade-off: for a single-instrument flash sale, this serializes all settlements behind one consumer. Because the Redis reservation has already guaranteed inventory correctness, you can instead key by user_id to parallelize payment processing (see follow-up question 5).
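Kafka’s default partitioner routes by a hash of the key modulo the partition count. The sketch below uses SHA-256 purely for illustration (Kafka actually uses murmur2) to show why keying by instrument_id funnels every AAPL order into a single partition:

```python
import hashlib

NUM_PARTITIONS = 12

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash -> partition index, mimicking Kafka's hash(key) % N routing."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % num_partitions

# Every order for the same instrument lands on the same partition,
# so one consumer sees them in strict order:
orders = [{"user_id": u, "instrument_id": "AAPL"} for u in ("u1", "u2", "u3")]
partitions = {partition_for(o["instrument_id"]) for o in orders}
```

Swapping the key to `o["user_id"]` would spread the same three orders across partitions, trading per-instrument ordering for consumer parallelism.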

Phase 3: The Heavy Lifter (Settlement Worker)

The worker pulls from Kafka at a controlled speed (e.g., 500 msgs/sec).

  • It performs the ACID Debit: within a single database transaction, debit the user’s Wise wallet, record the settlement, and mark the reservation as captured.
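A sketch of that settlement transaction, using SQLite in place of Postgres so the example is self-contained. The table names, columns, and NSF handling are assumptions, not the production schema:

```python
import sqlite3

def settle(conn: sqlite3.Connection, purchase_id: str, user_id: int, amount: int) -> str:
    """Debit the wallet and record the settlement in one ACID transaction."""
    try:
        with conn:  # BEGIN ... COMMIT; rolls back the whole block on exception
            # UNIQUE(purchase_id) stops a retried Kafka message from double-charging
            conn.execute(
                "INSERT INTO settlements (purchase_id, user_id, amount, status) "
                "VALUES (?, ?, ?, 'SETTLED')",
                (purchase_id, user_id, amount),
            )
            cur = conn.execute(
                "UPDATE wallets SET balance = balance - ? "
                "WHERE user_id = ? AND balance >= ?",
                (amount, user_id, amount),
            )
            if cur.rowcount == 0:
                raise ValueError("NSF")  # Not Sufficient Funds: roll everything back
        return "SETTLED"
    except (ValueError, sqlite3.IntegrityError):
        return "FAILED"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wallets (user_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE settlements (purchase_id TEXT UNIQUE, user_id INTEGER, "
             "amount INTEGER, status TEXT)")
conn.execute("INSERT INTO wallets VALUES (123, 100)")
conn.commit()

first = settle(conn, "p_1", 123, 80)  # debits 80, leaving 20
nsf = settle(conn, "p_2", 123, 80)    # fails: only 20 left, transaction rolls back
```

The `balance >= ?` guard in the UPDATE is what makes the debit safe under concurrency: the check and the write are one statement, not a read-modify-write.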

[Interactive demo removed: “High-Concurrency Purchase Flow” simulated 1,000 users racing for 50 remaining items. Requests enter at 100k TPS, pass the atomic DECR Lua script in Redis, queue in Kafka as pending settlements, and drain into PostgreSQL at its ~1k TPS limit.]

[!TIP] Staff Engineer Insight: Why Lua? A GET followed by a DECR issued from the application is NOT atomic. Two servers could both see stock=1, then both decrement. You end up with -1 stock. Lua scripts in Redis run as a single unit, blocking all other commands, guaranteeing atomicity.


7. Real-World Complexity: The “Ghost Inventory” Problem

What if a user reserves an item (Redis DECR) but their internet cuts out before they pay?

  • The Leak: 100 people “Reserve” items but never “Settle”. We have 0 stock in Redis, but 100 items actually sitting in the warehouse.
  • The Lead-Level Solution: Do not build a “Janitor” cron job to scan 10,000 reservations. Scanning is inefficient and creates massive race conditions (what if payment settles at minute 10:01 while the Janitor restores stock at minute 10:00?).
    1. Instead, use Delay Queues (e.g., SQS DelaySeconds or RabbitMQ’s delayed-message exchange; Kafka has no native delayed delivery, so it requires a scheduler-topic workaround).
    2. When a user gets a reservation, publish a message to a Hold_Verification topic with a 10-minute delay.
    3. Exactly 10 minutes later, a consumer reads the message and checks the Postgres DB: “Did User 123 complete payment for Item 456?”
    4. If NO: The consumer fires a Redis INCR to return the stock and marks the reservation as cancelled. No scanning required.
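The delayed verification flow can be modeled with a due-time priority queue. In this sketch the delay is simulated locally (in production the broker provides it, e.g., SQS DelaySeconds), and the `paid` set stands in for the Postgres payment lookup:

```python
import heapq

HOLD_SECONDS = 600  # the 10-minute reservation hold

def schedule_hold_check(queue: list, now: float, user_id: str, item_id: str) -> None:
    """Publish a delayed 'did they pay?' message, due HOLD_SECONDS from now."""
    heapq.heappush(queue, (now + HOLD_SECONDS, user_id, item_id))

def run_due_checks(queue: list, now: float, paid: set, stock: dict) -> list:
    """Consume every message whose delay has elapsed; restock unpaid holds."""
    restocked = []
    while queue and queue[0][0] <= now:
        _, user_id, item_id = heapq.heappop(queue)
        if (user_id, item_id) not in paid:              # the Postgres payment check
            stock[item_id] = stock.get(item_id, 0) + 1  # the Redis INCR
            restocked.append((user_id, item_id))
    return restocked

queue, paid, stock = [], {("u1", "fund")}, {"fund": 0}
schedule_hold_check(queue, now=0, user_id="u1", item_id="fund")  # pays in time
schedule_hold_check(queue, now=0, user_id="u2", item_id="fund")  # never pays
restocked = run_due_checks(queue, now=601, paid=paid, stock=stock)
```

Each reservation is checked exactly once, exactly at expiry, which is what removes both the scanning cost and the Janitor race condition.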

🛠️ Operational Failure Modes (Playbooks)

Scenario A: Redis Master Failure

  • Problem: The Redis Master crashes mid-sale. The Slave is promoted, but because of async replication, it might have missed the last 20 DECR commands.
  • Playbook:
    1. Use the Redis WAIT command (block until N replicas acknowledge the write) or synchronous replication if the business risk of 20 items is too high.
    2. Alternatively, accept the “Slight Oversell” risk and handle it at the Settlement layer (refund the user). With 10k items, selling 10,020 is often better than having the whole system die.

Scenario B: Botanical Warfare (Bot Networks)

  • Problem: Bots grab all 10k items in 15ms.
  • Playbook:
    1. Proof of Work: Challenge the client-side with a minor hash puzzle before allowing the “Reserve” request.
    2. KYC Tiers: Only allow users with a “Verified” account status (linked to Ch03) to participate.

8. Missing Crucial Details (The Grab/Ticketmaster Edge Cases)

Flash sales aren’t just about databases; they are about surviving malicious traffic.

A. Advanced Bot Mitigation (The WAF)

A hash puzzle is not enough for modern bots. You must specify edge protection:

  • The Web Application Firewall (WAF): Cloudflare or AWS WAF must sit in front of the API. It uses IP reputation, behavioral analysis, and rate limiting to block scripted traffic before it hits your infrastructure.

B. CDN Pre-Warming

During a flash sale, 100,000 users will be mashing “Refresh” on the product page.

  • If that page hits your Postgres database, you die before the sale even starts. The product page, images, and the “Buy Button” state must be statically cached in the CDN.

🔗 Cross-Chapter Cohesion: The Integrated Wise Stack

In a real system, these chapters aren’t isolated.

  • Identity (from Ch03): The user hitting the Flash Sale must have a valid user_id and have passed KYC.
  • Pricing (from Ch02): If the flash sale is for a discounted “International Transfer”, the system must fetch the rate_snapshot_id from the Exchange Rate service before finalizing the settlement.

💡 Hardware-First Intuition: RAM vs. Disk

  • Redis (Sequential RAM): Like a narrow hallway. People enter one by one, but they run at $300$ mph ($0.1$ ms).
  • Postgres (Parallel Disk): Like a wide highway. Many cars can go at once, but they hit red lights (Disk I/O) every 5 miles ($10$ ms).

The Strategy: Use the high-speed hallway (Redis) to “Thin the Herd” so the slow durable highway (Postgres) never sees more than it can handle.


9. Interview Pacing & Milestone Guide

| Time | Task | Key Talking Points |
| --- | --- | --- |
| 0-10m | Reqs & Estimates | Define the “Herd Problem”. Identify bottlenecks (DB locks). |
| 10-25m | High-Level Design | Introduce the Redis + Kafka decoupled architecture. |
| 25-40m | Deep Dive | Write the Lua script. Explain atomic execution in Redis. |
| 40-50m | Compensation | Handle payment failures (NSF) with “Revert” logic. |
| 50-60m | Scale & Failure | Discuss bot protection, Redis failover, and exactly-once settlement. |

10. Summary: Senior Interview Checklist

Prove your seniority by proactively addressing these:

  • Load Shedding: Discuss how to drop requests at the API Gateway if the Kafka partition lag reaches a “Drown” threshold (e.g., > 100,000 messages).
  • Exactly-Once Settlement: Each message in Kafka must have a purchase_id. The Settlement worker uses this as a DB unique constraint so a retry won’t charge the user twice.
  • Idempotent Reverts: If a user has no money, the script that increments Redis stock back must also be idempotent.
  • Redis Persistence: Enable AOF (Append Only File) so the stock count survives a Redis restart.
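The idempotent revert from the checklist can be sketched as “restock at most once per purchase_id”. In Redis this would be one Lua script doing SADD on a processed-set followed by INCRBY; here it is shown as a plain-Python model:

```python
def revert_stock(stock: dict, reverted: set, purchase_id: str,
                 item_id: str, qty: int) -> bool:
    """Idempotent revert: restock at most once per purchase_id.

    Mirrors a Redis Lua script: SADD on a processed-set (returns 1 only the
    first time) gating an INCRBY on the stock key, executed atomically.
    """
    if purchase_id in reverted:
        return False  # retry of an already-processed revert: do nothing
    reverted.add(purchase_id)
    stock[item_id] = stock.get(item_id, 0) + qty
    return True

stock, reverted = {"fund": 0}, set()
first = revert_stock(stock, reverted, "p_99", "fund", 1)  # applies the restock
retry = revert_stock(stock, reverted, "p_99", "fund", 1)  # duplicate is a no-op
```

Without the processed-set guard, a redelivered RefundStock event would inflate inventory and re-oversell the item you just clawed back.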

11. Follow-up Interview Questions

  1. “Why use a Redis Lua Script instead of just INCR or DECR?” Answer: If we just use GET to check stock, then DECR if > 0, we suffer a race condition. Two threads might GET 1, and both will DECR, resulting in -1 stock (an oversell). Lua scripts in Redis are guaranteed to execute atomically, ensuring no other commands run between the GET and DECR.

  2. “What if the user’s payment fails during the settlement phase?” Answer: The Settlement Worker detects the failure (e.g., Not Sufficient Funds). It writes a FAILED status to the database, and then pushes a RefundStock event back to Kafka. Another worker consumes this, calls a Lua script to INCR the Redis stock back by 1, and marks the user’s attempt as canceled.

  3. “How do you ensure you never, ever oversell, even if Redis crashes?” Answer: Redis is an in-memory store; absolute 100.00% consistency across crashes is impossible without synchronous disk writes (which kills the 100k TPS speed we need). The ultimate source of truth is the SQL Database. We can use a CHECK (stock >= 0) constraint in Postgres. If Redis accidentally lets 11,000 requests through for 10,000 items, the first 10,000 succeed in Postgres, and the remaining 1,000 fail the constraint. We simply notify those 1,000 users of the failure.

  4. “If we use Kafka, won’t the user be waiting a long time to know if they got the item?” Answer: The Redis Lua script returns Success or Sold Out in <5ms. We immediately show the user “Reserved! Processing Payment…” on the UI. The actual settlement via Kafka might take 1-3 seconds. The UI can long-poll or use WebSockets to listen for the final SETTLED status.

  5. “How do you partition the Kafka SettlementQueue to handle the thundering herd asynchronously while maintaining order and fairness?” Answer: We partition the SettlementQueue by user_id, not item_id. This is a common trap. If you partition by item_id, all 10,000 winners hit a single partition, meaning only a single consumer processes payments. If payment settlement takes 1 second, it takes 3 hours for the last user to check out. You do not need strict FCFS ordering in the payment phase because the Redis Lua script already guaranteed the reservation lock. Partitioning by user_id lets you spin up 50 consumers and process all 10,000 payments in a few minutes.