Design High-Concurrency Flash Sales
[!NOTE] This scenario is common for Wise’s internal investment or share-purchase events. It tests your ability to handle race conditions, distributed invariants, and load shedding.
1. The Problem: The “Sell Out” Race
Wise launches a feature where users can buy shares in a fund. There are only 10,000 shares available. 50,000 users click “Buy” at the exact same second. They all want to pay using their Wise EUR balance.
The Interview Challenge:
“Design a system to sell 10,000 shares. You must guarantee that we never sell more than 10,000 (no overselling) and that we never charge a user if the shares are already gone (no overcharging).”
2. Requirements & Goals
Functional Requirements
- Atomic Purchase: Reserve a share and debit the balance as one logical unit.
- Inventory Count: Real-time visibility of remaining stock.
- Waitlist: (Optional) Handle overflow users.
Non-Functional Requirements
- Strict Consistency: Inventory count must be perfect. No eventual consistency here.
- High Availability: The “Check Balance” service must not crash under the 50k TPS load.
- Latency: Sub-second response to give users a “Fair” experience.
3. Capacity Estimation
- Total Stock: 10,000.
- Peak Load: 50,000 requests per second.
- Bottleneck: Most likely the Balance Service (SQL database with row-level locks).
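The numbers above can be turned into a quick back-of-the-envelope check. This is only a sketch: it assumes each of the 50,000 users sends exactly one request, and it borrows the 500 requests per second throttle discussed in section 6 as the settlement rate. The constant names are illustrative.

```python
TOTAL_STOCK = 10_000      # shares for sale
PEAK_REQUESTS = 50_000    # buy requests arriving in the peak second (one per user, assumed)
SETTLE_RATE = 500         # assumed Balance Service throttle, requests per second

# Requests that can be answered "Sold Out" without touching the Balance DB.
shed = PEAK_REQUESTS - TOTAL_STOCK
shed_fraction = shed / PEAK_REQUESTS

# Time for throttled workers to settle every successful reservation.
drain_seconds = TOTAL_STOCK / SETTLE_RATE

print(f"shed {shed} requests ({shed_fraction:.0%})")   # shed 40000 requests (80%)
print(f"drain time {drain_seconds:.0f} s")             # drain time 20 s
```

Under these assumptions the Balance Service sees at most 10,000 settlement requests in total, spread over roughly 20 seconds, instead of 50,000 in one second.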
4. High-Level Design: The Buffer Pattern
We cannot let 50,000 users hit the Balance DB at once. We need a Buffer (Queue) and a Reservation system.
flowchart TD
U[User] -->|POST /buy| API[Purchase API]
API -->|1. Check Stock| INV[(Redis Inventory)]
INV -->|2. Reserve| INV
API -->|3. Queue Req| Q[Purchase Queue]
Q --> WORKER[Settlement Worker]
WORKER -->|4. Debit| BAL[Balance Service]
WORKER -->|5. Confirm| INV
5. Detailed Design: Reservation-Hold-Capture
To prevent the “I paid but didn’t get it” disaster, we use a three-phase approach.
Phase 1: The Redis Reservation (The “Fast” Check)
We store the inventory in Redis because it can handle 100k+ operations per second.
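- Command: `DECRBY shares:1 1`
- Logic: If the result is `< 0`, the sale is over. Immediately tell the user “Sold Out”, and add the unit back (`INCRBY shares:1 1`) so that failed attempts do not push the counter further below zero and swallow later releases.
- Benefit: This sheds 80% of the load (40,000 of the 50,000 requests) before it hits our expensive SQL databases.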
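The reservation check can be sketched as follows. To keep the example runnable without a Redis server, `FakeRedisCounter` is a hypothetical in-memory stand-in whose lock mimics the atomicity of a single Redis command; with the redis-py client the same two calls would be `decrby` and `incrby` on the `shares:1` key. The compensating increment on failure is an assumption of this sketch: it keeps the counter from drifting far below zero, so a later release makes a share available again.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeRedisCounter:
    """In-memory stand-in for one Redis integer key (hypothetical helper)."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._lock = threading.Lock()  # Redis runs each command atomically

    def decrby(self, amount: int) -> int:
        with self._lock:
            self._value -= amount
            return self._value

    def incrby(self, amount: int) -> int:
        with self._lock:
            self._value += amount
            return self._value

    def get(self) -> int:
        with self._lock:
            return self._value


def try_reserve(counter: FakeRedisCounter) -> bool:
    """Return True if one share was reserved, False if the sale is over."""
    if counter.decrby(1) < 0:
        counter.incrby(1)  # undo the failed decrement so the counter settles at 0
        return False
    return True


def run_sale(stock: int, buyers: int) -> tuple[int, int]:
    """Simulate `buyers` concurrent attempts; return (reserved, remaining)."""
    counter = FakeRedisCounter(stock)
    with ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(lambda _: try_reserve(counter), range(buyers)))
    return sum(results), counter.get()


if __name__ == "__main__":
    reserved, remaining = run_sale(stock=10_000, buyers=50_000)
    print(reserved, remaining)
```

However the threads interleave, a reservation succeeds only when the decrement leaves the counter at zero or above, so the number of successful reservations cannot exceed the stock.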
Phase 2: The Balance Hold
For the 10,000 users who got a Redis reservation, we now need to check their bank balance. We don’t deduct yet. We HOLD.
- Action: Create a “HOLD” entry in the ledger. This money is now “Unavailable” to the user for other transactions but hasn’t left their account.
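A hold can be modelled as an amount that stays in the balance but is subtracted from what the user may spend. The sketch below is a minimal in-memory illustration, not a real ledger: `Account`, `place_hold`, and `InsufficientFunds` are hypothetical names, amounts are in cents, and a real Balance Service would persist these entries transactionally.

```python
from dataclasses import dataclass, field


class InsufficientFunds(Exception):
    """Raised when the available balance cannot cover a new hold."""


@dataclass
class Account:
    balance_cents: int                                    # money still in the account
    holds: dict[str, int] = field(default_factory=dict)   # hold_id -> held amount

    @property
    def available_cents(self) -> int:
        # Held money has not left the account, but it cannot be spent elsewhere.
        return self.balance_cents - sum(self.holds.values())

    def place_hold(self, hold_id: str, amount_cents: int) -> None:
        if hold_id in self.holds:
            return  # retry of the same purchase: keep the existing hold
        if amount_cents > self.available_cents:
            raise InsufficientFunds(hold_id)
        self.holds[hold_id] = amount_cents


if __name__ == "__main__":
    account = Account(balance_cents=10_000)
    account.place_hold("purchase-1", 7_000)
    print(account.balance_cents, account.available_cents)
```

Keying the hold by a purchase identifier makes a retried request harmless, which matters once the request passes through a queue that may redeliver messages.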
Phase 3: The Capture
Once the balance is successfully held, we finalize the share purchase.
- If Success: Change `HOLD` to `DEBIT` and increment `shares_sold` in the SQL DB.
- If Failure (e.g., user had insufficient funds): Release the Redis reservation (`INCRBY shares:1 1`) so someone else can buy it.
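The two outcomes can be sketched as a single settlement step for a purchase that already holds a reservation. This is a single-threaded illustration under simplifying assumptions: `Sale.remaining` stands in for the Redis counter `shares:1`, `Sale.shares_sold` for the SQL column, and `Wallet`, `Outcome`, and `settle` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CAPTURED = "CAPTURED"   # hold converted to a debit, share sold
    RELEASED = "RELEASED"   # reservation handed back to the pool


@dataclass
class Sale:
    remaining: int          # stand-in for the Redis counter shares:1
    shares_sold: int = 0    # stand-in for the SQL shares_sold column


@dataclass
class Wallet:
    balance_cents: int
    held_cents: int = 0


def settle(sale: Sale, wallet: Wallet, price_cents: int) -> Outcome:
    """Settle one purchase that already owns a Redis reservation."""
    # Phase 2: place the hold, or give the share back if funds are short.
    if wallet.balance_cents - wallet.held_cents < price_cents:
        sale.remaining += 1              # equivalent of INCRBY shares:1 1
        return Outcome.RELEASED
    wallet.held_cents += price_cents

    # Phase 3: capture turns the HOLD into a DEBIT and records the sale.
    wallet.held_cents -= price_cents
    wallet.balance_cents -= price_cents
    sale.shares_sold += 1
    return Outcome.CAPTURED
```

The user is never debited on the failure path, and the share is never counted as sold unless the debit happened, which is the pair of guarantees the interview question asks for.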
6. Deep Dive: Protecting the Balance Service
In a Wise interview, the interviewer will ask: “Your Balance service is slow. How do you protect it?”
- Queue Partitioning: Partition the `Purchase Queue` by `user_id`. This ensures that one user’s retries don’t block other users’ first attempts.
- Token Bucket: Use a rate limiter to ensure the `Settlement Worker` only sends 500 requests per second to the Balance Service, even if the queue has 10,000 items.
- Local Locking: Use `SELECT ... FOR UPDATE` in Postgres to ensure one user can’t perform two simultaneous purchases that exceed their balance.
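A token bucket for the worker can be sketched in a few lines. This is a minimal single-threaded version: `TokenBucket` and `try_acquire` are illustrative names, the clock is injectable so the behaviour can be checked without sleeping, and a worker pool shared across processes would need a shared limiter instead.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most `rate` operations per second, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller must wait."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=500, capacity=500)
    sent = sum(bucket.try_acquire() for _ in range(10_000))
    print(sent)  # about 500; a few more if the loop runs long enough to refill
```

The Settlement Worker would call `try_acquire()` before each Balance Service request and leave the message on the queue when it returns `False`.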
7. Reliability: What if Redis Crashes?
Redis is “In-Memory”. If it restarts, we lose our “Reserved” count.
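- Solution: Redis AOF (Append Only File) with `appendfsync everysec`. This still allows up to about one second of writes to be lost on a crash, so it needs the recovery step below.
- Recovery: On boot, the Inventory Service should query the SQL DB for `COUNT(successful_purchases)` and `COUNT(active_holds)` to recalibrate the Redis counter: remaining = 10,000 - successful purchases - active holds.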
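The recalibration query can be sketched with an in-memory SQLite database standing in for the real SQL store. The `purchases` table and its `status` values are assumptions made for this example; the source only names the two counts. Reservations that were still waiting in the queue with no hold yet are invisible to this query, and how to account for them is left open here.

```python
import sqlite3

TOTAL_STOCK = 10_000


def remaining_stock(db: sqlite3.Connection, total: int = TOTAL_STOCK) -> int:
    """Recompute the Redis counter value from the durable SQL state."""
    (committed,) = db.execute(
        # Successful purchases (DEBIT) and active holds (HOLD) both consume stock.
        "SELECT COUNT(*) FROM purchases WHERE status IN ('DEBIT', 'HOLD')"
    ).fetchone()
    return total - committed


def demo() -> int:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
    )
    rows = [(1, "DEBIT"), (2, "DEBIT"), (3, "HOLD"), (4, "RELEASED")]
    db.executemany("INSERT INTO purchases (user_id, status) VALUES (?, ?)", rows)
    return remaining_stock(db)


if __name__ == "__main__":
    print(demo())  # 9997: two debits and one hold consume three shares
```

The Inventory Service would write the returned value back to `shares:1` before the Purchase API starts accepting traffic again.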
8. Summary: The Senior Interview Checklist
- Distributed Invariant: “How do you guarantee exactly 10,000?” (Explain the atomic Redis `DECR` + SQL reconciliation).
- User Fairness: “How do you handle users who have a slow internet connection?” (Talk about the “Queue” and how it acts as a buffer).
- Race Conditions: “What if two workers try to fulfill the last share?” (Use a `UNIQUE` constraint on `shares.share_number` in the DB).
- Load Shedding: Discuss returning `429 Too Many Requests` at the API Gateway for the 50,001st user.
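The `UNIQUE` constraint from the race-condition item can be demonstrated with SQLite as a stand-in for the production database. Only `shares.share_number` comes from the checklist; the `user_id` column and the `claim_share` and `make_db` helpers are assumptions for this sketch.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE shares ("
        " share_number INTEGER NOT NULL UNIQUE,"   # one row per sellable share
        " user_id INTEGER NOT NULL)"
    )
    return db


def claim_share(db: sqlite3.Connection, share_number: int, user_id: int) -> bool:
    """Try to assign a share; return False if another worker already claimed it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO shares (share_number, user_id) VALUES (?, ?)",
                (share_number, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the duplicate, so the invariant holds
        # even if two workers raced for the same share.
        return False


if __name__ == "__main__":
    db = make_db()
    print(claim_share(db, 10_000, user_id=1))  # True
    print(claim_share(db, 10_000, user_id=2))  # False
```

The constraint acts as a last line of defence: even if the Redis counter is wrong after a crash, the database refuses to record the same share twice.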