Design High-Concurrency Flash Sales

[!NOTE] This scenario is common for Wise’s internal investment or share-purchase events. It tests your ability to handle race conditions, distributed invariants, and load shedding.

1. The Problem: The “Sell Out” Race

Wise launches a feature that lets users buy shares in a fund. Only 10,000 shares are available, and 50,000 users click “Buy” within the same second. All of them want to pay from their Wise EUR balance.

The Interview Challenge:

“Design a system to sell 10,000 shares. You must guarantee that we never sell more than 10,000 (no overselling) and that we never charge a user if the shares are already gone (no overcharging).”


2. Requirements & Goals

Functional Requirements

  1. Atomic Purchase: Reserve a share and debit the balance as one logical unit.
  2. Inventory Count: Real-time visibility of remaining stock.
  3. Waitlist: (Optional) Handle overflow users.

Non-Functional Requirements

  1. Strict Consistency: The inventory count must be exact. Eventual consistency is not acceptable here, because a stale count can oversell.
  2. High Availability: The Balance Service must not crash under the peak load of 50,000 requests per second.
  3. Latency: Sub-second response, so that every user gets a “fair” experience.

3. Capacity Estimation

  • Total Stock: 10,000.
  • Peak Load: 50,000 requests per second.
  • Bottleneck: Most likely the Balance Service (SQL database with row-level locks).
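A quick back-of-the-envelope check of these numbers, as a sketch. The drain rate of 500 requests per second is an assumption borrowed from the rate-limiter figure in Section 6; the variable names are illustrative only.

```python
TOTAL_STOCK = 10_000        # shares available
PEAK_REQUESTS = 50_000      # buy requests arriving in the first second
BALANCE_RATE_LIMIT = 500    # assumed max requests/second sent to the Balance Service

# Requests that can be answered "Sold Out" without touching the Balance Service.
rejected = PEAK_REQUESTS - TOTAL_STOCK
shed_fraction = rejected / PEAK_REQUESTS

# Time for the workers to settle every reserved share at the throttled rate.
drain_seconds = TOTAL_STOCK / BALANCE_RATE_LIMIT

print(f"rejected early: {rejected} ({shed_fraction:.0%})")        # rejected early: 40000 (80%)
print(f"time to settle all reservations: {drain_seconds:.0f} s")  # time to settle all reservations: 20 s
```

Under these assumptions the last reserved buyer waits about 20 seconds for settlement, so the sub-second response is best understood as the reservation acknowledgement rather than the final confirmation.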

4. High-Level Design: The Buffer Pattern

We cannot let 50,000 users hit the Balance DB at once. We need a Buffer (Queue) and a Reservation system.

flowchart TD
  U[User] -->|POST /buy| API[Purchase API]
  API -->|1. Check Stock| INV[(Redis Inventory)]
  INV -->|2. Reserve| INV
  API -->|3. Queue Req| Q[Purchase Queue]
  Q --> WORKER[Settlement Worker]
  WORKER -->|4. Debit| BAL[Balance Service]
  WORKER -->|5. Confirm| INV

5. Detailed Design: Reservation-Hold-Capture

To prevent the “I paid but didn’t get it” disaster, we use a three-phase approach.

Phase 1: The Redis Reservation (The “Fast” Check)

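We keep the inventory counter in Redis because a single instance can handle 100k+ simple operations per second and executes each command atomically, so two concurrent decrements can never observe the same value.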

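  • Command: DECRBY shares:1 1
  • Logic: If the result is < 0, the sale is over. Undo the decrement (INCRBY shares:1 1) so the counter does not drift below zero, then immediately tell the user “Sold Out”. Without the undo, a share released later in Phase 3 would stay hidden behind a negative count.
  • Benefit: This sheds 40,000 of the 50,000 requests (80%) before they reach our expensive SQL databases.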
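The reservation logic can be sketched as follows. To keep the example runnable without a Redis server, `InMemoryCounter` is a hypothetical stand-in whose `decrby` and `incrby` methods are each atomic, mimicking how Redis runs one command at a time; `try_reserve` and `run_sale` are illustrative names, not part of any library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryCounter:
    """Hypothetical stand-in for one Redis key; each method is atomic."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._lock = threading.Lock()

    def decrby(self, amount: int) -> int:
        with self._lock:
            self._value -= amount
            return self._value

    def incrby(self, amount: int) -> int:
        with self._lock:
            self._value += amount
            return self._value


def try_reserve(counter: InMemoryCounter) -> bool:
    """Reserve one share; return False when the sale is over."""
    remaining = counter.decrby(1)
    if remaining < 0:
        # Undo the decrement so the counter settles back at 0 instead of
        # drifting negative, which would hide shares released later.
        counter.incrby(1)
        return False
    return True


def run_sale(stock: int, buyers: int) -> int:
    """Fire `buyers` concurrent reservation attempts; return how many succeeded."""
    counter = InMemoryCounter(stock)
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(lambda _: try_reserve(counter), range(buyers)))
    return sum(results)


print(run_sale(stock=100, buyers=500))  # prints 100
```

A decrement followed by a compensating increment leaves a brief window in which the counter reads below zero. A server-side script that checks and decrements in one step is a common way to close that window; it is omitted here for brevity.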

Phase 2: The Balance Hold

For the 10,000 users who got a Redis reservation, we now check their Wise EUR balance. We do not debit yet; we place a HOLD.

  • Action: Create a “HOLD” entry in the ledger. This money is now “Unavailable” to the user for other transactions but hasn’t left their account.

Phase 3: The Capture

Once the balance is successfully held, we finalize the share purchase.

  • If Success: Change HOLD to DEBIT and increment shares_sold in the SQL DB.
  • If Failure (e.g., user had insufficient funds): Release the Redis reservation (INCRBY shares:1 1) so someone else can buy it.
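Phases 2 and 3 can be sketched as a small in-memory ledger. This is an illustration under stated assumptions, not Wise's ledger model: the `Account`, `Hold`, and `settle` names are hypothetical, amounts are in cents, and the `stock` dictionary stands in for the Redis counter and the SQL `shares_sold` column.

```python
from dataclasses import dataclass, field
from enum import Enum


class HoldState(Enum):
    HELD = "HELD"          # funds reserved, not yet debited
    CAPTURED = "CAPTURED"  # hold converted into a debit


@dataclass
class Hold:
    amount: int
    state: HoldState = HoldState.HELD


@dataclass
class Account:
    balance: int                               # settled balance in cents
    holds: dict = field(default_factory=dict)  # hold_id -> Hold

    def available(self) -> int:
        """Settled balance minus funds locked by active holds."""
        held = sum(h.amount for h in self.holds.values()
                   if h.state is HoldState.HELD)
        return self.balance - held

    def place_hold(self, hold_id: str, amount: int) -> bool:
        if amount > self.available():
            return False                       # insufficient funds
        self.holds[hold_id] = Hold(amount)
        return True

    def capture(self, hold_id: str) -> None:
        hold = self.holds[hold_id]
        if hold.state is not HoldState.HELD:
            raise ValueError("only an active hold can be captured")
        self.balance -= hold.amount            # HOLD becomes DEBIT
        hold.state = HoldState.CAPTURED


def settle(account: Account, order_id: str, price: int, stock: dict) -> str:
    """Run Phases 2 and 3 for one order that already has a reservation."""
    if not account.place_hold(order_id, price):
        stock["remaining"] += 1                # mirrors INCRBY shares:1 1
        return "INSUFFICIENT_FUNDS"
    account.capture(order_id)
    stock["sold"] += 1                         # mirrors shares_sold in SQL
    return "PURCHASED"


stock = {"remaining": 0, "sold": 0}            # two shares, both already reserved
rich, poor = Account(balance=5_000), Account(balance=100)
print(settle(rich, "order-1", 2_500, stock))   # PURCHASED
print(settle(poor, "order-2", 2_500, stock))   # INSUFFICIENT_FUNDS
print(rich.balance, stock)                     # 2500 {'remaining': 1, 'sold': 1}
```

The failed order returns its share to the pool, so the next buyer in line can still reserve it.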

6. Deep Dive: Protecting the Balance Service

In a Wise interview, the interviewer will ask: “Your Balance service is slow. How do you protect it?”

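  1. Queue Partitioning: Partition the Purchase Queue by user_id. All of a user’s requests then land on the same partition and are processed in order, so a burst of retries from one user is confined to a single partition instead of delaying first attempts across the whole queue.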
  2. Token Bucket: Use a rate limiter to ensure the Settlement Worker only sends 500 requests per second to the Balance Service, even if the queue has 10,000 items.
  3. Local Locking: Use SELECT ... FOR UPDATE in Postgres to ensure one user can’t perform two simultaneous purchases that exceed their balance.
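The token-bucket idea from item 2 can be sketched as below. The class is a minimal single-threaded illustration, not a production limiter; `TokenBucket`, `try_acquire`, and the injectable `clock` parameter are names chosen for this example, and the fake clock exists only to make the demonstration deterministic.

```python
import time


class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; otherwise the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demonstration with a fake clock: 10,000 queued items, 500 allowed per second.
fake_now = [0.0]
bucket = TokenBucket(rate=500, capacity=500, clock=lambda: fake_now[0])

sent_first_second = sum(bucket.try_acquire() for _ in range(10_000))
fake_now[0] += 1.0  # one second passes, so the bucket refills
sent_next_second = sum(bucket.try_acquire() for _ in range(10_000))

print(sent_first_second, sent_next_second)  # 500 500
```

A Settlement Worker would call `try_acquire` before each request to the Balance Service and leave the item on the queue when it returns False.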

7. Reliability: What if Redis Crashes?

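Redis is in-memory. If it restarts without persistence, we lose our “Reserved” count.

  • Solution: Redis AOF (Append Only File) with appendfsync everysec. This setting can still lose roughly the last second of writes, so the counter must also be recoverable from the database.
  • Recovery: On boot, the Inventory Service should query the SQL DB for COUNT(successful_purchases) and COUNT(active_holds), then reset the Redis counter to 10,000 minus the sum of the two.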
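The recalibration step might look like the sketch below. The schema is an assumption made for illustration: a single hypothetical `purchases` table whose `status` column is `HELD`, `CAPTURED`, or `RELEASED`, with an in-memory SQLite database standing in for the real SQL store.

```python
import sqlite3

TOTAL_STOCK = 10_000


def recalibrated_remaining(conn: sqlite3.Connection) -> int:
    """Derive the value the Redis counter should be reset to after a restart."""
    (committed,) = conn.execute(
        "SELECT COUNT(*) FROM purchases WHERE status IN ('HELD', 'CAPTURED')"
    ).fetchone()
    # Captured purchases and active holds both consume stock; released rows do not.
    return max(TOTAL_STOCK - committed, 0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO purchases (status) VALUES (?)",
    [("CAPTURED",)] * 3 + [("HELD",)] * 2 + [("RELEASED",)],
)

print(recalibrated_remaining(conn))  # 9995
```

Reservations that were still waiting in the queue without a hold row are not visible to this query. How to treat them after a restart is a design decision this sketch leaves open.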

8. Summary: The Senior Interview Checklist

  1. Distributed Invariant: “How do you guarantee exactly 10,000?” (Explain the atomic Redis DECR + SQL reconciliation).
  2. User Fairness: “How do you handle users who have a slow internet connection?” (Talk about the “Queue” and how it acts as a buffer).
  3. Race Conditions: “What if two workers try to fulfill the last share?” (Use a UNIQUE constraint on shares.share_number in the DB).
  4. Load Shedding: Discuss returning 429 Too Many Requests at the API Gateway for traffic beyond the planned 50,000 requests per second.
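The UNIQUE-constraint answer to item 3 can be demonstrated with a short sketch. SQLite stands in for the production database, and the `user_id` column and the `assign_share` helper are hypothetical; only `shares.share_number` comes from the checklist above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shares ("
    " share_number INTEGER NOT NULL UNIQUE,"  # one row per physical share
    " user_id TEXT NOT NULL)"
)


def assign_share(conn: sqlite3.Connection, share_number: int, user_id: str) -> bool:
    """Try to claim a share; the database rejects a second claim on the same number."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO shares (share_number, user_id) VALUES (?, ?)",
                (share_number, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Two workers race for the last share; only the first insert succeeds.
print(assign_share(conn, 10_000, "alice"))  # True
print(assign_share(conn, 10_000, "bob"))    # False
```

The losing worker should treat the constraint violation as a failed capture and release the buyer's hold rather than retrying the insert.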