Design Payment System Architecture
1. What is a Payment System?
A Payment System moves money from one entity to another. It sits between the User (Merchant) and the Banking Infrastructure (VISA/MasterCard/Banks).
[!TIP] The Golden Rule: Never Lose Money. Unlike a Chat App where losing a message is “annoying”, losing a payment record is “illegal” or “business-ending”. We prioritize Consistency over everything (ACID).
Real-World Examples
- Stripe/PayPal: Payment Gateways.
- Wise (TransferWise): Cross-border remittances.
- Internal Wallets: Uber Credits, Amazon Gift Balance.
2. Requirements & Goals
Functional Requirements
- Money Transfer: Move funds from Account A to Account B.
- Payment Gateway Integration: Accept credit card payments via PSPs (Payment Service Providers).
- Transaction History: View immutable audit logs.
- Reconciliation: Ensure our numbers match the Bank’s numbers.
Non-Functional Requirements
- Data Integrity (ACID): Transactions must be atomic.
- Reliability: 99.999% availability.
- Security: Compliance with PCI-DSS (Encryption at rest/transit).
- Idempotency: Processing the same request twice must not result in double charging.
3. Capacity Estimation
- Transactions: 1 Million / day $\approx$ 12 TPS. (Peak 100 TPS).
- Throughput: Low. Payment systems are rarely “High Throughput” like Twitter. They are “High Value”.
- Storage: Immutability means data grows forever. 1M rows/day.
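The arithmetic behind these estimates can be checked with a short sketch. The daily volume comes from the figures above; the bytes-per-row value is purely an illustrative assumption and is not part of the original estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

transactions_per_day = 1_000_000
average_tps = transactions_per_day / SECONDS_PER_DAY

# Rows are never deleted, so storage grows linearly with time.
rows_per_day = 1_000_000
rows_per_year = rows_per_day * 365

# ASSUMPTION (not from the source): roughly 200 bytes per row including indexes.
ASSUMED_BYTES_PER_ROW = 200
storage_gb_per_year = rows_per_year * ASSUMED_BYTES_PER_ROW / 1e9

print(f"average TPS: {average_tps:.1f}")
print(f"rows per year: {rows_per_year:,}")
print(f"storage per year: {storage_gb_per_year:.0f} GB")
```

Even with generous row sizes the yearly growth stays modest, which supports the point that the hard part is correctness rather than volume.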
4. System APIs
Execute Payment
We rely heavily on Idempotency Keys.
POST /v1/payments
Idempotency-Key: "uuid-v4-generated-by-client"
{
"from_user": "u_alice",
"to_user": "u_bob",
"amount": 5000, // Cents (Avoid Floating Point!)
"currency": "USD"
}
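A minimal client-side sketch of how such a request might be assembled. The helper names `to_cents` and `build_payment_request` are hypothetical, and the code only builds the headers and body; it does not send anything over the network.

```python
import json
import uuid
from decimal import Decimal


def to_cents(amount: str) -> int:
    """Convert a decimal string such as '50.00' to integer cents without float rounding."""
    cents = Decimal(amount) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"{amount!r} has more than two decimal places")
    return int(cents)


def build_payment_request(from_user: str, to_user: str, amount: str, currency: str = "USD"):
    """Return (headers, body) for POST /v1/payments. Hypothetical helper."""
    headers = {
        # The client generates the key once and reuses it on every retry.
        "Idempotency-Key": str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "from_user": from_user,
        "to_user": to_user,
        "amount": to_cents(amount),  # integer cents, never a float
        "currency": currency,
    })
    return headers, body


# Why integers: binary floating point cannot represent most decimal fractions exactly.
float_sum_is_exact = (0.1 + 0.2 == 0.3)                    # False
cents_sum_is_exact = (to_cents("0.10") + to_cents("0.20") == 30)  # True
```

On a retry the client should resend the same headers and body rather than calling the builder again, otherwise a fresh key would defeat the duplicate check.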
5. Database Design & The Ledger
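NEVER use UPDATE users SET balance = balance - 50 as the only record of a money movement. An in-place update destroys history and makes debugging impossible. A balance column is acceptable only as a cached view derived from the ledger.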
Use a Double-Entry Ledger.
Table: ledger_entries
Every transaction creates at least two rows.
| id | transaction_id | account_id | amount | type | created_at |
|---|---|---|---|---|---|
| 1 | txn_101 | Alice | -5000 | DEBIT | 10:00:01 |
| 2 | txn_101 | Bob | +5000 | CREDIT | 10:00:01 |
| 3 | txn_102 | Bob | -200 | DEBIT | 11:00:00 |
| 4 | txn_102 | Fees | +200 | CREDIT | 11:00:00 |
The Zero-Sum Invariant
The sum of amount for any transaction_id must always be 0. This allows you to audit the entire system by summing every row in the database.
-- Verification Query
SELECT transaction_id, SUM(amount)
FROM ledger_entries
GROUP BY transaction_id
HAVING SUM(amount) != 0;
-- Result should be EMPTY. If not, ALARM!
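The verification query can be exercised end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL/MySQL, loads the four sample rows from the table above, and then inserts a deliberately one-sided entry (a hypothetical txn_103) to show what the alarm case looks like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger_entries (
        id INTEGER PRIMARY KEY,
        transaction_id TEXT NOT NULL,
        account_id TEXT NOT NULL,
        amount INTEGER NOT NULL,  -- cents; negative = debit, positive = credit
        type TEXT NOT NULL CHECK (type IN ('DEBIT', 'CREDIT'))
    )
""")
INSERT = ("INSERT INTO ledger_entries (transaction_id, account_id, amount, type) "
          "VALUES (?, ?, ?, ?)")
conn.executemany(INSERT, [
    ("txn_101", "Alice", -5000, "DEBIT"),
    ("txn_101", "Bob", 5000, "CREDIT"),
    ("txn_102", "Bob", -200, "DEBIT"),
    ("txn_102", "Fees", 200, "CREDIT"),
])

VERIFY = """
    SELECT transaction_id, SUM(amount)
    FROM ledger_entries
    GROUP BY transaction_id
    HAVING SUM(amount) != 0
"""


def unbalanced_transactions(db):
    """Return every transaction whose entries do not sum to zero."""
    return db.execute(VERIFY).fetchall()


def balance_of(db, account_id):
    """Derive a balance from the ledger instead of a mutable column."""
    row = db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger_entries WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0]


healthy = unbalanced_transactions(conn)  # empty list
bob_balance = balance_of(conn, "Bob")    # 5000 - 200

# Simulate a bug: a debit with no matching credit.
conn.execute(INSERT, ("txn_103", "Alice", -100, "DEBIT"))
broken = unbalanced_transactions(conn)
```

Because balances are derived by summing entries, any account balance can be recomputed at any time and compared against a cached value.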
Why SQL?
We need ACID Transactions (See Module 04: Database Basics).
- PostgreSQL / MySQL (InnoDB) are standard.
- NoSQL (Cassandra/Mongo) generally lacks multi-row ACID transactions (though some support exists now, SQL is safer for financial data).
6. High-Level Design
[Figure: High-Level Architecture: End-to-End Payment Execution & Reconciliation. The ledger panel shows two entries for one transaction, A -50 and B +50, with Sum = 0.]
The system follows an At-Least-Once with Idempotency pattern to ensure every payment is processed exactly once:
- Payment Service (Orchestrator): The entry point for all requests. It handles high-level logic, including Risk Engine checks for fraud and velocity.
- Idempotency Store (Redis): Before any action, the system checks this store using the Idempotency-Key to prevent duplicate processing of the same request.
- Ledger DB (SQL): The authoritative source of truth. It records every fund movement using Double-Entry Bookkeeping (Debit/Credit pairs) to maintain a zero-sum invariant.
- PSP Connector: A specialized adapter that translates internal requests into third-party API calls (e.g., Stripe, PayPal, Adyen).
- Reconciliation Cron: An offline worker that performs a Three-Way Match between the internal Ledger, PSP reports, and Bank statements.
7. Deep Dive: Idempotency & Double Spending
Part A: Idempotency (The “Exactly-Once” Illusion)
What if the Client sends a request, the Server charges the card, but the Response is lost due to a network timeout? The Client will Retry.
Without Idempotency, you get a Double Charge. This is prevented by the Idempotency Store (Redis) shown in our high-level architecture.
The Idempotency Key Pattern:
- Client generates UUID key123.
- Client sends POST /pay with Idempotency-Key: key123.
- Server checks DB/Redis: “Have I processed key123?”
  - Yes: Return the original stored response (Success/Fail). Do NOT process again.
  - No: Insert key123 with status PENDING. Process payment. Update to SUCCESS.
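The pattern above can be sketched with an in-memory dictionary standing in for Redis or a database table. The class name `IdempotentPaymentHandler`, the `charge` callback, and the response shape are all assumptions made for illustration; a real store would also need expiry and persistence.

```python
import threading


class IdempotentPaymentHandler:
    """In-memory sketch of the Idempotency Key pattern (hypothetical names)."""

    def __init__(self, charge):
        self._charge = charge          # function that actually moves money
        self._records = {}             # key -> {"status": ..., "response": ...}
        self._lock = threading.Lock()  # makes check-and-insert atomic

    def handle(self, key, request):
        with self._lock:
            record = self._records.get(key)
            if record is None:
                # First time we see this key: claim it before doing any work.
                self._records[key] = {"status": "PENDING", "response": None}

        if record is not None:
            if record["status"] == "PENDING":
                # Another request with the same key is still running.
                return {"ok": False, "error": "request in progress"}
            # Already processed: replay the stored response, do NOT charge again.
            return record["response"]

        response = self._charge(request)
        status = "SUCCESS" if response["ok"] else "FAILED"
        with self._lock:
            self._records[key] = {"status": status, "response": response}
        return response
```

The important property is that the lookup and the PENDING insert happen as one atomic step; checking first and inserting later would reopen the duplicate window.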
Part B: Preventing Double Spending (Race Conditions)
Scenario: Alice has $100. She initiates two transfers of $80 concurrently (Total $160).
- Thread A reads Bal($100). Checks $100 >= $80. OK.
- Thread B reads Bal($100). Checks $100 >= $80. OK.
- Thread A updates Bal($20).
- Thread B updates Bal($20).
- Result: Alice spent $160 but only lost $80. The system lost money.
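The interleaving in this scenario can be replayed deterministically. The sketch below is not real multithreading; it simply executes the steps of Thread A and Thread B in the Read-Read-Write-Write order listed above so the lost update is reproducible. The function name is hypothetical.

```python
def run_unsafe_interleaving(initial_balance=100, amount=80):
    """Replay Read-Read-Write-Write and return (final_balance, total_paid_out)."""
    balance = initial_balance
    paid_out = 0

    # Both threads read the balance before either one writes.
    a_seen = balance
    b_seen = balance

    # Both checks pass because each thread sees the stale value.
    a_ok = a_seen >= amount
    b_ok = b_seen >= amount

    # Thread A writes its result.
    if a_ok:
        balance = a_seen - amount
        paid_out += amount

    # Thread B overwrites it with a value computed from the same stale read.
    if b_ok:
        balance = b_seen - amount
        paid_out += amount

    return balance, paid_out


final_balance, total_paid_out = run_unsafe_interleaving()
```

With the defaults the account ends at 20 while 160 has been paid out, which is exactly the discrepancy described in the scenario.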
Solution: Pessimistic Locking
We must serialize access to Alice’s account using SELECT ... FOR UPDATE (See Pessimistic Locking).
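BEGIN;
-- Lock Alice's row. Other transactions that touch this row must wait.
SELECT balance FROM accounts WHERE id = 'Alice' FOR UPDATE;
-- Perform checks (pseudocode: plain SQL has no IF, so this branch runs
-- in application code or inside a stored procedure)
IF balance >= 80 THEN
INSERT INTO ledger ...;
UPDATE accounts SET balance = balance - 80 ...;
END IF;
COMMIT;
-- Lock released on COMMIT (or ROLLBACK)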
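The effect of serializing access can be illustrated outside the database. In this sketch a `threading.Lock` plays the role of the row lock taken by FOR UPDATE; the `Account` class is hypothetical and amounts are in cents.

```python
import threading


class Account:
    """Toy account where a mutex stands in for a database row lock."""

    def __init__(self, balance_cents):
        self.balance_cents = balance_cents
        self._row_lock = threading.Lock()

    def withdraw(self, amount_cents):
        # Read, check, and write happen while holding the lock, so no other
        # thread can act on a stale balance.
        with self._row_lock:
            if self.balance_cents < amount_cents:
                return False
            self.balance_cents -= amount_cents
            return True


alice = Account(10_000)  # $100
results = []


def attempt_transfer():
    results.append(alice.withdraw(8_000))  # $80


threads = [threading.Thread(target=attempt_transfer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread acquires the lock first succeeds; the second sees the updated balance of $20 and is rejected.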
8. Data Partitioning & Sharding
As the ledger grows, we need to shard.
Strategy: Shard by account_id
- Logic: All transactions for “Alice” live on Shard 1. All for “Bob” live on Shard 2.
- Problem: Transfers between Alice (Shard 1) and Bob (Shard 2) require a Distributed Transaction (2PC or Saga).
- Optimization: Since most transactions are intra-region or small, we try to keep related accounts together. But eventually, we need 2PC (See Module 09: Distributed Transactions).
- Alternative: Use a NewSQL database (CockroachDB/Spanner) that handles distributed transactions automatically.
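A rough sketch of how a router might decide whether a transfer needs a distributed transaction. The shard count, the hash-based placement, and the function names are assumptions for illustration; the text above does not prescribe a particular placement scheme.

```python
import hashlib

NUM_SHARDS = 4  # ASSUMPTION: illustrative shard count


def shard_for(account_id: str) -> int:
    """Map an account to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def plan_transfer(from_account: str, to_account: str) -> dict:
    """Decide whether a transfer fits in one local ACID transaction."""
    src = shard_for(from_account)
    dst = shard_for(to_account)
    if src == dst:
        return {"mode": "local", "shards": [src]}
    # Different shards: needs 2PC or a Saga.
    return {"mode": "distributed", "shards": sorted({src, dst})}


plan = plan_transfer("Alice", "Bob")
```

A stable hash matters here: the same account must map to the same shard across processes and restarts, otherwise its ledger entries would scatter.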
9. Reliability (Reconciliation)
Software has bugs. Networks fail. Cosmic rays flip bits. Reconciliation is the ultimate safety net (the Nightly Match Path in our diagram).
The Three-Way Match (Nightly Cron)
We compare three sources of truth:
- Internal Ledger: What we think happened.
- Payment Gateway (Stripe) Reports: What the processor thinks happened.
- Bank Statements: Where the money actually went.
Process:
- Fetch External Data: Download “Settlement Report” from Stripe/Bank (CSV/API).
- Match: Join on payment_provider_id.
- Find Discrepancies:
  - Stripe says Success, We say Failed: We owe the user service/goods. Fix state to Success.
  - Stripe says Failed, We say Success: We gave free service. Panic. Alert Finance Team to reverse.
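The matching step can be sketched as a comparison of two dictionaries keyed by payment_provider_id. This covers only the ledger-versus-processor leg; bank statements would be compared the same way as a third input. The action labels and the sample IDs are hypothetical.

```python
def reconcile(ledger: dict, psp_report: dict) -> list:
    """Compare statuses keyed by payment_provider_id.

    Both inputs map payment_provider_id -> 'SUCCESS' or 'FAILED'.
    Returns a list of (payment_provider_id, action) for every mismatch.
    """
    issues = []
    for pid in sorted(ledger.keys() | psp_report.keys()):
        ours = ledger.get(pid)
        theirs = psp_report.get(pid)
        if ours == theirs:
            continue
        if ours is None:
            issues.append((pid, "MISSING_IN_LEDGER"))
        elif theirs is None:
            issues.append((pid, "MISSING_IN_PSP_REPORT"))
        elif theirs == "SUCCESS":
            # Processor charged the card but we recorded a failure.
            issues.append((pid, "FIX_LEDGER_TO_SUCCESS"))
        else:
            # We recorded success but the processor did not collect the money.
            issues.append((pid, "ALERT_FINANCE"))
    return issues


discrepancies = reconcile(
    ledger={"pi_1": "SUCCESS", "pi_2": "FAILED", "pi_3": "SUCCESS", "pi_4": "SUCCESS"},
    psp_report={"pi_1": "SUCCESS", "pi_2": "SUCCESS", "pi_3": "FAILED", "pi_5": "SUCCESS"},
)
```

Records missing from one side are reported separately because they usually point to a different class of bug (lost webhook, dropped write) than a status mismatch.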
10. Interactive Decision Visualizer: Race Conditions
This demo simulates the “Double Spend” problem. Two threads (A and B) try to withdraw $80 from a $100 balance simultaneously.
- Unsafe Mode: Threads interleave their steps (Read-Read-Write-Write).
- Safe Mode: A Mutex Lock forces Thread B to wait until Thread A finishes.
11. System Walkthrough: The Life of a Payment
Let’s trace the exact steps when Alice pays Bob $50.
Step 1: Request Initiation
- Client (Alice) generates Idempotency-Key: uuid-v4.
- POST /payments: { "key": "abc-123", "from": "Alice", "to": "Bob", "amount": 5000 }
Step 2: Idempotency Check (Redis)
- Server checks Redis: SETNX idempotency:abc-123 "PENDING".
- If 0 (False): Key exists. Return cached response.
- If 1 (True): Proceed.
Step 3: ACID Transaction (Postgres)
We open a transaction to ensure atomicity.
BEGIN;
-- 1. Lock Alice's Account (Prevent Double Spend)
SELECT balance FROM accounts WHERE id = 'Alice' FOR UPDATE;
-- 2. Validate (pseudocode: this check runs in application code or a stored procedure)
IF balance < 5000 THEN ROLLBACK; RETURN Error;
-- 3. Insert Ledger Entries (Immutable History)
INSERT INTO ledger (tx_id, acc_id, amount) VALUES ('tx_99', 'Alice', -5000);
INSERT INTO ledger (tx_id, acc_id, amount) VALUES ('tx_99', 'Bob', +5000);
-- 4. Update Balances (Cached View)
UPDATE accounts SET balance = balance - 5000 WHERE id = 'Alice';
UPDATE accounts SET balance = balance + 5000 WHERE id = 'Bob';
COMMIT;
Step 4: Finalize
- Update Redis Key: SET idempotency:abc-123 "SUCCESS".
- Return 200 OK.
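The four steps can be strung together in one runnable sketch. Several substitutions are assumptions made so the example is self-contained: a Python dict stands in for Redis, SQLite stands in for Postgres, and because SQLite has no SELECT ... FOR UPDATE, BEGIN IMMEDIATE is used to take the write lock up front. The function names and starting balances are hypothetical.

```python
import sqlite3

idempotency_store = {}  # stand-in for Redis: key -> "PENDING" | "SUCCESS" | "FAILED"


def open_db():
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE ledger (tx_id TEXT NOT NULL, acc_id TEXT NOT NULL, amount INTEGER NOT NULL);
        INSERT INTO accounts VALUES ('Alice', 10000), ('Bob', 0);
    """)
    return conn


def pay(conn, key, tx_id, sender, receiver, amount):
    # Step 2: idempotency check (mirrors SETNX: only the first caller proceeds).
    if key in idempotency_store:
        return idempotency_store[key]
    idempotency_store[key] = "PENDING"

    # Step 3: one atomic transaction for validation, ledger rows, and balances.
    conn.execute("BEGIN IMMEDIATE")  # SQLite substitute for locking the row
    try:
        row = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (sender,)
        ).fetchone()
        if row is None or row[0] < amount:
            conn.execute("ROLLBACK")
            result = "FAILED"
        else:
            conn.execute("INSERT INTO ledger VALUES (?, ?, ?)", (tx_id, sender, -amount))
            conn.execute("INSERT INTO ledger VALUES (?, ?, ?)", (tx_id, receiver, amount))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, sender))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, receiver))
            conn.execute("COMMIT")
            result = "SUCCESS"
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise

    # Step 4: record the outcome so retries get the same answer.
    idempotency_store[key] = result
    return result
```

Calling `pay` twice with the same key returns the stored outcome on the second call and leaves balances and ledger rows untouched.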
12. Requirements Traceability Matrix
| Requirement | Architectural Solution |
|---|---|
| Consistency (ACID) | PostgreSQL with SERIALIZABLE or FOR UPDATE locking. |
| Exactly-Once | Idempotency Keys stored in Redis/DB. |
| Auditability | Double-Entry Ledger (Immutable Append-Only Log). |
| Availability | Replication (Primary-Replica) for DB. Redis Cluster for the idempotency store. |
| Reconciliation | Nightly Batch Jobs comparing Internal vs External state. |
13. Follow-Up Questions: The Interview Gauntlet
I. Concurrency & Locking
- Optimistic vs Pessimistic Locking? Payment systems use Pessimistic (FOR UPDATE) because conflicts are high-risk (money loss). Retrying optimistic failures in a financial context is risky and complex.
- Deadlocks? Always acquire locks in a consistent order (e.g., sort by account_id). If Alice pays Bob, and Bob pays Alice, sorting ensures both transactions lock Alice first and then Bob, so no cycle can form.
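The lock-ordering rule can be demonstrated with in-process mutexes standing in for row locks. In this sketch the `transfer` function always acquires the two locks in sorted account order, so opposing transfers cannot wait on each other in a cycle. Names and starting balances are hypothetical.

```python
import threading

locks = {"Alice": threading.Lock(), "Bob": threading.Lock()}
balances = {"Alice": 10_000, "Bob": 10_000}  # cents


def transfer(sender, receiver, amount):
    if sender == receiver:
        raise ValueError("sender and receiver must differ")
    # Always lock the alphabetically smaller account first, regardless of
    # which side is paying.
    first, second = sorted((sender, receiver))
    with locks[first], locks[second]:
        if balances[sender] < amount:
            return False
        balances[sender] -= amount
        balances[receiver] += amount
        return True


def repeat_transfer(sender, receiver, rounds):
    for _ in range(rounds):
        transfer(sender, receiver, 1)


def run_opposing_transfers(rounds=1000):
    """Alice pays Bob while Bob pays Alice, concurrently."""
    t1 = threading.Thread(target=repeat_transfer, args=("Alice", "Bob", rounds))
    t2 = threading.Thread(target=repeat_transfer, args=("Bob", "Alice", rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```

If each thread instead locked its own sender first, one could hold Alice while the other holds Bob, and both would wait forever.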
II. Distributed Systems
- How to handle Cross-Shard Transfers? Two-Phase Commit (2PC) or Saga Pattern. 2PC is safer for consistency but slower. Sagas require “Compensating Transactions” (Undo logic).
- What if the DB commits but Redis fails? The Idempotency Key might remain “PENDING”. The next retry will see “PENDING” and can check the DB to see if tx_id exists. If yes, return Success. This is “Repair on Read”.
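A minimal sketch of the “Repair on Read” idea. It assumes the tx_id was stored next to the PENDING marker before the database transaction started, which the text above does not state; the function name, record shape, and the NOT_COMMITTED label are likewise hypothetical.

```python
def resolve_retry(key, idempotency_store, ledger_tx_ids):
    """Decide what a retry should see for `key`.

    idempotency_store: key -> {"status": str, "tx_id": str}  (stand-in for Redis)
    ledger_tx_ids: set of tx_ids that are committed in the ledger DB
    """
    record = idempotency_store[key]
    if record["status"] != "PENDING":
        return record["status"]

    if record["tx_id"] in ledger_tx_ids:
        # The DB committed but the store was never updated: repair it now.
        record["status"] = "SUCCESS"
        return "SUCCESS"

    # Nothing committed for this key. The caller decides whether the original
    # attempt is still in flight or can be re-run safely.
    return "NOT_COMMITTED"
```

The ledger stays the source of truth throughout: the idempotency store is only corrected to match it, never the other way around.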
III. Reliability
- Reconciliation lag? Nightly reconciliation means errors can go undetected for up to 24h. For critical systems, use Real-Time Stream Reconciliation (Kafka with ksqlDB) to match events as they happen.
14. Summary: The Whiteboard Strategy
If asked to design Stripe/PayPal, draw this 4-Quadrant Layout:
1. Requirements
- Func: Move Money, History.
- Non-Func: ACID, Exactly-Once.
- Scale: 100 TPS (Low), 100% Accuracy.
2. Architecture
↓
[Payment Svc]
↓
[Ledger DB (ACID)] <--> [PSP (Stripe)]
* Double Entry: Sum(Dr) == Sum(Cr).
* Locking: Pessimistic.
3. Data & API
Ledger: (tx_id, acc, amt)
Invariant: SUM(amt) GROUP BY tx_id == 0
4. Safety Mechanisms
- Idempotency: Prevents retry duplicates.
- Row Lock: Prevents race conditions.
- Reconciliation: Catches silent failures.