Design Payment System Architecture

1. What is a Payment System?

A Payment System moves money from one entity to another. It sits between the User (Merchant) and the Banking Infrastructure (VISA/MasterCard/Banks).

[!TIP] The Golden Rule: Never Lose Money. Unlike a Chat App where losing a message is “annoying”, losing a payment record is “illegal” or “business-ending”. We prioritize Consistency over everything (ACID).

Real-World Examples

  • Stripe/PayPal: Payment Gateways.
  • Wise (TransferWise): Cross-border remittances.
  • Internal Wallets: Uber Credits, Amazon Gift Balance.

2. Requirements & Goals

Functional Requirements

  1. Money Transfer: Move funds from Account A to Account B.
  2. Payment Gateway Integration: Accept credit card payments via PSPs (Payment Service Providers).
  3. Transaction History: View immutable audit logs.
  4. Reconciliation: Ensure our numbers match the Bank’s numbers.

Non-Functional Requirements

  1. Data Integrity (ACID): Transactions must be atomic.
  2. Reliability: 99.999% availability.
  3. Security: Compliance with PCI-DSS (Encryption at rest/transit).
  4. Idempotency: Processing the same request twice must not result in double charging.

3. Capacity Estimation

  • Transactions: 1 Million / day $\approx$ 12 TPS. (Peak 100 TPS).
  • Throughput: Low. Payment systems are rarely “High Throughput” like Twitter. They are “High Value”.
  • Storage: Immutability means data grows forever. 1M rows/day.
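
The figures above can be checked with quick arithmetic. This is a rough sketch: the factor of two rows per transaction follows from the double-entry ledger in Section 5, and the 200-byte row size is an assumed number used only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

transactions_per_day = 1_000_000
average_tps = transactions_per_day / SECONDS_PER_DAY

# Double-entry bookkeeping writes at least two ledger rows per transaction.
ledger_rows_per_day = transactions_per_day * 2
ledger_rows_per_year = ledger_rows_per_day * 365

# ASSUMPTION: about 200 bytes per ledger row (illustrative, not from the text).
ASSUMED_ROW_BYTES = 200
storage_gb_per_year = ledger_rows_per_year * ASSUMED_ROW_BYTES / 1e9

print(f"average TPS: {average_tps:.1f}")                  # 11.6
print(f"ledger rows per year: {ledger_rows_per_year:,}")  # 730,000,000
print(f"storage per year: {storage_gb_per_year:.0f} GB")  # 146 (under the assumed row size)
```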

4. System APIs

Execute Payment

We rely heavily on Idempotency Keys.

POST /v1/payments
Idempotency-Key: "uuid-v4-generated-by-client"
{
  "from_user": "u_alice",
  "to_user": "u_bob",
  "amount": 5000, // Cents (Avoid Floating Point!)
  "currency": "USD"
}
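
A client might assemble this request roughly as follows. This is a minimal sketch that builds the headers and body without sending them; the helper names `to_minor_units` and `build_payment_request` are hypothetical, and the conversion assumes a currency with two decimal places, such as USD.

```python
import json
import uuid
from decimal import Decimal


def to_minor_units(amount: str) -> int:
    """Convert a decimal string such as "50.00" to integer cents without floats."""
    cents = Decimal(amount) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"{amount!r} has more than two decimal places")
    return int(cents)


def build_payment_request(from_user: str, to_user: str, amount: str, currency: str = "USD"):
    """Return (headers, body) for POST /v1/payments. Hypothetical helper."""
    headers = {
        # Generated once per logical payment and reused on every retry.
        "Idempotency-Key": str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "from_user": from_user,
        "to_user": to_user,
        "amount": to_minor_units(amount),  # integer cents, never a float
        "currency": currency,
    })
    return headers, body


headers, body = build_payment_request("u_alice", "u_bob", "50.00")
print(headers["Idempotency-Key"])
print(body)
```

The key point is that the client creates the key before the first attempt and sends the same key on every retry of that payment.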

5. Database Design & The Ledger

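NEVER rely on UPDATE users SET balance = balance - 50 as your only record of a payment. An in-place update destroys history and makes debugging impossible. Use a Double-Entry Ledger as the source of truth; any balance column is only a cached view derived from it.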

Table: ledger_entries

Every transaction creates at least two rows.

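id | transaction_id | account_id | amount | type   | created_at
---|----------------|------------|--------|--------|-----------
1  | txn_101        | Alice      | -5000  | DEBIT  | 10:00:01
2  | txn_101        | Bob        | +5000  | CREDIT | 10:00:01
3  | txn_102        | Bob        | -200   | DEBIT  | 11:00:00
4  | txn_102        | Fees       | +200   | CREDIT | 11:00:00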

The Zero-Sum Invariant

The sum of amount for any transaction_id must always be 0. Because every transaction nets to zero, the sum over every row in the table must also be 0, which gives you a cheap audit of the entire system.

-- Verification Query
SELECT transaction_id, SUM(amount)
FROM ledger_entries
GROUP BY transaction_id
HAVING SUM(amount) != 0;
-- Result should be EMPTY. If not, ALARM!
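
The invariant can be exercised end to end with an in-memory database. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL/MySQL, loads the four example rows from the table above, and then simulates a bug. The `txn_103` row and the helper names are made up for the demonstration.

```python
import sqlite3

VERIFY_SQL = """
SELECT transaction_id, SUM(amount)
FROM ledger_entries
GROUP BY transaction_id
HAVING SUM(amount) != 0
"""

INSERT_SQL = (
    "INSERT INTO ledger_entries (transaction_id, account_id, amount, type) "
    "VALUES (?, ?, ?, ?)"
)


def make_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger holding the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ledger_entries ("
        " id INTEGER PRIMARY KEY,"
        " transaction_id TEXT NOT NULL,"
        " account_id TEXT NOT NULL,"
        " amount INTEGER NOT NULL,"  # integer cents
        " type TEXT NOT NULL)"
    )
    conn.executemany(INSERT_SQL, [
        ("txn_101", "Alice", -5000, "DEBIT"),
        ("txn_101", "Bob", 5000, "CREDIT"),
        ("txn_102", "Bob", -200, "DEBIT"),
        ("txn_102", "Fees", 200, "CREDIT"),
    ])
    return conn


def unbalanced_transactions(conn: sqlite3.Connection) -> list:
    """Return (transaction_id, net_amount) for every transaction that is not zero-sum."""
    return conn.execute(VERIFY_SQL).fetchall()


conn = make_ledger()
print(unbalanced_transactions(conn))  # []

# Simulated bug: a debit written without its matching credit.
conn.execute(INSERT_SQL, ("txn_103", "Alice", -700, "DEBIT"))
print(unbalanced_transactions(conn))  # [('txn_103', -700)]
```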

Why SQL?

We need ACID Transactions (See Module 04: Database Basics).

  • PostgreSQL / MySQL (InnoDB) are standard.
  • NoSQL (Cassandra/Mongo) generally lacks multi-row ACID transactions (though some support exists now, SQL is safer for financial data).

6. High-Level Design

High-Level Architecture: End-to-End Payment Execution & Reconciliation.

[Figure: System Architecture: Payment & Ledger System (Double-Entry Ledger | Exactly-Once Processing | Nightly Reconciliation).
 Legend: Payment Flow, Ledger (Source of Truth), Reconciliation Path.
 Layers: User Layer, Orchestration Layer, State Layer, External / Audit Layer.
 Components: User / Merchant; Payment Svc (Orchestrator) with Risk Engine; Redis (Idempotency Keys); PSP Connector (Gateway API, Stripe/PayPal); Ledger DB (Double-Entry Invariant: Sum = 0); External PSP (Stripe/Adyen); Reconciliation (Nightly Batch, Settle Reports).
 Flow: POST /pay {IdemKey} -> 1. SETNX Key -> 2. Double-Entry -> 3. Execute Fund Move -> 4. Three-Way Match.]

The system follows an At-Least-Once with Idempotency pattern to ensure every payment is processed exactly once:

  1. Payment Service (Orchestrator): The entry point for all requests. It handles high-level logic, including Risk Engine checks for fraud and velocity.
  2. Idempotency Store (Redis): Before any action, the system checks this store using the Idempotency-Key to prevent duplicate processing of the same request.
  3. Ledger DB (SQL): The single source of truth. It records every fund movement using Double-Entry Bookkeeping (Debit/Credit pairs) to maintain a zero-sum invariant.
  4. PSP Connector: A specialized adapter that translates internal requests into third-party API calls (e.g., Stripe, PayPal, Adyen).
  5. Reconciliation Cron: An offline worker that performs a Three-Way Match between the internal Ledger, PSP reports, and Bank statements.

7. Deep Dive: Idempotency & Double Spending

Part A: Idempotency (The “Exactly-Once” Illusion)

What if the Client sends a request, the Server charges the card, but the Response is lost due to a network timeout? The Client will Retry.

Without Idempotency, you get a Double Charge. This is prevented by the Idempotency Store (Redis) shown in our high-level architecture.

The Idempotency Key Pattern:

  1. Client generates UUID key123.
  2. Client sends POST /pay with Idempotency-Key: key123.
  3. Server checks DB/Redis: “Have I processed key123?”
    • Yes: Return the original stored response (Success/Fail). Do NOT process again.
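    • No: Insert key123 with status PENDING. Process payment. Update to SUCCESS. The check and the insert must be a single atomic step (for example, Redis SETNX or an INSERT against a unique key); otherwise two concurrent retries can both see “No” and both charge the card.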
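
The three steps above can be sketched in a few lines. This is an in-memory illustration only: the class name `IdempotentPaymentHandler` and the `charge` callable are hypothetical, and a real service would keep the records in Redis or a database table rather than a Python dict.

```python
import threading


class IdempotentPaymentHandler:
    """In-memory sketch of the idempotency key pattern (not production code)."""

    def __init__(self, charge):
        self._charge = charge          # hypothetical callable that moves the money
        self._records = {}             # key -> {"status": ..., "response": ...}
        self._lock = threading.Lock()  # makes check-and-insert atomic

    def handle(self, key, request):
        with self._lock:
            record = self._records.get(key)
            if record is not None:
                if record["status"] == "PENDING":
                    # The original attempt is still running; do not start a second one.
                    return {"status": "PENDING"}
                # Seen before: replay the stored response, do NOT process again.
                return record["response"]
            self._records[key] = {"status": "PENDING", "response": None}

        try:
            response = self._charge(request)
            status = "SUCCESS"
        except Exception as exc:
            response = {"status": "FAILED", "error": str(exc)}
            status = "FAILED"

        with self._lock:
            self._records[key] = {"status": status, "response": response}
        return response


calls = []

def fake_charge(request):
    calls.append(request)
    return {"status": "SUCCESS", "charged": request["amount"]}

handler = IdempotentPaymentHandler(fake_charge)
first = handler.handle("key123", {"amount": 1000})
retry = handler.handle("key123", {"amount": 1000})
print(first == retry, len(calls))  # True 1
```

The retry gets the original response back and the charge function runs only once, which is the behavior the client relies on after a timeout.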

Part B: Preventing Double Spending (Race Conditions)

Scenario: Alice has $100. She initiates two transfers of $80 concurrently (Total $160).

  • Thread A reads Bal($100). Checks $100 >= $80. OK.
  • Thread B reads Bal($100). Checks $100 >= $80. OK.
  • Thread A updates Bal($20).
  • Thread B updates Bal($20).
  • Result: Alice spent $160 but her balance dropped by only $80. The system lost $80.

Solution: Pessimistic Locking

We must serialize access to Alice’s account using SELECT ... FOR UPDATE (See Pessimistic Locking).

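BEGIN;
-- Lock the row. Other transactions that touch Alice's row must wait.
SELECT balance FROM accounts WHERE id = 'Alice' FOR UPDATE;
-- Perform checks (pseudocode: plain SQL has no IF statement, so this
-- runs in the application or inside a stored procedure)
IF balance >= 80 THEN
    INSERT INTO ledger ...;
    UPDATE accounts SET balance = balance - 80 ...;
END IF;
COMMIT;
-- Lock released at COMMIT (or ROLLBACK)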
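
The same lock-then-check-then-write sequence can be run locally. The sketch below makes two assumptions that differ from the SQL above: it uses Python's sqlite3, which has no SELECT ... FOR UPDATE, so `BEGIN IMMEDIATE` (which takes the database write lock before the balance is read) serves as a coarse substitute for the row lock; and the function and table layout are illustrative. Amounts are whole dollars to match the scenario.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions; we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE ledger_entries (transaction_id TEXT, account_id TEXT, amount INTEGER)"
    )
    conn.execute("INSERT INTO accounts VALUES ('Alice', 100)")
    conn.execute("INSERT INTO accounts VALUES ('Bob', 0)")
    return conn


def transfer(conn, tx_id, src, dst, amount) -> bool:
    """Move `amount` from src to dst; return False if src has insufficient funds."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock BEFORE reading the balance
    try:
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < amount:
            conn.execute("ROLLBACK")
            return False
        conn.executemany(
            "INSERT INTO ledger_entries VALUES (?, ?, ?)",
            [(tx_id, src, -amount), (tx_id, dst, amount)],
        )
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = open_db()
print(transfer(conn, "tx_1", "Alice", "Bob", 80))  # True
print(transfer(conn, "tx_2", "Alice", "Bob", 80))  # False: only $20 left
```

Because the second transfer cannot read the balance until the first has committed, it sees $20 and is rejected instead of overwriting the first update.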

8. Data Partitioning & Sharding

As the ledger grows, we need to shard.

Strategy: Shard by account_id

  • Logic: All transactions for “Alice” live on Shard 1. All for “Bob” live on Shard 2.
  • Problem: Transfers between Alice (Shard 1) and Bob (Shard 2) require a Distributed Transaction (2PC or Saga).
  • Optimization: Since most transactions are intra-region or small, we try to keep related accounts on the same shard. Cross-shard transfers cannot be eliminated entirely, though, so eventually we need 2PC or a Saga (See Module 09: Distributed Transactions).
  • Alternative: Use a NewSQL database (CockroachDB/Spanner) that handles distributed transactions automatically.
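
A router for this strategy might look like the sketch below. The text only says to shard by account_id; hashing the id and taking it modulo a fixed shard count is an assumed scheme chosen for brevity, and `NUM_SHARDS`, `shard_for`, and `needs_distributed_transaction` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 4  # ASSUMPTION: fixed shard count, for illustration only


def shard_for(account_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an account to a shard using a stable hash of its id."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and would route differently after a restart.
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def needs_distributed_transaction(from_account: str, to_account: str,
                                  num_shards: int = NUM_SHARDS) -> bool:
    """True when the two accounts live on different shards (2PC or Saga needed)."""
    return shard_for(from_account, num_shards) != shard_for(to_account, num_shards)


for name in ("Alice", "Bob"):
    print(name, "-> shard", shard_for(name))
print("cross-shard:", needs_distributed_transaction("Alice", "Bob"))
```

One trade-off of plain modulo hashing is that changing the shard count remaps most accounts, so a real deployment would more likely use a lookup table or consistent hashing.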

9. Reliability (Reconciliation)

Software has bugs. Networks fail. Cosmic rays flip bits. Reconciliation is the ultimate safety net: it is the Reconciliation Path (nightly batch) in our diagram.

The Three-Way Match (Nightly Cron)

We compare three sources of truth:

  1. Internal Ledger: What we think happened.
  2. Payment Gateway (Stripe) Reports: What the processor thinks happened.
  3. Bank Statements: Where the money actually went.

Process:

  1. Fetch External Data: Download “Settlement Report” from Stripe/Bank (CSV/API).
  2. Match: Join on payment_provider_id.
  3. Find Discrepancies:
    • Stripe says Success, We say Failed: We owe the user service/goods. Fix state to Success.
    • Stripe says Failed, We say Success: We gave free service. Panic. Alert Finance Team to reverse.
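
The match-and-classify step can be sketched as a comparison of two status maps keyed by payment_provider_id. This is a simplified illustration that covers only the ledger-versus-PSP leg (the bank statement leg would be compared the same way); the ids `pi_1` to `pi_4`, the action labels, and the INVESTIGATE fallback for records missing on one side are assumptions not taken from the text.

```python
def reconcile(internal: dict, psp: dict) -> list:
    """Compare statuses keyed by payment_provider_id.

    Returns a list of (payment_provider_id, our_status, psp_status, action).
    """
    discrepancies = []
    for pid in sorted(internal.keys() | psp.keys()):
        ours = internal.get(pid)   # None if we have no record
        theirs = psp.get(pid)      # None if the PSP report has no record
        if ours == theirs:
            continue
        if theirs == "SUCCESS" and ours == "FAILED":
            action = "FIX_STATE_TO_SUCCESS"  # we owe the user service/goods
        elif theirs == "FAILED" and ours == "SUCCESS":
            action = "ALERT_FINANCE"         # we gave free service
        else:
            action = "INVESTIGATE"           # ASSUMPTION: e.g. missing on one side
        discrepancies.append((pid, ours, theirs, action))
    return discrepancies


internal_ledger = {"pi_1": "SUCCESS", "pi_2": "FAILED", "pi_3": "SUCCESS"}
psp_report = {"pi_1": "SUCCESS", "pi_2": "SUCCESS", "pi_3": "FAILED", "pi_4": "SUCCESS"}

for row in reconcile(internal_ledger, psp_report):
    print(row)
# ('pi_2', 'FAILED', 'SUCCESS', 'FIX_STATE_TO_SUCCESS')
# ('pi_3', 'SUCCESS', 'FAILED', 'ALERT_FINANCE')
# ('pi_4', None, 'SUCCESS', 'INVESTIGATE')
```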

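10. Race Conditions: Step-by-Step

The “Double Spend” problem from Section 7 is easiest to see one step at a time. Two threads (A and B) each try to withdraw $80 from a $100 balance simultaneously, and each runs three steps: READ Balance, CHECK (Bal >= 80), WRITE (Bal - 80).

  • Unsafe Mode: Threads interleave their steps (Read-Read-Write-Write). Both checks pass against the stale $100.
  • Safe Mode: A Mutex Lock forces Thread B to wait until Thread A finishes, so B reads $20 and its check fails.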
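
The two modes can be reproduced deterministically by replaying an explicit schedule of steps instead of relying on real thread timing. In this sketch the function `run_schedule` and the two schedules are illustrative; each thread keeps a private copy of the balance it read, just as a local variable would.

```python
def run_schedule(schedule, balance=100, amount=80):
    """Replay (thread, step) pairs and return (final_balance, approved_threads)."""
    local = {}        # each thread's private copy of the balance it read
    approved = set()  # threads whose CHECK passed
    for thread, step in schedule:
        if step == "READ":
            local[thread] = balance
        elif step == "CHECK":
            if local[thread] >= amount:
                approved.add(thread)
        elif step == "WRITE" and thread in approved:
            # Writes the value computed from the (possibly stale) local copy.
            balance = local[thread] - amount
    return balance, sorted(approved)


# Unsafe Mode: Read-Read-Write-Write interleaving.
UNSAFE = [("A", "READ"), ("B", "READ"),
          ("A", "CHECK"), ("B", "CHECK"),
          ("A", "WRITE"), ("B", "WRITE")]

# Safe Mode: a lock forces B to wait until A has finished all three steps.
SAFE = [("A", "READ"), ("A", "CHECK"), ("A", "WRITE"),
        ("B", "READ"), ("B", "CHECK"), ("B", "WRITE")]

print(run_schedule(UNSAFE))  # (20, ['A', 'B'])  both approved: $160 paid out, $80 deducted
print(run_schedule(SAFE))    # (20, ['A'])       B sees $20 and is rejected
```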

11. System Walkthrough: The Life of a Payment

Let’s trace the exact steps when Alice pays Bob $50.

Step 1: Request Initiation

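  • Client (Alice) generates an Idempotency-Key (a UUID v4, abbreviated here as abc-123).
  • POST /v1/payments with header Idempotency-Key: abc-123 (amount is in cents, so 5000 = $50):
    { "from_user": "Alice", "to_user": "Bob", "amount": 5000, "currency": "USD" }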

Step 2: Idempotency Check (Redis)

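  • Server checks Redis: SETNX idempotency:abc-123 "PENDING".
  • If 0 (False): Key already exists. Return the cached response, or, if the value is still "PENDING", report that the original request is still in progress.
  • If 1 (True): The key was newly set. Proceed.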

Step 3: ACID Transaction (Postgres)

We open a transaction to ensure atomicity.

BEGIN;

-- 1. Lock Alice's Account (Prevent Double Spend)
SELECT balance FROM accounts WHERE id = 'Alice' FOR UPDATE;

-- 2. Validate (application-side check on the balance just read;
--    plain SQL has no IF statement)
--    If balance < 5000: ROLLBACK and return an error to the caller.

-- 3. Insert Ledger Entries (Immutable History)
INSERT INTO ledger_entries (transaction_id, account_id, amount, type)
VALUES ('tx_99', 'Alice', -5000, 'DEBIT');
INSERT INTO ledger_entries (transaction_id, account_id, amount, type)
VALUES ('tx_99', 'Bob', +5000, 'CREDIT');

-- 4. Update Balances (Cached View)
UPDATE accounts SET balance = balance - 5000 WHERE id = 'Alice';
UPDATE accounts SET balance = balance + 5000 WHERE id = 'Bob';

COMMIT;

Step 4: Finalize

  • Update Redis Key: SET idempotency:abc-123 "SUCCESS".
  • Return 200 OK.

12. Requirements Traceability Matrix

Requirement        | Architectural Solution
-------------------|-----------------------
Consistency (ACID) | PostgreSQL with SERIALIZABLE or FOR UPDATE locking.
Exactly-Once       | Idempotency Keys stored in Redis/DB.
Auditability       | Double-Entry Ledger (Immutable Append-Only Log).
Availability       | Replication (Primary-Replica) for DB. Redis Cluster for the idempotency store.
Reconciliation     | Nightly Batch Jobs comparing Internal vs External state.

13. Follow-Up Questions: The Interview Gauntlet

I. Concurrency & Locking

  • Optimistic vs Pessimistic Locking? Payment systems use Pessimistic (FOR UPDATE) because conflicts are high-risk (money loss). Retrying optimistic failures in a financial context is risky and complex.
  • Deadlocks? Always acquire locks in a consistent order (e.g., sort by account_id). If Alice pays Bob while Bob pays Alice, sorting makes both transactions lock Alice first and then Bob, so a transaction holding Alice's lock never waits on one that holds only Bob's, and no cycle can form.
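
The lock-ordering rule can be illustrated with in-process locks standing in for database row locks. This is a sketch under that assumption: `transfer_with_ordered_locks` and `demo` are hypothetical names, and the two worker threads deliberately transfer in opposite directions, which is the pattern that deadlocks when each side locks its own source account first.

```python
import threading


def transfer_with_ordered_locks(balances, locks, src, dst, amount) -> bool:
    """Transfer between two accounts, always locking in sorted account_id order."""
    if src == dst:
        raise ValueError("source and destination must differ")
    first, second = sorted((src, dst))  # same order no matter who pays whom
    with locks[first], locks[second]:
        if balances[src] < amount:
            return False
        balances[src] -= amount
        balances[dst] += amount
        return True


def demo(rounds: int = 1000) -> dict:
    balances = {"Alice": 1000, "Bob": 1000}
    locks = {name: threading.Lock() for name in balances}

    def worker(src, dst):
        for _ in range(rounds):
            transfer_with_ordered_locks(balances, locks, src, dst, 1)

    threads = [
        threading.Thread(target=worker, args=("Alice", "Bob")),
        threading.Thread(target=worker, args=("Bob", "Alice")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balances


print(demo())  # {'Alice': 1000, 'Bob': 1000}
```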

II. Distributed Systems

  • How to handle Cross-Shard Transfers? Two-Phase Commit (2PC) or Saga Pattern. 2PC is safer for consistency but slower. Sagas require “Compensating Transactions” (Undo logic).
  • What if the DB commits but Redis fails? The Idempotency Key might remain “PENDING”. The next retry will see “PENDING” and can check the DB to see if tx_id exists. If yes, return Success. This is “Repair on Read”.
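
“Repair on Read” can be sketched as a small lookup function. Everything here is illustrative: the idempotency store is a dict, the ledger is a set of committed transaction ids, and the rule that the transaction id is derived from the idempotency key (`tx_<key>`) is an assumption made so the retry can find the matching ledger entry.

```python
def repair_on_read(key: str, idempotency_store: dict, ledger_tx_ids: set):
    """Return the status for `key`, repairing a stale PENDING marker if possible."""
    status = idempotency_store.get(key)
    if status != "PENDING":
        return status  # None (unknown key), "SUCCESS", or "FAILED"

    tx_id = f"tx_{key}"  # ASSUMPTION: tx_id is derived from the idempotency key
    if tx_id in ledger_tx_ids:
        # The DB committed but the final Redis update was lost: repair it now.
        idempotency_store[key] = "SUCCESS"
        return "SUCCESS"

    # No ledger entry yet: the original attempt may still be in flight.
    return "PENDING"


store = {"abc-123": "PENDING", "def-456": "PENDING"}
ledger = {"tx_abc-123"}

print(repair_on_read("abc-123", store, ledger))  # SUCCESS (and the store is repaired)
print(repair_on_read("def-456", store, ledger))  # PENDING
```

A key that stays PENDING with no ledger entry still needs a timeout policy, which this sketch leaves out.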

III. Reliability

  • Reconciliation lag? Nightly reconciliation means we might not catch errors for 24h. For critical systems, use Real-Time Stream Reconciliation (e.g., Kafka with ksqlDB) to match events as they happen.

14. Summary: The Whiteboard Strategy

If asked to design Stripe/PayPal, draw this 4-Quadrant Layout:

1. Requirements

  • Func: Move Money, History.
  • Non-Func: ACID, Exactly-Once.
  • Scale: 100 TPS (Low), 100% Accuracy.

2. Architecture

[Client] -> [API] -> [Idempotency Redis]

[Payment Svc]

[Ledger DB (ACID)] <--> [PSP (Stripe)]

  • Double Entry: Sum(Dr) == Sum(Cr).
  • Locking: Pessimistic.

3. Data & API

POST /pay {key, from, to, amt}
Ledger: (tx_id, acc, amt)
Invariant: SUM(amt) GROUP BY tx_id == 0

4. Safety Mechanisms

  • Idempotency: Prevents retry duplicates.
  • Row Lock: Prevents race conditions.
  • Reconciliation: Catches silent failures.
