ACID Transactions: The Safety Net
In 2012, Knight Capital Group lost $440 million in 45 minutes. A botched software deployment reactivated an old trading algorithm, which flooded the market with unintended orders. The root cause was a deployment error rather than a database bug, but the failure shows what Atomicity is about: individual trades committed, and there was no mechanism to treat the entire malformed session as a single unit that could be rolled back. By the time engineers hit the “stop” button, the damage was done. This is why every senior engineer should be able to explain ACID from memory. It’s not academic theory; it’s how financial systems, medical records, and e-commerce checkouts survive failures.
[!IMPORTANT] In this lesson, you will master:
- The Physics of Durability: Why fsync() and Disk Controller caches are the last line of defense.
- Atomicity Internals: How Undo Logs (and their alternative, Shadow Paging) prevent “Half-Baked” data states.
- Distributed Consistency: Why the “Two-Phase Commit” (2PC) is an Availability killer and how Sagas solve it.
1. The Classic Problem: The Bank Transfer
Alice has $100. Bob has $50. Alice sends $20 to Bob. The database must do two things:
- Subtract $20 from Alice.
- Add $20 to Bob.
Disaster Scenario: The power goes out after the first step (the debit from Alice).
- Alice has $80.
- Bob still has $50.
- $20 has vanished into thin air.
ACID prevents this.
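A minimal sketch of the bank transfer using Python's built-in sqlite3 module. The table layout and the make_bank/transfer/balances helpers are hypothetical, and a raised exception stands in for the power cut. On a real crash, SQLite's recovery discards the half-finished transaction when the database is next opened, with the same end result.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    # Hypothetical schema: one row per account.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             crash_midway: bool = False) -> None:
    # `with conn:` commits if the block completes and rolls back if it raises,
    # so both UPDATEs land together or not at all.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if crash_midway:
            raise RuntimeError("simulated power failure after the debit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_bank()
try:
    transfer(conn, "alice", "bob", 20, crash_midway=True)
except RuntimeError:
    pass
print(balances(conn))  # {'alice': 100, 'bob': 50}  (the debit was rolled back)
transfer(conn, "alice", "bob", 20)
print(balances(conn))  # {'alice': 80, 'bob': 70}
```

The $20 never vanishes: either both balances change or neither does.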
2. The Four Pillars
A - Atomicity (“All or Nothing”)
This guarantees that the transaction is treated as a single “atom”. It cannot be split.
- If all steps succeed → Commit.
- If even one step fails → Rollback (Undo everything).
- Mechanism: Undo Logs (used, for example, by MySQL's InnoDB engine). Before changing data in place, the DB writes the old value to an Undo Log, so a failed transaction can be reversed.
- Deep Dive, Shadow Paging: Some databases (like LMDB) use Shadow Paging, a copy-on-write scheme. Instead of overwriting data and logging the old value, they copy the page, modify the copy, and then atomically switch a pointer to the new version. No undo log is needed, but the copies land wherever free space exists, which scatters related pages and causes high disk fragmentation.
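A toy sketch of the shadow-paging idea in Python, assuming a single writer and an in-memory "disk" of fixed-size pages. The Txn and ShadowPagedStore classes are hypothetical and far simpler than LMDB's copy-on-write B+tree; they only show why no undo log is needed: committed pages are never modified, and commit is a single pointer swap.

```python
from typing import Dict, List, Set


class Txn:
    def __init__(self, page_table: Dict[int, List[int]]):
        self.pages = dict(page_table)    # shallow copy: shares every committed page
        self.shadowed: Set[int] = set()  # pages this txn has already copied


class ShadowPagedStore:
    """Hypothetical copy-on-write store; assumes one writer at a time."""

    def __init__(self, num_pages: int, page_size: int = 4):
        # `root` is the page table that readers (and crash recovery) trust.
        self.root: Dict[int, List[int]] = {i: [0] * page_size for i in range(num_pages)}

    def begin(self) -> Txn:
        return Txn(self.root)

    def write(self, txn: Txn, page_id: int, slot: int, value: int) -> None:
        if page_id not in txn.shadowed:
            # First write to this page: copy it instead of overwriting it.
            txn.pages[page_id] = list(txn.pages[page_id])
            txn.shadowed.add(page_id)
        txn.pages[page_id][slot] = value

    def commit(self, txn: Txn) -> None:
        # One reference swap publishes every modified page at once.
        self.root = txn.pages

    def abort(self, txn: Txn) -> None:
        # Nothing to undo: the committed pages were never touched.
        txn.pages.clear()


store = ShadowPagedStore(num_pages=2)
t = store.begin()
store.write(t, 1, 0, 7)
print(store.root[1])  # [0, 0, 0, 0]  (invisible until commit)
store.commit(t)
print(store.root[1])  # [7, 0, 0, 0]
```

Unmodified pages are shared between the old and new page tables; only the copied pages occupy new space.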
Staff Engineer Tip: Distinguish between Undo and Redo logs.
- Undo Log: Used for Atomicity. It stores “How to go back”.
- Redo Log (WAL): Used for Durability. It stores “How to go forward” if we crash after a commit but before the modified data pages reached the disk.
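To see why an engine needs both, here is a heavily simplified crash-recovery sketch in Python. It assumes every log record carries both the old value (undo information) and the new value (redo information), that strict locking stops two in-flight transactions from touching the same key, and it ignores the checkpoints and log sequence numbers that real algorithms such as ARIES rely on. Real engines may keep the two kinds of information in separate structures (InnoDB, for instance, has distinct undo and redo logs); one list is used here for brevity, and Update, Commit, and recover are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Update:
    txid: int
    key: str
    old: Optional[int]  # undo information: "how to go back" (None = key was absent)
    new: int            # redo information: "how to go forward"


@dataclass
class Commit:
    txid: int


def recover(disk: Dict[str, int], log: List[Union[Update, Commit]]) -> Dict[str, int]:
    state = dict(disk)
    committed = {r.txid for r in log if isinstance(r, Commit)}
    # Redo pass: replay every logged change in order, so committed work that
    # never reached the data pages is restored.
    for r in log:
        if isinstance(r, Update):
            state[r.key] = r.new
    # Undo pass: walk backwards and revert changes from transactions that
    # never committed, including any that were flushed to disk early.
    for r in reversed(log):
        if isinstance(r, Update) and r.txid not in committed:
            if r.old is None:
                state.pop(r.key, None)
            else:
                state[r.key] = r.old
    return state


# Crash scenario: Alice's page was flushed after uncommitted T2 debited it,
# while Bob's credit from committed T1 never reached the disk.
disk = {"alice": 70, "bob": 50}
log = [Update(1, "alice", 100, 80), Update(1, "bob", 50, 70), Commit(1),
       Update(2, "alice", 80, 70)]
print(recover(disk, log))  # {'alice': 80, 'bob': 70}
```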
C - Consistency (“Rules are Rules”)
The database must move from one valid state to another valid state.
- It must obey all defined rules: Foreign Keys, Unique Constraints, Check Constraints.
- Example: If you try to insert a row with a duplicate unique ID, the transaction aborts.
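A small sketch with Python's sqlite3 module showing the database, not the application, rejecting illegal states. The schema and the make_db/attempt helpers are hypothetical; the point is that a statement violating a primary-key (unique) or CHECK constraint raises an error and its transaction rolls back, leaving the previous valid state in place.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "  id INTEGER PRIMARY KEY,"                        # unique ID
        "  name TEXT NOT NULL,"
        "  balance INTEGER NOT NULL CHECK (balance >= 0)"  # no overdrafts
        ")"
    )
    with conn:
        conn.execute("INSERT INTO accounts VALUES (1, 'alice', 100)")
    return conn


def attempt(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    """Run one statement in its own transaction; report whether it survived."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError as err:
        print("aborted:", err)
        return False


conn = make_db()
attempt(conn, "INSERT INTO accounts VALUES (?, ?, ?)", (1, "mallory", 0))          # duplicate ID
attempt(conn, "UPDATE accounts SET balance = balance - ? WHERE id = ?", (150, 1))  # overdraft
print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone())  # (100,)
```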
I - Isolation (“Private Workspace”)
Transactions running at the same time shouldn’t mess with each other.
- The Problem: If Alice and Bob both try to edit the same row simultaneously, one must wait (or fail).
- The Solution (MVCC): Modern databases (Postgres, MySQL) use Multi-Version Concurrency Control.
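- Concept: When you update a row, the DB keeps the old version and creates a new one.
- Reader: Reads Version 1.
- Writer: Creates Version 2 without waiting for the reader.
- Result: Readers don’t block Writers, and Writers don’t block Readers!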
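A toy MVCC store in Python, assuming a single writer whose writes commit immediately and a logical clock for timestamps. MVCCStore is hypothetical: real engines also detect write-write conflicts and garbage-collect old versions (PostgreSQL's VACUUM, for example), and InnoDB rebuilds older versions from its undo log rather than storing them side by side.

```python
from typing import Dict, List, Optional, Tuple


class MVCCStore:
    """Hypothetical multi-version store: each key keeps (commit_ts, value) pairs."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, str]]] = {}
        self.clock = 0  # logical commit timestamp

    def snapshot(self) -> int:
        # A reader remembers "now" and ignores anything committed later.
        return self.clock

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def write(self, key: str, value: str) -> int:
        # Append a new version instead of overwriting: readers holding an
        # older snapshot keep seeing their version, so nobody waits.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock


db = MVCCStore()
db.write("row1", "v1")
reader_ts = db.snapshot()              # Reader starts: sees Version 1
db.write("row1", "v2")                 # Writer creates Version 2 immediately
print(db.read("row1", reader_ts))      # v1
print(db.read("row1", db.snapshot()))  # v2
```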
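Staff Engineer Tip: The Phantom Read. Isolation isn’t binary. The SQL standard allows Phantoms under Repeatable Read: a SELECT COUNT(*) can return different values within the same transaction because another transaction inserted new matching rows. (PostgreSQL’s and InnoDB’s snapshot-based Repeatable Read hide phantoms from plain, non-locking reads, but both still permit other anomalies.) Only Serializable isolation guarantees that the outcome is equivalent to running the transactions one at a time in some order.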
D - Durability (“Written in Stone”)
- Once a transaction says “Success” (Commit), the data is permanent.
- Even if the server crashes or loses power 1 millisecond later, the data is safe.
- Mechanism: Write-Ahead Logging (WAL). The DB writes the change to a sequential log file on the disk, and flushes it, before updating the actual table. The commit is acknowledged only once that log record is on disk.
[!NOTE] Hardware-First Intuition: Computers are full of liars. When the OS says “I wrote the file”, it often just means it put the data in the RAM Buffer Cache. If power fails, that data is gone. To achieve true Durability, the database must issue an fsync() system call. This forces the OS to write its cached pages to the device and, on typical configurations, asks the Disk Controller to flush its volatile write cache to the actual magnetic or flash storage. This is often the slowest part of a transaction because it waits for physical hardware.
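A minimal write-ahead log in Python that follows the order described above: append the record, flush Python's buffer, call os.fsync(), and only then apply the change and report success. TinyWAL and its JSON-lines format are hypothetical; a production WAL would also fsync the containing directory when the file is first created, checksum its records, and batch many commits into one fsync (group commit).

```python
import json
import os
import tempfile


class TinyWAL:
    """Hypothetical minimal WAL: a commit is durable only after fsync returns."""

    def __init__(self, path: str):
        self.path = path
        self.table: dict = {}  # the "actual table", kept in memory here

    def commit(self, changes: dict) -> None:
        record = json.dumps({"changes": changes}) + "\n"
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(record)
            log.flush()             # Python's buffer -> OS page cache
            os.fsync(log.fileno())  # ask the OS to push it to the storage device
        # In this sketch the change is applied (and "Success" reported)
        # only after the log record is durable.
        self.table.update(changes)

    @classmethod
    def recover(cls, path: str) -> "TinyWAL":
        # After a crash, rebuild the table by replaying the log in order.
        wal = cls(path)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:
                    if not line.endswith("\n"):
                        break  # torn tail: that commit was never acknowledged
                    wal.table.update(json.loads(line)["changes"])
        return wal


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = TinyWAL(log_path)
db.commit({"alice": 80, "bob": 70})
print(TinyWAL.recover(log_path).table)  # {'alice': 80, 'bob': 70}
```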
3. Interactive Demo: The Atomic Swap
Be the Database Engine.
- Start Transaction: Creates a safety checkpoint.
- Update Data: The Undo Log records the backup (the old value) before the data is overwritten.
- Crash/Rollback: The engine applies the Undo Log, newest entry first, to restore the data.
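The three steps can be replayed in a few lines of Python. UndoLogEngine is a hypothetical toy that updates values in place and, before every write, appends the old value to an undo log; rollback applies those backups newest-first, so a key written twice ends up back at its original value. The log is kept in memory here for brevity, whereas a real engine persists undo information so it survives the crash.

```python
from typing import Dict, List, Optional, Tuple


class UndoLogEngine:
    """Hypothetical toy engine: in-place updates protected by an undo log."""

    def __init__(self, data: Dict[str, int]):
        self.data = dict(data)
        self.undo_log: Optional[List[Tuple[str, Optional[int]]]] = None

    def begin(self) -> None:
        # Start Transaction: an empty undo log is the safety checkpoint.
        self.undo_log = []

    def update(self, key: str, value: int) -> None:
        if self.undo_log is None:
            raise RuntimeError("no active transaction")
        # Update Data: record the backup BEFORE overwriting in place.
        self.undo_log.append((key, self.data.get(key)))
        self.data[key] = value

    def commit(self) -> None:
        # Keep the new values; the backups are no longer needed.
        self.undo_log = None

    def rollback(self) -> None:
        # Crash/Rollback: restore backups newest-first.
        for key, old in reversed(self.undo_log or []):
            if old is None:
                self.data.pop(key, None)  # key did not exist before the txn
            else:
                self.data[key] = old
        self.undo_log = None


engine = UndoLogEngine({"alice": 100, "bob": 50})
engine.begin()
engine.update("alice", 80)  # debit Alice; undo log now holds the old balance
print(engine.undo_log)      # [('alice', 100)]
engine.rollback()           # crash before Bob is credited
print(engine.data)          # {'alice': 100, 'bob': 50}
```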
4. Deep Dive: Distributed ACID (The Nightmare)
ACID is comparatively easy on a single machine. It is a nightmare on distributed systems (Microservices). If Service A (Payment) and Service B (Shipping) must each update their own database as part of one business operation, how do we guarantee Atomicity across both?
Option A: Two-Phase Commit (2PC)
The “Strict Boss” approach. Imagine a wedding ceremony.
- Prepare Phase: The Priest (Coordinator) asks “Do you take this…?” to both parties.
- Service A locks its database rows. “I’m ready.”
- Service B locks its database rows. “I’m ready.”
- Commit Phase: If both say “Yes”, the Priest says “I now pronounce you…” and tells both parties to commit. If either says “No”, both are told to roll back.
- The Fatal Flaw: It is a Blocking Protocol.
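- If the Coordinator crashes after the Prepare Phase (both have voted “ready” but no decision has arrived), Service A and Service B are stuck holding their locks until the Coordinator recovers, because neither can safely commit or abort on its own. No one else can touch that data.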
- This is why 2PC is often called the “Availability Killer”.
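A single-process Python sketch of the protocol, with hypothetical State, Participant, and two_phase_commit names. It leaves out what a real implementation needs (durable logging of votes and of the decision, timeouts, recovery), but it shows the blocking problem: a participant that has voted “ready” sits in the PREPARED state, holding locks, and has no way to leave that state until it hears the decision.

```python
from enum import Enum
from typing import List


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"    # voted "ready": holds locks, cannot decide alone
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """Hypothetical service taking part in 2PC (e.g. Payment or Shipping)."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = State.IDLE
        self.locks_held = False

    def prepare(self) -> bool:
        # Phase 1: lock the rows (a real system would also durably log the
        # pending change here), then vote.
        if not self.can_commit:
            self.state = State.ABORTED
            return False
        self.locks_held = True
        self.state = State.PREPARED
        return True

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the Coordinator's decision and release the locks.
        self.state = State.COMMITTED if commit else State.ABORTED
        self.locks_held = False


def two_phase_commit(participants: List[Participant], crash_after_prepare: bool = False) -> str:
    votes = [p.prepare() for p in participants]  # Prepare Phase
    if crash_after_prepare:
        # Coordinator dies before announcing a decision: every PREPARED
        # participant keeps its locks and simply waits.
        return "coordinator crashed"
    decision = all(votes)                         # Commit Phase
    for p in participants:
        if p.state is State.PREPARED:
            p.finish(decision)
    return "committed" if decision else "aborted"


payment, shipping = Participant("payment"), Participant("shipping")
print(two_phase_commit([payment, shipping], crash_after_prepare=True))  # coordinator crashed
print(payment.state, payment.locks_held)  # State.PREPARED True
```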
Option B: Sagas (The Real World)
The “Undo Button” approach. Instead of locking everything, we execute a sequence of local transactions.
- Step 1: Charge Card (Service A). Success.
- Step 2: Ship Item (Service B). Fail (Out of stock).
- Compensating Action: Trigger a “Refund” transaction on Service A to undo Step 1.
- Pros: High performance. No global locks.
- Cons: Temporary inconsistency (User sees “Charged” then “Refunded”).
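An orchestrated Saga sketch in Python. run_saga and the charge/refund/ship/cancel_shipment functions are hypothetical stand-ins for calls to separate services. A real orchestrator would persist its progress so it can resume after a crash, and would retry compensations, which therefore need to be idempotent.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run local transactions in order; on failure, compensate finished ones in reverse."""
    history: List[str] = []
    finished: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as err:
            history.append(f"{name} failed: {err}")
            for done_name, _, compensate in reversed(finished):
                compensate()
                history.append(f"compensated {done_name}")
            return history
        history.append(f"{name} ok")
        finished.append(step)
    return history


# Hypothetical stand-ins for Service A (Payment) and Service B (Shipping).
ledger = {"charged": 0}

def charge() -> None:
    ledger["charged"] += 20

def refund() -> None:
    ledger["charged"] -= 20

def ship() -> None:
    raise RuntimeError("out of stock")

def cancel_shipment() -> None:
    pass


print(run_saga([("charge card", charge, refund), ("ship item", ship, cancel_shipment)]))
# ['charge card ok', 'ship item failed: out of stock', 'compensated charge card']
print(ledger)  # {'charged': 0}
```

Between the charge and the refund, the ledger briefly shows $20 charged: that is the temporary inconsistency listed under Cons.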
Option C: TCC (Try-Confirm-Cancel)
The “Reservation” approach.
- Try: Service A and B “reserve” resources (e.g., deduct from available_balance and add to reserved_balance).
- Confirm: If both succeed, move the amounts from reserved to final.
- Cancel: If one fails, release the reserved amounts.
- Use Case: Better for high-value transactions where “Refunds” are legally or financially complex.
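A TCC sketch in Python using the available_balance and reserved_balance fields from the description, plus a hypothetical final_balance field for the “final” amount. The Resource class and tcc function are illustrative only; a real TCC participant also needs reservation timeouts, so abandoned Try steps are eventually cancelled, and idempotent Confirm/Cancel handlers, since the coordinator may retry them.

```python
from typing import List, Tuple


class Resource:
    """Hypothetical TCC participant; a "balance" can be money or stock units."""

    def __init__(self, available: int):
        self.available_balance = available
        self.reserved_balance = 0
        self.final_balance = 0  # amount confirmed as really consumed

    def try_reserve(self, amount: int) -> bool:
        # Try: hold the amount aside without making anything final.
        if amount > self.available_balance:
            return False
        self.available_balance -= amount
        self.reserved_balance += amount
        return True

    def confirm(self, amount: int) -> None:
        # Confirm: the reservation becomes final.
        self.reserved_balance -= amount
        self.final_balance += amount

    def cancel(self, amount: int) -> None:
        # Cancel: release the reservation back to available.
        self.reserved_balance -= amount
        self.available_balance += amount


def tcc(reservations: List[Tuple[Resource, int]]) -> bool:
    reserved: List[Tuple[Resource, int]] = []
    for resource, amount in reservations:       # Try phase
        if not resource.try_reserve(amount):
            for held, held_amount in reserved:  # Cancel phase
                held.cancel(held_amount)
            return False
        reserved.append((resource, amount))
    for resource, amount in reserved:           # Confirm phase
        resource.confirm(amount)
    return True


wallet = Resource(available=100)  # Service A: customer funds
stock = Resource(available=0)     # Service B: units in the warehouse
print(tcc([(wallet, 20), (stock, 1)]))                    # False
print(wallet.available_balance, wallet.reserved_balance)  # 100 0
stock.available_balance = 5
print(tcc([(wallet, 20), (stock, 1)]))                    # True
print(wallet.available_balance, wallet.final_balance)     # 80 20
```

Unlike a Saga, no money is ever charged and then refunded: a failed Try only releases a hold.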
Choreography vs Orchestration
- Choreography: Service A emits an event “OrderPlaced”. Service B listens to it. Simple but hard to track.
- Orchestration: A central “Order Service” calls A, then B. Easier to manage.
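A tiny in-process sketch of Choreography in Python. EventBus is a hypothetical stand-in for a real message broker, and the handler functions play the role of independent services. No single function describes the whole flow; the sequence only becomes visible in the recorded trace, which is why choreographed flows are harder to track. An Orchestration version would instead list the steps in one central function.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Hypothetical in-process stand-in for a message broker."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.trace: List[str] = []  # the only place the full flow is visible

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.trace.append(event)
        for handler in self.handlers[event]:
            handler(payload)


bus = EventBus()

# Each "service" only knows which events it reacts to and which it emits.
def payment_on_order_placed(order: dict) -> None:
    bus.publish("PaymentCharged", order)

def shipping_on_payment_charged(order: dict) -> None:
    bus.publish("ItemShipped" if order["in_stock"] else "ShippingFailed", order)

def payment_on_shipping_failed(order: dict) -> None:
    bus.publish("PaymentRefunded", order)  # compensating action


bus.subscribe("OrderPlaced", payment_on_order_placed)
bus.subscribe("PaymentCharged", shipping_on_payment_charged)
bus.subscribe("ShippingFailed", payment_on_shipping_failed)

bus.publish("OrderPlaced", {"id": 1, "in_stock": False})
print(bus.trace)
# ['OrderPlaced', 'PaymentCharged', 'ShippingFailed', 'PaymentRefunded']
```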
[!TIP] Interview Pro-Tip: In Microservices, avoid 2PC like the plague. It creates a Single Point of Failure (Coordinator) and kills availability. Use Sagas or Eventual Consistency instead.
5. Summary
- Atomicity: Undo Logs save you from partial updates.
- Consistency: Constraints prevent illegal data.
- Isolation: MVCC allows readers and writers to work simultaneously.
- Durability: WAL ensures data survives power loss.
- Distributed: ACID is hard. Prefer Sagas (Eventual Consistency) over 2PC for microservices.
Mnemonic — “ACID Test”: Think of ACID as a bank vault:
- Atomicity = All or nothing (the vault door either opens fully or stays shut)
- Consistency = Rules enforced (vault won’t let you store an illegal item)
- Isolation = Private booth (each transaction has its own workspace)
- Durability = Written in stone (even if the building burns, the vault survives)
Staff Engineer Tip: 2PC Kills Availability, So Know When to Use Sagas. In a microservices design review, if someone proposes 2PC (Two-Phase Commit), challenge it immediately. A single Coordinator crash leaves every participant that has voted “ready” holding its locks until the Coordinator recovers, and every row those participants touched becomes inaccessible. The usual pattern for cross-service atomicity is Sagas with compensating transactions. The tradeoff: users may briefly see an intermediate state (“charged but not yet shipped”), which is almost always acceptable. Design your UX to handle these “pending” states gracefully rather than reaching for 2PC.