MVCC Internals: The Physics of Concurrency
[!NOTE] This module explores the core principles of the subject, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
PostgreSQL handles concurrency using MVCC (Multi-Version Concurrency Control). Unlike traditional locking systems where a reader must block a writer (or vice versa) to ensure consistency, MVCC allows multiple versions of the same row to exist simultaneously in the database.
[!IMPORTANT] The Golden Rule of Postgres Concurrency: Readers never block writers, and writers never block readers.
This is achieved by treating data as immutable at the tuple (row) level. When you “update” a row, Postgres doesn’t overwrite the data in place on the disk. Instead, it marks the old row as “dead” and inserts a brand new version of the row.
1. The Tuple Header (The Metadata)
Every row in a Postgres table (a Tuple) carries a hidden header containing metadata used for visibility checks. This header is 23 bytes (usually padded to 24) and comes before your actual column data.
The most critical fields for MVCC are:
| Field | Size | Description |
|---|---|---|
| xmin | 4 bytes | The XID that created (inserted) this tuple version. |
| xmax | 4 bytes | The XID that deleted (or updated) this tuple version. If 0, the tuple is live. |
| cmin/cmax | 4 bytes | Command identifiers within a transaction (e.g., 1st statement, 2nd statement). |
| ctid | 6 bytes | The physical location of the next version of this tuple (Page ID, Offset). Used to chain updates. |
| infomask | 2 bytes | Bitmask of boolean flags (e.g., “Hint Bits” like XMIN_COMMITTED). |
Visualizing the Tuple Header
2. Interactive Visibility Simulator
How does Postgres decide if you (Transaction A) can see a specific row? It compares your Transaction ID (XID) and snapshot against the row’s xmin and xmax.
Experiment with this simulator to understand the visibility rules.
3. Visibility Rules: The Logic
When your transaction starts, Postgres takes a snapshot. This snapshot contains the Transaction ID of your transaction and a list of all transactions that were active (not committed yet) at that moment.
For a tuple to be visible to you:
- Creation Check (
xmin):- The transaction that created it (
xmin) must have committed. - The
xminmust be “in the past” relative to your snapshot.
- The transaction that created it (
- Deletion Check (
xmax):- The tuple must not be deleted (
xmax= 0). - OR, if it is deleted, the transaction that deleted it (
xmax) must not be visible to you (it committed after your snapshot started or is still running).
- The tuple must not be deleted (
The “Dirty” Secret of UPDATE
In Postgres, UPDATE is actually a syntactic sugar for DELETE + INSERT.
When you run UPDATE users SET name='Bob' WHERE id=1;:
- Postgres marks the old tuple (name=’Alice’) as dead by setting its
xmaxto the current transaction ID. - Postgres inserts a new tuple (name=’Bob’) with
xminset to the current transaction ID. - It updates the index to point to the new tuple.
This is why UPDATE heavy workloads can be expensive—they generate “garbage” (dead tuples) just like inserts and deletes.
4. VACUUM and Bloat
Since deleting a row only marks it as invisible (xmax), the data stays on disk. Over time, these dead tuples accumulate, causing Table Bloat.
The Cleanup Crew: VACUUM
The VACUUM process is responsible for reclaiming this space.
- Standard VACUUM: Scans the table for dead tuples (where
xmaxis visible to everyone). It marks that space as “free” in the Free Space Map (FSM). The file size does not shrink, but the space is reused for future inserts. - VACUUM FULL: Rewrites the entire table to a new file, removing all bloat. This reclaims disk space but locks the table exclusively, blocking all reads and writes.
- Autovacuum: A background daemon that wakes up periodically to run standard VACUUM on tables with significant activity.
[!WARNING] Never Disable Autovacuum Disabling autovacuum is a common production mistake. It leads to:
- Unbounded table growth (bloat).
- Index bloat (slower queries).
- Transaction ID Wraparound (database shutdown).
5. Transaction ID Wraparound (The Apocalypse)
xmin and xmax are 32-bit integers. This means there are only ~4 billion IDs available. If the counter overflows, old transactions might suddenly appear to be in the “future” (invisible), causing data loss.
To prevent this, Postgres uses a “frozen” XID concept and forces a special “Freeze” VACUUM to run on old data, replacing the xmin with a special FrozenTransactionId (2) to indicate “this is infinitely old and visible to everyone.”
6. Summary
- Readers don’t block writers: Because readers see an older version of the row.
- Immutability: Rows are never overwritten, only versioned.
- Space Cost: MVCC requires extra storage for
xmin/xmaxheaders and dead tuples. - Maintenance: Regular
VACUUMis mandatory to recycle space occupied by dead tuples.