Distributed Locking: The Race

In the monolithic age, a simple in-process Mutex (Java's synchronized) kept order. But in the distributed future, with 50 microservice instances fighting for the same resource, in-memory locks are useless: each process can only see its own lock.

You need a Distributed Lock. A traffic light for your cluster.

1. The Anomaly: Double Booking

Imagine a Ticketmaster clone.

  1. User A checks: “Seat 1A Available?” → YES.
  2. User B checks: “Seat 1A Available?” → YES.
  3. User A books Seat 1A.
  4. User B books Seat 1A. Result: Collision. Data Corruption. Angry Users.
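The check-then-act race above fits in a few lines. This is a single-process sketch, not a real ticketing system: the dictionary `seats` and the helpers `check` and `book` are hypothetical stand-ins for the database, and the two users' steps are interleaved by hand in the same order as the list.

```python
# Toy, single-process model of the double-booking race (no real DB).
seats = {"1A": None}      # None = available; stands in for the database row
bookings = []             # every booking the system confirmed to a user


def check(seat):
    """Steps 1/2: read availability."""
    return seats[seat] is None


def book(seat, user):
    """Steps 3/4: write the booking. Last write wins."""
    seats[seat] = user
    bookings.append((seat, user))


a_sees_free = check("1A")     # 1. User A checks -> True
b_sees_free = check("1A")     # 2. User B checks -> True (A has not written yet)
if a_sees_free:
    book("1A", "A")           # 3. User A books
if b_sees_free:
    book("1A", "B")           # 4. User B books, acting on a stale check

print(bookings)  # [('1A', 'A'), ('1A', 'B')]: two confirmations, one seat
print(seats)     # {'1A': 'B'}: A's booking was silently overwritten
```

Both users received a confirmation, but only one booking survives in storage.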

The Solution: Mutual Exclusion.

  1. User A Acquires Lock for seat_1A.
  2. User B tries to acquire Lock → FAILS (Wait).
  3. User A books seat → Releases Lock.
  4. User B acquires Lock → Checks Seat → “Sold Out”.
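Here are the same four steps with a lock in front of the seat. `locks`, `try_acquire`, and `release` are hypothetical helpers that model a lock service inside one process (no network, no expiry). The detail that matters is that User B re-checks the seat *after* acquiring the lock, not before.

```python
# Toy lock table in one process; a stand-in for a real lock service.
locks = {}                 # key -> owner currently holding it
seats = {"1A": None}       # None = available


def try_acquire(key, owner):
    """Non-blocking acquire: succeed only if nobody holds the key."""
    if key in locks:
        return False
    locks[key] = owner
    return True


def release(key, owner):
    """Release only if `owner` actually holds the lock."""
    if locks.get(key) == owner:
        del locks[key]


steps = []
steps.append(try_acquire("seat_1A", "A"))   # 1. A acquires -> True
steps.append(try_acquire("seat_1A", "B"))   # 2. B is refused -> False (waits)
seats["1A"] = "A"                           # 3. A books the seat...
release("seat_1A", "A")                     #    ...and releases the lock
steps.append(try_acquire("seat_1A", "B"))   # 4. B acquires -> True
b_result = "Sold Out" if seats["1A"] is not None else "Available"  # re-check under the lock
release("seat_1A", "B")

print(steps, b_result)  # [True, False, True] Sold Out
```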

2. Efficiency vs Correctness

Before you implement a lock, you must ask: “What happens if the lock fails?”

| Goal | Description | Consequence of Failure | Solution |
|---|---|---|---|
| Efficiency | Prevent doing the same work twice (e.g., sending email). | Minor annoyance (User gets 2 emails). | Redis (Redlock) |
| Correctness | Prevent data corruption (e.g., money transfer). | Catastrophic (Money lost). | Fencing Tokens (ZooKeeper/Etcd) |

[!WARNING] Redis is for Efficiency. If you need strict safety (Correctness), do not rely solely on Redis. Use a consensus system like ZooKeeper or Etcd together with fencing tokens, because Redis (even Redlock) assumes bounded clock drift, bounded network delay, and bounded process pauses.


3. The Tool: Redis SET NX PX

The simplest distributed lock is a single atomic command in Redis.

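  • Command: SET resource_name my_random_value NX PX 30000
    • NX: Not eXists (Only set the key if it doesn't already exist, i.e. the lock is free).
    • PX 30000: Expire the key after 30,000 milliseconds (30s).
    • my_random_value: A value unique to this client. On release, the client deletes the key only if it still holds this value, so it never removes someone else's lock.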

Why the TTL (Time To Live)?

If the client holding the lock crashes before releasing it, a lock without a TTL stays forever (Deadlock). The TTL ensures the lock auto-releases, acting as a Lease.
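To see how NX, PX, and the random value fit together without running a Redis server, here is a sketch built on an in-memory stand-in. `ToyRedis`, `set_nx_px`, and `release` are hypothetical names for illustration, not Redis client APIs, and the clock is advanced by hand so expiry is deterministic.

```python
import uuid


class ToyRedis:
    """In-memory model of `SET key value NX PX ttl_ms`. Not a Redis client."""

    def __init__(self):
        self.now_ms = 0    # manual clock so expiry is deterministic
        self.data = {}     # key -> (value, expires_at_ms)

    def _expire(self, key):
        entry = self.data.get(key)
        if entry is not None and entry[1] <= self.now_ms:
            del self.data[key]

    def set_nx_px(self, key, value, ttl_ms):
        self._expire(key)
        if key in self.data:            # NX: refuse if the key already exists
            return False
        self.data[key] = (value, self.now_ms + ttl_ms)  # PX: expiry in ms
        return True

    def release(self, key, value):
        # Compare-and-delete. Against real Redis this must run as one atomic
        # step (commonly a Lua script) so the check and delete cannot interleave.
        self._expire(key)
        entry = self.data.get(key)
        if entry is not None and entry[0] == value:
            del self.data[key]
            return True
        return False


r = ToyRedis()
value_a = str(uuid.uuid4())   # my_random_value for Client A
value_b = str(uuid.uuid4())   # my_random_value for Client B

acquired_a = r.set_nx_px("resource_name", value_a, 30_000)  # True
blocked_b = r.set_nx_px("resource_name", value_b, 30_000)   # False: A holds it
stolen = r.release("resource_name", value_b)                # False: B is not the owner

# Client A crashes and never releases. Once 30 s pass, the lease lapses.
r.now_ms += 30_000
acquired_b = r.set_nx_px("resource_name", value_b, 30_000)  # True

print(acquired_a, blocked_b, stolen, acquired_b)  # True False False True
```

Without the PX expiry, the last `set_nx_px` would fail forever after Client A's crash; without the owner check in `release`, Client B could have deleted A's lock while A still held it.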


4. The Trap: The Ghost Writer (GC Pauses)

Here is how a simple Redis lock fails during a Garbage Collection (GC) Pause.

  1. Client A acquires Lock (TTL 5s).
  2. Client A freezes for 8s (GC Pause). Lock Expired.
  3. Client B acquires Lock. Writes to DB.
  4. Client A wakes up. Thinks it still holds the lock. Writes to DB. Result: Last Write Wins. Client A overwrites Client B’s valid data.
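The failure can be replayed with a toy lease and a store that has no ordering check. `Lease` is a hypothetical class, and time is passed in explicitly (in seconds) instead of read from a real clock. Re-checking the lock just before the write would narrow the window but not close it, because the pause can land between the check and the write.

```python
class Lease:
    """Toy lock with a TTL; time (in seconds) is passed in explicitly."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, who, now):
        if self.owner is None or now >= self.expires_at:
            self.owner, self.expires_at = who, now + self.ttl
            return True
        return False


db = {}                    # last-write-wins store, no token check
lock = Lease(ttl=5)

a_got_lock = lock.acquire("A", now=0)   # 1. A acquires (lease until t=5)
# 2. A freezes from t=0 to t=8. It will not re-check the lock on wake-up.
b_got_lock = lock.acquire("B", now=6)   # 3. Lease lapsed at t=5, so B gets it
db["seat_1A"] = "B"                     #    B writes valid data
if a_got_lock:                          # 4. A wakes at t=8 with a stale belief
    db["seat_1A"] = "A"                 #    and overwrites B's write

print(a_got_lock, b_got_lock, db)       # True True {'seat_1A': 'A'}
```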

Sequence Diagram: The Ghost Writer (participants: Client A, Lock Service, Database)

  1. Client A → Lock Service: Acquire Lock (TTL 5s).
     Client A: GC Pause (8s). Lock Expires! 🔓
  2. Client B Acquires Lock.
  3. Client A (Wakes Up) → Database: Write Data (UNSAFE!)

The Fix: Fencing Tokens

To solve this, we need the Storage Layer to help.

  1. Lock Service returns a monotonic Token (1, 2, 3…).
  2. Client A gets Token 33.
  3. Client B gets Token 34 (after A expires).
  4. Client A wakes up, tries to write with 33.
  5. Database checks: “I’ve already seen 34. Reject 33.”
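The token check above can be sketched with two small classes. `LockService` and `FencedStore` are hypothetical names; the sketch accepts a token equal to the highest one seen (so a single holder can write several times) and rejects only strictly older ones. In a real database, the compare-and-update inside `write` would itself need to be atomic, for example a conditional update.

```python
import itertools


class LockService:
    """Hands out a strictly increasing fencing token on every grant."""

    def __init__(self, start):
        self._counter = itertools.count(start)

    def grant(self):
        return next(self._counter)


class FencedStore:
    """Storage that remembers the highest token it has accepted."""

    def __init__(self):
        self.max_token_seen = 0
        self.value = None

    def write(self, value, token):
        if token < self.max_token_seen:
            return False              # stale lock holder: reject
        self.max_token_seen = token
        self.value = value
        return True


locks = LockService(start=33)
store = FencedStore()

token_a = locks.grant()                       # 33
token_b = locks.grant()                       # 34, issued after A's lease expired
ok_b = store.write("B's booking", token_b)    # accepted
ok_a = store.write("A's booking", token_a)    # A wakes up late: rejected
print(ok_b, ok_a, store.value)                # True False B's booking
```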

5. Interactive Demo: Redlock & Time Travel

Cyberpunk Mode: Simulate the Race Condition.

  • Mission: Acquire the lock and write to the Database.
  • Weapon: “Freeze Ray” (Simulates GC Pause).
  • Defense: Fencing Tokens (Visualized).

[!TIP] Try it yourself:

  1. Acquire Lock as Client A.
  2. Immediately hit “❄️ Freeze (GC)”. This pauses Client A for 6 seconds (longer than the 5s Lock TTL).
  3. Wait for the lock to expire (watch the red bar).
  4. Acquire Lock as Client B. Client B will write to the DB (Token 34).
  5. Watch Client A wake up and try to write with Token 33.
  6. Result: The Database triggers a “BLOCKED” shield because 33 < 34.
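The same timeline can be replayed as a script. `FencingLock` and `Database` are hypothetical toy classes; the numbers (5s TTL, 6s freeze, next token 33, database max 32) come from the demo description, and time is passed in explicitly rather than read from a real clock.

```python
class FencingLock:
    """Toy lock service: a TTL lease plus a fencing token on every grant."""

    def __init__(self, ttl, next_token):
        self.ttl = ttl
        self.next_token = next_token
        self.expires_at = 0.0            # free at t=0

    def acquire(self, now):
        if now < self.expires_at:
            return None                  # someone's lease is still live
        self.expires_at = now + self.ttl
        token = self.next_token
        self.next_token += 1
        return token


class Database:
    """Toy store that rejects writes carrying an older token."""

    def __init__(self, max_token_seen):
        self.max_token_seen = max_token_seen
        self.log = []

    def write(self, client, token):
        if token < self.max_token_seen:
            self.log.append(f"BLOCKED {client} (token {token} < {self.max_token_seen})")
            return False
        self.max_token_seen = token
        self.log.append(f"OK {client} (token {token})")
        return True


lock = FencingLock(ttl=5.0, next_token=33)
db = Database(max_token_seen=32)

token_a = lock.acquire(now=0.0)    # Client A gets 33; lease ends at t=5
early_b = lock.acquire(now=1.0)    # None: A's lease is still live
# Client A is frozen from t=0 to t=6 (GC), longer than the 5 s TTL.
token_b = lock.acquire(now=5.5)    # Client B gets 34 after the lease lapsed
db.write("Client B", token_b)      # accepted; max token seen becomes 34
db.write("Client A", token_a)      # A wakes at t=6 with 33: rejected

for line in db.log:
    print(line)
# OK Client B (token 34)
# BLOCKED Client A (token 33 < 34)
```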

6. Redlock Algorithm (Multi-Master)

A single Redis instance is a Single Point of Failure. Redlock uses 5 independent Redis masters to remove it.

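  1. Client gets the current timestamp.
  2. Tries to acquire the lock in all 5 instances sequentially, using the same key and random value everywhere and a per-instance timeout that is short compared to the TTL.
  3. If acquired in a Majority (3/5) and time elapsed < TTL:
    • Lock Acquired. It stays valid for the TTL minus the time elapsed (minus a small allowance for clock drift).
  4. Else:
    • Unlock All (every instance, including those where acquisition appeared to fail).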
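The majority and validity logic of the steps above can be sketched against in-process toy nodes. `Node` and `redlock_acquire` are hypothetical names, `up=False` stands in for an unreachable master, and `drift` is an assumed parameter (default 0) for the clock-drift allowance. This is an illustration of the steps, not a production Redlock client.

```python
import time
import uuid


class Node:
    """Toy Redis master: SET NX with a TTL. `up=False` simulates an outage."""

    def __init__(self, up=True):
        self.up = up
        self.holder = None
        self.expires_at = 0.0

    def try_set(self, value, ttl, now):
        if not self.up:
            return False                     # timeout / unreachable
        if self.holder is None or now >= self.expires_at:
            self.holder, self.expires_at = value, now + ttl
            return True
        return False

    def release(self, value):
        if self.up and self.holder == value:  # only delete our own value
            self.holder = None


def redlock_acquire(nodes, ttl, clock=time.monotonic, drift=0.0):
    """Return (value, validity_seconds) on success, or (None, 0.0)."""
    value = str(uuid.uuid4())                # same random value on every node
    start = clock()                          # 1. record the start time
    granted = 0
    for node in nodes:                       # 2. try each node in turn
        if node.try_set(value, ttl, clock()):
            granted += 1
    validity = ttl - (clock() - start) - drift
    if granted >= len(nodes) // 2 + 1 and validity > 0:   # 3. majority, in time
        return value, validity
    for node in nodes:                       # 4. otherwise unlock all nodes
        node.release(value)
    return None, 0.0


nodes = [Node(), Node(), Node(up=False), Node(up=False), Node()]
lock_a, validity_a = redlock_acquire(nodes, ttl=10.0)   # 3 of 5 granted
lock_b, _ = redlock_acquire(nodes, ttl=10.0)            # 0 of 5: all held or down
print(lock_a is not None, lock_b is None)               # True True
```

With only two reachable masters, the same call gets 2 of 5, fails the majority check, and releases whatever it did manage to set.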

The Controversy: Kleppmann vs Antirez

Distributed Systems researcher Martin Kleppmann famously critiqued Redlock, and Redis creator Salvatore Sanfilippo (antirez) published a rebuttal.

  • The Issue: Redlock relies on Wall-Clock Time. If a server's clock jumps forward (e.g., an NTP step), it might expire a lock prematurely. Redlock also does not generate Fencing Tokens.
  • The Verdict:
    • Use Redlock for Efficiency (preventing double-processing).
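    • Use ZooKeeper/Etcd for Correctness (preventing data corruption). ZooKeeper orders every update with a monotonically increasing transaction ID (zxid) rather than wall-clock time, and that ID can serve as a fencing token.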
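The clock-jump issue is plain arithmetic once written out. `ManualClock` and `is_expired` are hypothetical helpers for illustration; this does not model how Redis stores expiry internally, only a server that measures a lease with a wall clock that can be stepped.

```python
class ManualClock:
    """Hypothetical clock we can step by hand, like NTP stepping a wall clock."""

    def __init__(self, start):
        self.t = start

    def __call__(self):
        return self.t


def is_expired(acquired_at, ttl, clock):
    """Server-side check: has the lease run out according to `clock`?"""
    return clock() - acquired_at >= ttl


TTL = 30.0
wall = ManualClock(1_000.0)         # server's wall clock (seconds)
acquired_at = wall()                # lock stamped at wall time 1000

real_elapsed = 2.0                  # only 2 real seconds pass...
wall.t += real_elapsed + 60.0       # ...but an NTP step moves the clock +60 s

server_view = is_expired(acquired_at, TTL, wall)   # True: lock released early
client_view = TTL - real_elapsed                   # client still expects 28 s
print(server_view, client_view)                    # True 28.0
```

Measuring the lease on a monotonic clock (such as Python's `time.monotonic`, which is not affected by system clock updates) removes the jump on a single machine, but it does nothing about GC pauses on the client, which is why fencing tokens remain necessary.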

Summary

  • Distributed Locks are essential for Mutual Exclusion.
  • TTL (Lease) prevents deadlocks but introduces race conditions.
  • Fencing Tokens are the shield against Zombie lock holders (clients that wake from a GC Pause after their lease expired).
  • Redlock is great for efficiency, but not for financial safety.