Design Airbnb (Hotel Reservation System)

[!IMPORTANT] In this lesson, you will master:

  1. Process Requirements: Defining the core challenges of a reservation system.
  2. Data Model & Inventory Management: Structuring data to support high-concurrency bookings.
  3. Concurrency Control: Deep dive into Pessimistic vs. Optimistic Locking to prevent double-bookings.
  4. Distributed Transactions: Handling payment processing and inventory reservation safely using the Saga pattern or 2PC.
  5. System Architecture: Scaling the read-heavy search path separately from the write-heavy booking path.

1. The Core Challenge: The “Double Booking” Nightmare

Imagine you are managing reservations for a massive event like the Eras Tour or the World Cup. Two die-hard fans, Alice and Bob, are looking at the exact same beachfront apartment for the exact same dates. There is only one unit available.

They both click “Book Now” at precisely the exact same millisecond.

If your system isn’t designed correctly, both requests might see available_rooms = 1, both will decrement the count to 0, and both will receive a confirmation email. You now have two people showing up to one apartment. This is the Double Booking problem, and solving it under extreme concurrency is the defining challenge of any reservation system (Airbnb, Expedia, Ticketmaster).


2. Requirements & Goals (PEDALS: Process Requirements)

Before we architect the solution, let’s define the boundaries of our system.

Functional Requirements

  1. Search & Discovery: Users can search for properties based on location, dates, guest count, and apply filters (price, amenities).
  2. View Property Details: Users can view photos, descriptions, reviews, and real-time availability.
  3. Book Property: Users can reserve a property for specific dates.
  4. Payment Integration: Securely process payments to confirm the booking.

Non-Functional Requirements

  1. Strict Consistency (Crucial): Zero tolerance for double-bookings. The booking path must be strongly consistent.
  2. High Availability (Search): Users must be able to search and view properties even if the booking service is temporarily degraded.
  3. Low Latency: Search results must load in < 200ms.
  4. Idempotency: Retrying a booking or payment request must not result in duplicate charges or multiple reservations.

3. Back-of-the-Envelope Estimation (PEDALS: Estimate)

To understand our scaling constraints, let’s estimate the traffic.

  • Listings: 10 million active listings.
  • Daily Active Users (DAU): 50 million.
  • Read-to-Write Ratio: 10,000:1. Searching and browsing are incredibly read-heavy compared to actually booking.
  • Search QPS: 50M users * 10 searches/day = 500M searches/day &approx; 5,800 QPS (Peak &approx; 20,000 QPS).
  • Booking QPS: 500,000 bookings/day &approx; 6 QPS (Peak &approx; 50 QPS).

Observation: The system is fundamentally split. The Search Path requires massive scale, caching, and low latency (eventual consistency is fine). The Booking Path has low QPS but requires absolute, strict consistency.


4. Data Model (PEDALS: Data Model)

Choosing the right database is critical. Since we need strict ACID guarantees for reservations and financial transactions, a Relational Database (RDBMS) like PostgreSQL or MySQL is the industry standard for the Booking Path.

Key Tables

1. properties Table Stores static metadata.

  • property_id (PK)
  • host_id
  • title, description, location_coords

2. property_availability Table (The Inventory) This table is the source of truth for availability. Instead of storing a boolean is_available, we track availability by date to support multi-day bookings.

  • availability_id (PK)
  • property_id (FK)
  • date (Date)
  • status (Enum: AVAILABLE, RESERVED, BOOKED)
  • version (Int) - Crucial for Optimistic Locking (discussed later)

(Composite Unique Index on property_id + date)

3. reservations Table

  • reservation_id (PK)
  • user_id (FK)
  • property_id (FK)
  • start_date, end_date
  • status (Enum: PENDING, CONFIRMED, CANCELLED)
  • idempotency_key (String)

5. High-Level Architecture (PEDALS: Architecture)

We separate the read-heavy Search Service from the write-heavy Booking Service to scale them independently.

High-Level Architecture: Reservation System
Separating Search (Read) from Booking (Write) Paths
CDN / LB
Search Service
(Read Heavy)
Booking Service
(Write Heavy)
Elasticsearch
(Geospatial / Text)
Primary DB
(Postgres ACID)
CDC (Kafka)
Syncs data
Payment Gateway
(Stripe)

The Data Sync Pipeline: How does the Search Service know when a room is booked? We use Change Data Capture (CDC) (e.g., Debezium) attached to Postgres. When a booking confirms, the CDC reads the Postgres Write-Ahead Log (WAL), publishes an event to Kafka, and a consumer updates the Elasticsearch index to reflect the new availability. This provides Eventual Consistency for the search index.


6. Localized Details: Preventing Double Bookings (PEDALS: Localized Details)

This is the most critical technical concept in this chapter. How do we ensure strict consistency in the database? We rely on Database Locking.

Approach 1: Pessimistic Locking

Pessimistic locking assumes conflicts will happen. The system locks the database rows the moment a user begins the checkout process.

BEGIN TRANSACTION;
-- The "FOR UPDATE" clause places an exclusive lock on these rows
SELECT * FROM property_availability
WHERE property_id = 123 AND date BETWEEN '2024-05-01' AND '2024-05-05'
FOR UPDATE;

-- Update the status to RESERVED
UPDATE property_availability SET status = 'RESERVED' WHERE ...;
COMMIT;
  • Pros: Guaranteed accuracy. No one else can touch those dates while the lock is held.
  • Cons: Deadlocks and Throughput bottlenecks. If Alice’s payment takes 3 minutes to process, the rows remain locked. Bob is completely blocked from even looking at those rows for adjacent dates if the index locking is coarse.

Approach 2: Optimistic Locking (The Industry Standard)

Optimistic locking assumes conflicts are rare. Instead of locking rows upfront, it allows concurrent reads and only checks for conflicts at the exact moment of the update using a version number.

Interactive: Optimistic Locking Race Condition

Click "Execute Alice" and "Execute Bob" to simulate a concurrent booking attempt.

User A: Alice
SELECT version FROM availability WHERE id=1;
// Waiting...
Database State
v1
Status: AVAILABLE
User B: Bob
SELECT version FROM availability WHERE id=1;
// Waiting...

The SQL query looks like this:

UPDATE property_availability SET
  status = 'RESERVED',
  version = version + 1
WHERE property_id = 123
  AND date = '2024-05-01'
  AND version = 1; -- The version the user initially read

If Bob’s request arrives milliseconds after Alice’s, the version in the database will now be 2. Bob’s update statement will fail to match any rows (affected_rows = 0). The application sees this, aborts Bob’s transaction, and returns a “Sorry, those dates are no longer available” error.

  • Pros: Exceptionally high throughput. No blocked reads.
  • Cons: In extremely high contention scenarios (e.g., concert tickets going on sale), many requests will fail and require retries or throw errors to users.

[!TIP] Analogy: The Last Slice of Pizza Pessimistic: You place your hand firmly on the pizza slice while deciding if you want it. No one else can touch it until you take it or remove your hand. Optimistic: Everyone lunges for the slice at once. The first hand that grabs it gets it; everyone else grabs empty air and realizes they missed out.


[!NOTE] War Story: The “Ticketmaster” Incident When a massively popular concert went on sale, the initial implementation of the booking system used a naive optimistic locking approach without a queue. The sheer volume of concurrent requests caused 99% of updates to fail due to version conflicts, leading to excessive retries that ultimately DDOS’d the database. The solution was introducing a Kafka-backed queue to serialize the high-contention requests before they even reached the database, alongside the optimistic locking.

7. Distributed Transactions: The Reservation Pattern (PEDALS: Localized Details)

Booking a hotel involves two distinct operations:

  1. Reserving the inventory (Internal Database).
  2. Charging the credit card (External Service like Stripe).

We cannot wrap both in a single ACID transaction because the payment gateway is external. If the payment succeeds but the database update fails (network crash), the user is charged for a room they didn’t get.

We solve this using a distributed transaction model known as the Reservation Pattern (a simplified Saga).

The Workflow:

  1. Pending State: User clicks book. System updates property_availability to RESERVED and creates a reservation record in PENDING status.
  2. Payment Processing: The system calls Stripe API with an Idempotency-Key (e.g., the reservation_id).
  3. Confirmation: If payment succeeds, update status to CONFIRMED and availability to BOOKED.
  4. The Timeout Worker: What if the user closes their browser during payment? A background cron job or message queue with a Delay/TTL constantly scans for PENDING reservations older than 15 minutes. If found, it reverts the inventory back to AVAILABLE.

8. Summary

  • Core Architecture: Split Read (Search via Elasticsearch) and Write (Booking via Postgres).
  • Consistency: Prevent double-booking using Optimistic Locking (version checking) in the primary relational database.
  • Payments: Safely coordinate external payments and internal state using the Reservation Pattern and background TTL workers to release abandoned locks.
  • Idempotency: Always pass unique booking IDs to payment gateways to prevent double charges on retries.

Return to Specialized Systems