Design Twitter Timeline

1. What is Twitter?

Twitter is a micro-blogging platform where users post short messages (Tweets). The defining characteristic of Twitter is its real-time nature and its asymmetric follow graph (User A can follow User B without User B following back).

Core Problem

How do you deliver a Tweet from a user to their 50 million followers in under 5 seconds?

[!TIP] Twitter is the classic example of a “Read-Heavy” system where the challenge isn’t storing the data, but distributing it efficiently to millions of timelines simultaneously.

Try it yourself

Go to Twitter (X). Post “Hello System Design”. Refresh your friend’s feed. It appears instantly. Now imagine Elon Musk posts. It appears on 100 million screens instantly. How?


2. Requirements & Goals

Functional Requirements

  • Tweet: User can post a tweet (Text + Media).
  • Timeline: User can view their Home Timeline (tweets from people they follow).
  • Search: Users can search for tweets by keywords.
  • Trends: Users can see trending topics.

Non-Functional Requirements

  • Read Heavy: The read-to-write ratio is extreme (e.g., 1000:1).
  • Low Latency: Timelines must load instantly (< 200ms).
  • Eventual Consistency: It’s okay if a follower sees a tweet 2 seconds later.
  • Availability: The system must be always available.

3. Capacity Estimation

  • DAU: 300 Million.
  • Tweets: 500 Million per day.
  • Timeline Visits: 30 Billion per day.
  • QPS:
    • Writes: ~6k QPS (avg), 20k QPS (peak).
    • Reads: ~350k QPS.

The massive gap between reads and writes dictates that we optimize the read path.
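
A quick sanity check on these numbers (the averages assume traffic spread evenly over the day; the 20k peak write figure from above corresponds to roughly a 3–4x burst over the average):

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

tweets_per_day = 500_000_000
timeline_visits_per_day = 30_000_000_000

write_qps = tweets_per_day / SECONDS_PER_DAY           # ~5,787  -> ~6k avg
read_qps = timeline_visits_per_day / SECONDS_PER_DAY   # ~347,222 -> ~350k

print(f"Avg write QPS: {write_qps:,.0f}")   # 5,787
print(f"Avg read QPS:  {read_qps:,.0f}")    # 347,222
```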


4. System APIs

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/tweets | Create a tweet. Body: { text: "Hello", media_ids: [] } |
| POST | /v1/friendships/create | Follow a user. |
| GET | /v1/statuses/home_timeline | Get the logged-in user's timeline. |
| GET | /v1/search | Search tweets. Params: q, count. |
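
As a quick illustration, a client call to the tweet-creation and timeline endpoints might look like the sketch below. The base URL, auth header, and response shape are assumptions for illustration, not the real X API.

```python
import requests

BASE_URL = "https://api.example-twitter.com"   # hypothetical host for illustration
AUTH = {"Authorization": "Bearer <access_token>"}

# Create a tweet
resp = requests.post(
    f"{BASE_URL}/v1/tweets",
    json={"text": "Hello System Design", "media_ids": []},
    headers=AUTH,
    timeout=5,
)
tweet = resp.json()   # assumed to return {"id": <snowflake_id>, ...}

# Fetch the home timeline
timeline = requests.get(
    f"{BASE_URL}/v1/statuses/home_timeline",
    params={"count": 20},
    headers=AUTH,
    timeout=5,
).json()
```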

5. Database Design

We need a graph database for relationships and a scalable store for Tweets.

1. Tweet Store (Cassandra / Manhattan)

  • Twitter built their own DB called Manhattan (labeled in diagram), but we can use Cassandra (Wide Column Store).
  • Key: tweet_id (Snowflake).
  • Columns: user_id, content, timestamp, media_urls.
  • Why?: High write throughput and linear scalability.

2. User Graph (FlockDB / MySQL)

  • Stores follower_id -> followee_id.
  • Needs to answer: “Who follows User A?” (for Fanout) and “Who does User A follow?” (for checking relationships).
  • FlockDB: A graph DB built by Twitter to handle adjacency lists efficiently.

3. Timeline Cache (Redis)

  • Stores the list of Tweet IDs for each user’s Home Timeline.
  • Key: user_id.
  • Value: List<TweetID> (e.g., [105, 102, 99...]).
  • Size Limit: We only keep the last 800 IDs.
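
A minimal sketch of this cache layout using redis-py. The key scheme (`timeline:<user_id>`) is an assumption; the 800-entry cap follows the description above.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

TIMELINE_CAP = 800   # keep only the most recent 800 tweet IDs per user

def push_to_timeline(follower_id: int, tweet_id: int) -> None:
    """Prepend a new tweet ID to a follower's home timeline and trim it."""
    key = f"timeline:{follower_id}"          # assumed key scheme
    pipe = r.pipeline()
    pipe.lpush(key, tweet_id)                # newest IDs at the head
    pipe.ltrim(key, 0, TIMELINE_CAP - 1)     # enforce the 800-entry cap
    pipe.execute()

def read_timeline(user_id: int, count: int = 50) -> list[int]:
    """Return the newest `count` tweet IDs from the cached timeline."""
    return [int(x) for x in r.lrange(f"timeline:{user_id}", 0, count - 1)]
```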

4. Media Storage (Blobstore / S3)

  • Images and Videos are NOT stored in Cassandra.
  • They are stored in Blobstore (Twitter’s internal S3).
  • Immutability: Media files are immutable. If a user edits a photo, a new ID is generated. This allows aggressive CDN caching.
  • Tardis: A dedicated cold storage system for older media.
  • CDN: All media is served via CDN to reduce latency.

6. High-Level Design

System architecture handling both Timeline Delivery and Search Indexing.

Diagram: System Architecture — Twitter Timeline & Search (Fanout Service | Manhattan DB | Earlybird Search). It traces three paths — Write Tweet, Read Timeline, and Search Query — across three layers:

  • Client: User → Load Balancer.
  • Services: Tweet Service (accepts writes, validates & saves), Fanout Service (fanout-on-write to home timelines), Timeline Service (reads the home feed), Blender (search aggregator, scatter-gather).
  • Storage & Search: Manhattan (Cassandra) as the permanent store, Timeline Cache (Redis cluster holding lists of post IDs), Earlybird (Lucene shards holding the inverted index, fed via push indexing).

7. Deep Dive: Snowflake ID Generator

Twitter created Snowflake to generate unique IDs at scale without coordination. We cannot use auto-incrementing MySQL IDs in a distributed system (Single Point of Failure).

The 64-bit Anatomy

  • Sign Bit (1 bit): Always 0 (positive numbers).
  • Timestamp (41 bits): Milliseconds since custom epoch (Twitter’s epoch is late 2010).
    • 2^41 ms ≈ 69 years.
  • Data Center ID (5 bits): Allows 32 data centers.
  • Worker ID (5 bits): Allows 32 workers per DC. (Total 1024 nodes).
    • Coordination: How do we assign Worker IDs? We use Zookeeper. When a Tweet Service node starts, it registers with Zookeeper to get a unique ephemeral ID (0-31). If it crashes, the ID is released.
  • Sequence (12 bits): For conflict resolution within the same millisecond.
    • 212 = 4096 IDs per millisecond per node.
Bit layout: 0 (sign) | Timestamp (41) | Machine (10) | Sequence (12)

Benefit: IDs are k-ordered (roughly sorted by time). We can sort by ID to sort by time!
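
A minimal, single-threaded sketch of the bit packing described above. The epoch constant is Twitter's published Snowflake epoch; a production generator would add locking and clock-skew handling.

```python
import time

CUSTOM_EPOCH_MS = 1288834974657   # Twitter's Snowflake epoch (Nov 2010)

class Snowflake:
    def __init__(self, datacenter_id: int, worker_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self) -> int:
        now_ms = int(time.time() * 1000)
        if now_ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit sequence
            if self.sequence == 0:                        # 4096 IDs exhausted this ms
                while now_ms <= self.last_ms:             # spin until the next millisecond
                    now_ms = int(time.time() * 1000)
        else:
            self.sequence = 0
        self.last_ms = now_ms
        timestamp = now_ms - CUSTOM_EPOCH_MS
        # 41 bits timestamp | 5 bits DC | 5 bits worker | 12 bits sequence
        return (timestamp << 22) | (self.datacenter_id << 17) | (self.worker_id << 12) | self.sequence
```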


8. Component Design: The Hybrid Approach

Twitter famously moved from a pure “Pull” model to a “Push” model, and finally to a Hybrid Model.

Why Hybrid?

  • Push (Fanout-on-Write) is great for 99% of users. When I tweet to my 100 followers, doing 100 Redis writes is trivial.
  • Pull is better for celebrities. If Elon Musk tweets, doing 100 Million Redis writes would clog the system and delay delivery.

The Algorithm (sketched in code after this list)

  1. Tweet Arrives: System checks the author’s follower count.
  2. Case A: Normal User:
    • Fetch all followers.
    • Push Tweet ID to all their Redis timelines.
  3. Case B: Celebrity (e.g., >100k followers):
    • Save Tweet to DB.
    • Do NOT push to followers.
  4. Timeline Retrieval (Reader Side):
    • Fetch the user’s timeline from Redis (Normal Tweets).
    • Fetch the list of “Celebrities I Follow”.
    • Query the Tweet DB for recent tweets from those celebrities.
    • Merge the results in memory.
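
A self-contained sketch of this hybrid write/read logic. The 100k threshold comes from the design above; the in-memory dicts stand in for Manhattan/Cassandra, FlockDB, and Redis purely for illustration.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 100_000

# In-memory stand-ins for the real stores (illustration only).
tweets_by_author = defaultdict(list)   # author_id -> [tweet_id], oldest first
followers_of = defaultdict(set)        # author_id -> {follower_id}
timelines = defaultdict(list)          # user_id -> [tweet_id], newest first

def is_celebrity(author_id: int) -> bool:
    return len(followers_of[author_id]) >= CELEBRITY_THRESHOLD

def on_new_tweet(author_id: int, tweet_id: int) -> None:
    """Write path: persist, then fan out only for non-celebrities."""
    tweets_by_author[author_id].append(tweet_id)              # always persist
    if not is_celebrity(author_id):
        for follower_id in followers_of[author_id]:           # push to each follower's cache
            timelines[follower_id].insert(0, tweet_id)
            del timelines[follower_id][800:]                  # keep the 800-entry cap

def get_home_timeline(user_id: int, following: set[int], count: int = 50) -> list[int]:
    """Read path: merge the pre-computed cache with live celebrity tweets."""
    merged = list(timelines[user_id])                         # normal tweets from cache
    for author_id in following:
        if is_celebrity(author_id):
            merged.extend(tweets_by_author[author_id][-20:])  # pull celebrity tweets from storage
    # Snowflake IDs are k-ordered, so sorting by ID approximates sorting by time.
    return sorted(set(merged), reverse=True)[:count]
```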

Thundering Herd Problem

If we used the Push model for celebrities, we would encounter the Thundering Herd problem. Imagine 100 million write operations hitting the Redis cluster simultaneously. This could saturate network bandwidth and lock up the cache nodes. By using the Hybrid approach, we shift this load to the Read path, where we can use caching and replicas more effectively.


9. Search (Earlybird)

Twitter’s search system is a modified version of Lucene, called Earlybird (as shown in the System Architecture diagram).

  • Inverted Index: Maps tokens to Tweet IDs. ("java" -> [101, 105, 109]).
  • Segments: Data is written to immutable “Segment” files.
    • Writes: New tweets go into a small in-memory buffer, then flushed to a small segment on disk.
    • Compaction: Background threads merge small segments into larger ones (Log-Structured Merge approach) to optimize read speed.
  • Partitioning Strategy:
    • Time-Sliced: Instead of sharding by Tweet ID, indices are split by time (e.g., “Last 4 Hours”, “Yesterday”, “Last Week”).
    • Why?: Users mostly search for recent events. We can query the “Recent” shard first and ignore the “Old” shards unless necessary.
  • Blender: A scatter-gather service (visualized in diagram). When you search “Super Bowl”:
    1. Blender queries all Earlybird shards in parallel.
    2. Aggregates results.
    3. Ranks them (Relevance + Time).
Trending Topics

  • Stream Processing: Hashtag counts are computed with Apache Storm or Flink.
  • Windowing: Count hashtag occurrences in a Sliding Window (e.g., the last 15 mins), as sketched below.
  • Ranking: Sort by velocity (acceleration of mentions), not just total volume.
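
A toy, single-process sketch of this windowed counting. In production this runs on Storm/Flink across partitions; the 15-minute window and the "recent half minus older half" velocity formula are illustrative assumptions.

```python
import time
from collections import deque, Counter

WINDOW_SECONDS = 15 * 60
events = deque()   # (timestamp, hashtag), oldest first

def record(hashtag: str, ts: float | None = None) -> None:
    events.append((ts if ts is not None else time.time(), hashtag))

def trending(now: float | None = None, top_n: int = 10) -> list[tuple[str, int]]:
    """Rank hashtags by velocity: count in the recent half-window minus the older half."""
    now = now if now is not None else time.time()
    while events and events[0][0] < now - WINDOW_SECONDS:   # evict expired events
        events.popleft()
    recent, older = Counter(), Counter()
    for ts, tag in events:
        (recent if ts >= now - WINDOW_SECONDS / 2 else older)[tag] += 1
    velocity = {tag: recent[tag] - older.get(tag, 0) for tag in recent}
    return sorted(velocity.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```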

10. Data Partitioning

Sharding Timelines (Redis)

  • Sharded by the User ID of the timeline owner.
  • A naive scheme is hash(user_id) % number_of_redis_nodes, but adding or removing a node would remap almost every key.
  • Instead, we use Consistent Hashing so that node additions/removals only remap a small fraction of users (see the sketch below, and Consistent Hashing).
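
A minimal consistent-hash ring for picking the Redis node that owns a user's timeline. The virtual-node count and the use of MD5 are arbitrary choices for this sketch.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):                      # virtual nodes smooth the distribution
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, user_id: int) -> str:
        """Walk clockwise on the ring to the first node at or after the key's hash."""
        idx = bisect.bisect(self._keys, self._hash(str(user_id))) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["redis-1", "redis-2", "redis-3"])
print(ring.node_for(42))   # e.g. "redis-2"; only ~1/N of keys move when a node is added
```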

11. Interactive Decision Visualizer

Hybrid Delivery Simulator

This simulation shows how Twitter routes tweets differently based on the author’s influence.

  • Normal User: Tweets take the “Fast Path” (Push to Cache).
  • Celebrity: Tweets take the “Storage Path” (Pull from DB).

Diagram: Tweet Routing Logic — the User's tweet goes to the Redis Cache (Timeline) on the fast path, or only to Cassandra (Storage) on the storage path.

12. Requirements Traceability

| Requirement | Design Decision | Justification |
| --- | --- | --- |
| Instant Delivery | Push Model | For 99% of users, reading from a pre-computed Redis list is fast. |
| Scalability | Hybrid Model | Pull model for celebrities prevents "Thundering Herd" on cache. |
| Searchability | Earlybird | Dedicated search index (Lucene) updated in near real-time. |
| Availability | Manhattan/Cassandra | Replicated, eventually consistent DB ensures no downtime. |
| Sorting | Snowflake ID | K-ordered IDs allow sorting by time without secondary indexes. |

13. Observability (RED Method)

  • Rate: QPS for Write Tweet vs Read Timeline.
  • Errors: Failed Tweets (DLQ depth).
  • Duration: P99 Latency for Home Timeline render.

Key Metrics:

  • delivery_latency_ms: Time from “Tweet Sent” to “Available in Follower’s Redis”.
  • search_index_lag: Time for a tweet to appear in search results.
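
A sketch of how these metrics could be emitted with the Prometheus Python client. Metric names follow the list above but are adapted to Prometheus conventions; the buckets and port are assumptions.

```python
from prometheus_client import Counter, Histogram, start_http_server

TIMELINE_LATENCY = Histogram(
    "home_timeline_duration_seconds", "Home timeline render latency (Duration)",
    buckets=(0.05, 0.1, 0.2, 0.5, 1.0),
)
DELIVERY_LATENCY = Histogram(
    "delivery_latency_seconds", "Tweet sent -> available in follower's Redis",
)
FAILED_TWEETS = Counter("failed_tweets_total", "Tweets that landed in the DLQ (Errors)")

start_http_server(9100)   # expose /metrics for scraping

# Example usage inside the read path:
with TIMELINE_LATENCY.time():   # records the duration of the block
    pass                        # ... render home timeline ...
```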

14. Deployment Strategy

  • Dark Launching: New features (e.g., Edit Tweet) are deployed but hidden behind feature flags.
  • Decoupled Release: Client apps (iOS/Android) are updated separately from Backend services. We must support backward compatibility for APIs.
  • Schema Evolution: Using Protobuf/Thrift allows adding fields without breaking old clients.

15. Interview Gauntlet

Q1: Why not use SQL for Tweets?

  • Answer: A single SQL primary struggles with the sheer write volume (20k/sec) and the schema flexibility we need. Cassandra/Manhattan offers near-linear horizontal scaling for write-heavy workloads.

Q2: How do we handle “Retweets”?

  • Answer: A Retweet is a pointer. In the Redis list, we store { type: 'RT', original_id: 101, retweeter_id: 202 }. The frontend renders it differently.

Q3: How do we detect Trending Topics?

  • Answer: Using a Stream Processor (Flink/Storm). We use a “Sliding Window” (e.g., last 15 mins) and count hashtag frequencies. We rank by acceleration (velocity), not just total volume, to capture breaking news.

16. Whiteboard Summary

Write Path

  • Fanout: Push for small users.
  • Hybrid: Pull for stars (>100k).
  • Storage: Manhattan (Cassandra).

Read Path

  • Cache: Redis Cluster (Home).
  • Merge: Cache + DB (Celeb).
  • Search: Earlybird (Lucene).