Design Twitter Timeline

1. What is Twitter?

Twitter is a micro-blogging platform where users post short messages (Tweets). The defining characteristic of Twitter is its real-time nature and its asymmetric follow graph (User A can follow User B without User B following back).

Core Problem

How do you deliver a Tweet from a user to their 50 million followers in under 5 seconds?

[!TIP] Twitter is the classic example of a “Read-Heavy” system where the challenge isn’t storing the data, but distributing it efficiently to millions of timelines simultaneously.

Try it yourself

Go to Twitter (X). Post “Hello System Design”. Refresh your friend’s feed. It appears instantly. Now imagine Elon Musk posts. It appears on 100 million screens instantly. How?


2. Requirements & Goals

Functional Requirements

  • Tweet: User can post a tweet (Text + Media).
  • Timeline: User can view their Home Timeline (tweets from people they follow).
  • Search: Users can search for tweets by keywords.
  • Trends: Users can see trending topics.

Non-Functional Requirements

  • Read Heavy: The read-to-write ratio is extreme (e.g., 1000:1).
  • Low Latency: Timelines must load instantly (< 200ms).
  • Eventual Consistency: It’s okay if a follower sees a tweet 2 seconds later.
  • Availability: The system must be always available.

3. Capacity Estimation

  • DAU: 300 Million.
  • Tweets: 500 Million per day.
  • Timeline Visits: 30 Billion per day.
  • QPS:
    • Writes: ~6k QPS (avg), 20k QPS (peak).
    • Reads: ~350k QPS.

The massive gap between reads and writes dictates that we optimize the read path.
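
A quick sanity check on these numbers (the averages assume traffic spread evenly over the day; the 20k peak write figure from above corresponds to roughly a 3–4x burst over the average):

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

tweets_per_day = 500_000_000
timeline_visits_per_day = 30_000_000_000

write_qps = tweets_per_day / SECONDS_PER_DAY           # ~5,787  -> ~6k avg
read_qps = timeline_visits_per_day / SECONDS_PER_DAY   # ~347,222 -> ~350k

print(f"Avg write QPS: {write_qps:,.0f}")   # 5,787
print(f"Avg read QPS:  {read_qps:,.0f}")    # 347,222
```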


4. System APIs

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/tweets | Create a tweet. Body: { text: "Hello", media_ids: [] } |
| POST | /v1/friendships/create | Follow a user. |
| GET | /v1/statuses/home_timeline | Get the logged-in user's timeline. |
| GET | /v1/search | Search tweets. Params: q, count. |
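
As a quick illustration, a client call to the tweet-creation and timeline endpoints might look like the sketch below. The base URL, auth header, and response shape are assumptions for illustration, not the real X API.

```python
import requests

BASE_URL = "https://api.example-twitter.com"   # hypothetical host for illustration
AUTH = {"Authorization": "Bearer <access_token>"}

# Create a tweet
resp = requests.post(
    f"{BASE_URL}/v1/tweets",
    json={"text": "Hello System Design", "media_ids": []},
    headers=AUTH,
    timeout=5,
)
tweet = resp.json()   # assumed to return {"id": <snowflake_id>, ...}

# Fetch the home timeline
timeline = requests.get(
    f"{BASE_URL}/v1/statuses/home_timeline",
    params={"count": 20},
    headers=AUTH,
    timeout=5,
).json()
```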

5. Database Design

We need a graph database for relationships and a scalable store for Tweets.

1. Tweet Store (Cassandra / Manhattan)

  • Twitter built their own DB called Manhattan (labeled in diagram), but we can use Cassandra (Wide Column Store).
  • Key: tweet_id (Snowflake).
  • Columns: user_id, content, timestamp, media_urls.
  • Why?: High write throughput and linear scalability.

2. User Graph (FlockDB / MySQL)

  • Stores follower_id -> followee_id.
  • Needs to answer: “Who follows User A?” (for Fanout) and “Who does User A follow?” (for checking relationships).
  • FlockDB: A graph DB built by Twitter to handle adjacency lists efficiently.

3. Timeline Cache (Redis)

  • Stores the list of Tweet IDs for each user’s Home Timeline.
  • Key: user_id.
  • Value: List<TweetID> (e.g., [105, 102, 99...]).
  • Size Limit: We only keep the last 800 IDs.
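
A minimal sketch of this cache layout using redis-py. The key scheme (`timeline:<user_id>`) is an assumption; the 800-entry cap follows the description above.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

TIMELINE_CAP = 800   # keep only the most recent 800 tweet IDs per user

def push_to_timeline(follower_id: int, tweet_id: int) -> None:
    """Prepend a new tweet ID to a follower's home timeline and trim it."""
    key = f"timeline:{follower_id}"          # assumed key scheme
    pipe = r.pipeline()
    pipe.lpush(key, tweet_id)                # newest IDs at the head
    pipe.ltrim(key, 0, TIMELINE_CAP - 1)     # enforce the 800-entry cap
    pipe.execute()

def read_timeline(user_id: int, count: int = 50) -> list[int]:
    """Return the newest `count` tweet IDs from the cached timeline."""
    return [int(x) for x in r.lrange(f"timeline:{user_id}", 0, count - 1)]
```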

4. Media Storage (Blobstore / S3)

  • Images and Videos are NOT stored in Cassandra.
  • They are stored in Blobstore (Twitter’s internal S3).
  • Immutability: Media files are immutable. If a user edits a photo, a new ID is generated. This allows aggressive CDN caching.
  • Tardis: A dedicated cold storage system for older media.
  • CDN: All media is served via CDN to reduce latency.

6. High-Level Design

System architecture handling both Timeline Delivery and Search Indexing.

Diagram: System Architecture — Twitter Timeline & Search (Fanout Service | Manhattan DB | Earlybird Search). It traces three paths — Write Tweet, Read Timeline, and Search Query — across three layers:

  • Client: User → Load Balancer.
  • Services: Tweet Service (accepts writes, validates & saves), Fanout Service (fanout-on-write to home timelines), Timeline Service (reads the home feed), Blender (search aggregator, scatter-gather).
  • Storage & Search: Manhattan (Cassandra) as the permanent store, Timeline Cache (Redis cluster holding lists of post IDs), Earlybird (Lucene shards holding the inverted index, fed via push indexing).

7. Deep Dive: Snowflake ID Generator

Twitter created Snowflake to generate unique IDs at scale without coordination. We cannot use auto-incrementing MySQL IDs in a distributed system (Single Point of Failure).

The 64-bit Anatomy

  • Sign Bit (1 bit): Always 0 (positive numbers).
  • Timestamp (41 bits): Milliseconds since custom epoch (Twitter’s epoch is late 2010).
    • 2^41 ms ≈ 69 years.
  • Data Center ID (5 bits): Allows 32 data centers.
  • Worker ID (5 bits): Allows 32 workers per DC. (Total 1024 nodes).
    • Coordination: How do we assign Worker IDs? We use Zookeeper. When a Tweet Service node starts, it registers with Zookeeper to get a unique ephemeral ID (0-31). If it crashes, the ID is released.
  • Sequence (12 bits): For conflict resolution within the same millisecond.
    • 212 = 4096 IDs per millisecond per node.
Bit layout: 0 (sign) | Timestamp (41) | Machine (10) | Sequence (12)

Benefit: IDs are k-ordered (roughly sorted by time). We can sort by ID to sort by time!
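
A minimal, single-threaded sketch of the bit packing described above. The epoch constant is Twitter's published Snowflake epoch; a production generator would add locking and clock-skew handling.

```python
import time

CUSTOM_EPOCH_MS = 1288834974657   # Twitter's Snowflake epoch (Nov 2010)

class Snowflake:
    def __init__(self, datacenter_id: int, worker_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self) -> int:
        now_ms = int(time.time() * 1000)
        if now_ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit sequence
            if self.sequence == 0:                        # 4096 IDs exhausted this ms
                while now_ms <= self.last_ms:             # spin until the next millisecond
                    now_ms = int(time.time() * 1000)
        else:
            self.sequence = 0
        self.last_ms = now_ms
        timestamp = now_ms - CUSTOM_EPOCH_MS
        # 41 bits timestamp | 5 bits DC | 5 bits worker | 12 bits sequence
        return (timestamp << 22) | (self.datacenter_id << 17) | (self.worker_id << 12) | self.sequence
```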


8. Component Design: The Hybrid Approach

Twitter famously moved from a pure “Pull” model to a “Push” model, and finally to a Hybrid Model.

Why Hybrid?

  • Push (Fanout-on-Write) is great for 99% of users. When I tweet to my 100 followers, doing 100 Redis writes is trivial.
  • Pull is better for celebrities. If Elon Musk tweets, doing 100 Million Redis writes would clog the system and delay delivery.

The Algorithm (sketched in code after this list)

  1. Tweet Arrives: System checks the author’s follower count.
  2. Case A: Normal User:
    • Fetch all followers.
    • Push Tweet ID to all their Redis timelines.
  3. Case B: Celebrity (e.g., >100k followers):
    • Save Tweet to DB.
    • Do NOT push to followers.
  4. Timeline Retrieval (Reader Side):
    • Fetch the user’s timeline from Redis (Normal Tweets).
    • Fetch the list of “Celebrities I Follow”.
    • Query the Tweet DB for recent tweets from those celebrities.
    • Merge the results in memory.
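
A self-contained sketch of this hybrid write/read logic. The 100k threshold comes from the design above; the in-memory dicts stand in for Manhattan/Cassandra, FlockDB, and Redis purely for illustration.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 100_000

# In-memory stand-ins for the real stores (illustration only).
tweets_by_author = defaultdict(list)   # author_id -> [tweet_id], oldest first
followers_of = defaultdict(set)        # author_id -> {follower_id}
timelines = defaultdict(list)          # user_id -> [tweet_id], newest first

def is_celebrity(author_id: int) -> bool:
    return len(followers_of[author_id]) >= CELEBRITY_THRESHOLD

def on_new_tweet(author_id: int, tweet_id: int) -> None:
    """Write path: persist, then fan out only for non-celebrities."""
    tweets_by_author[author_id].append(tweet_id)              # always persist
    if not is_celebrity(author_id):
        for follower_id in followers_of[author_id]:           # push to each follower's cache
            timelines[follower_id].insert(0, tweet_id)
            del timelines[follower_id][800:]                  # keep the 800-entry cap

def get_home_timeline(user_id: int, following: set[int], count: int = 50) -> list[int]:
    """Read path: merge the pre-computed cache with live celebrity tweets."""
    merged = list(timelines[user_id])                         # normal tweets from cache
    for author_id in following:
        if is_celebrity(author_id):
            merged.extend(tweets_by_author[author_id][-20:])  # pull celebrity tweets from storage
    # Snowflake IDs are k-ordered, so sorting by ID approximates sorting by time.
    return sorted(set(merged), reverse=True)[:count]
```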

Thundering Herd Problem

If we used the Push model for celebrities, we would encounter the Thundering Herd problem. Imagine 100 million write operations hitting the Redis cluster simultaneously. This could saturate network bandwidth and lock up the cache nodes. By using the Hybrid approach, we shift this load to the Read path, where we can use caching and replicas more effectively.


9. Search (Earlybird)

Twitter’s search system is a modified version of Lucene, called Earlybird (as shown in the System Architecture diagram).

  • Inverted Index: Maps tokens to Tweet IDs. ("java" -> [101, 105, 109]).
  • Segments: Data is written to immutable “Segment” files.
    • Writes: New tweets go into a small in-memory buffer, then flushed to a small segment on disk.
    • Compaction: Background threads merge small segments into larger ones (Log-Structured Merge approach) to optimize read speed.
  • Partitioning Strategy:
    • Time-Sliced: Instead of sharding by Tweet ID, indices are split by time (e.g., “Last 4 Hours”, “Yesterday”, “Last Week”).
    • Why?: Users mostly search for recent events. We can query the “Recent” shard first and ignore the “Old” shards unless necessary.
  • Blender: A scatter-gather service (visualized in diagram). When you search “Super Bowl”:
    1. Blender queries all Earlybird shards in parallel.
    2. Aggregates results.
    3. Ranks them (Relevance + Time).
Trending Topics

  • Stream Processing: Hashtag counts are computed with Apache Storm or Flink.
  • Windowing: Count hashtag occurrences in a Sliding Window (e.g., the last 15 mins), as sketched below.
  • Ranking: Sort by velocity (acceleration of mentions), not just total volume.
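
A toy, single-process sketch of this windowed counting. In production this runs on Storm/Flink across partitions; the 15-minute window and the "recent half minus older half" velocity formula are illustrative assumptions.

```python
import time
from collections import deque, Counter

WINDOW_SECONDS = 15 * 60
events = deque()   # (timestamp, hashtag), oldest first

def record(hashtag: str, ts: float | None = None) -> None:
    events.append((ts if ts is not None else time.time(), hashtag))

def trending(now: float | None = None, top_n: int = 10) -> list[tuple[str, int]]:
    """Rank hashtags by velocity: count in the recent half-window minus the older half."""
    now = now if now is not None else time.time()
    while events and events[0][0] < now - WINDOW_SECONDS:   # evict expired events
        events.popleft()
    recent, older = Counter(), Counter()
    for ts, tag in events:
        (recent if ts >= now - WINDOW_SECONDS / 2 else older)[tag] += 1
    velocity = {tag: recent[tag] - older.get(tag, 0) for tag in recent}
    return sorted(velocity.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```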

10. Data Partitioning

Sharding Timelines (Redis)

  • Sharded by the User ID of the timeline owner.
  • A naive scheme is hash(user_id) % number_of_redis_nodes, but adding or removing a node would remap almost every key.
  • Instead, we use Consistent Hashing so that node additions/removals only remap a small fraction of users (see the sketch below, and Consistent Hashing).
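
A minimal consistent-hash ring for picking the Redis node that owns a user's timeline. The virtual-node count and the use of MD5 are arbitrary choices for this sketch.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):                      # virtual nodes smooth the distribution
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, user_id: int) -> str:
        """Walk clockwise on the ring to the first node at or after the key's hash."""
        idx = bisect.bisect(self._keys, self._hash(str(user_id))) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["redis-1", "redis-2", "redis-3"])
print(ring.node_for(42))   # e.g. "redis-2"; only ~1/N of keys move when a node is added
```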

11. Interactive Decision Visualizer

Hybrid Delivery Simulator

This simulation shows how Twitter routes tweets differently based on the author’s influence.

  • Normal User: Tweets take the “Fast Path” (Push to Cache).
  • Celebrity: Tweets take the “Storage Path” (Pull from DB).

Diagram: Tweet Routing Logic — the User's tweet goes to the Redis Cache (Timeline) on the fast path, or only to Cassandra (Storage) on the storage path.

12. Requirements Traceability

| Requirement | Design Decision | Justification |
| --- | --- | --- |
| Instant Delivery | Push Model | For 99% of users, reading from a pre-computed Redis list is fast. |
| Scalability | Hybrid Model | Pull model for celebrities prevents "Thundering Herd" on cache. |
| Searchability | Earlybird | Dedicated search index (Lucene) updated in near real-time. |
| Availability | Manhattan/Cassandra | Replicated, eventually consistent DB ensures no downtime. |
| Sorting | Snowflake ID | K-ordered IDs allow sorting by time without secondary indexes. |

13. Observability (RED Method)

  • Rate: QPS for Write Tweet vs Read Timeline.
  • Errors: Failed Tweets (DLQ depth).
  • Duration: P99 Latency for Home Timeline render.

Key Metrics:

  • delivery_latency_ms: Time from “Tweet Sent” to “Available in Follower’s Redis”.
  • search_index_lag: Time for a tweet to appear in search results.
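
A sketch of how these metrics could be emitted with the Prometheus Python client. Metric names follow the list above but are adapted to Prometheus conventions; the buckets and port are assumptions.

```python
from prometheus_client import Counter, Histogram, start_http_server

TIMELINE_LATENCY = Histogram(
    "home_timeline_duration_seconds", "Home timeline render latency (Duration)",
    buckets=(0.05, 0.1, 0.2, 0.5, 1.0),
)
DELIVERY_LATENCY = Histogram(
    "delivery_latency_seconds", "Tweet sent -> available in follower's Redis",
)
FAILED_TWEETS = Counter("failed_tweets_total", "Tweets that landed in the DLQ (Errors)")

start_http_server(9100)   # expose /metrics for scraping

# Example usage inside the read path:
with TIMELINE_LATENCY.time():   # records the duration of the block
    pass                        # ... render home timeline ...
```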

14. Deployment Strategy

  • Dark Launching: New features (e.g., Edit Tweet) are deployed but hidden behind feature flags.
  • Decoupled Release: Client apps (iOS/Android) are updated separately from Backend services. We must support backward compatibility for APIs.
  • Schema Evolution: Using Protobuf/Thrift allows adding fields without breaking old clients.

15. Interview Gauntlet

Q1: Why not use SQL for Tweets?

  • Answer: A single SQL primary struggles with the sheer write volume (20k/sec) and the schema flexibility we need. Cassandra/Manhattan offers near-linear horizontal scaling for write-heavy workloads.

Q2: How do we handle “Retweets”?

  • Answer: A Retweet is a pointer. In the Redis list, we store { type: 'RT', original_id: 101, retweeter_id: 202 }. The frontend renders it differently.

Q3: How do we detect Trending Topics?

  • Answer: Using a Stream Processor (Flink/Storm). We use a “Sliding Window” (e.g., last 15 mins) and count hashtag frequencies. We rank by acceleration (velocity), not just total volume, to capture breaking news.

16. Whiteboard Summary

Write Path

  • Fanout: Push for small users.
  • Hybrid: Pull for stars (>100k).
  • Storage: Manhattan (Cassandra).

Read Path

  • Cache: Redis Cluster (Home).
  • Merge: Cache + DB (Celeb).
  • Search: Earlybird (Lucene).