Design Twitter Timeline
1. What is Twitter?
Twitter is a micro-blogging platform where users post short messages (Tweets). The defining characteristic of Twitter is its real-time nature and its asymmetric follow graph (User A can follow User B without User B following back).
Core Problem
How do you deliver a Tweet from a user to their 50 million followers in under 5 seconds?
[!TIP] Twitter is the classic example of a “Read-Heavy” system where the challenge isn’t storing the data, but distributing it efficiently to millions of timelines simultaneously.
Try it yourself
Go to Twitter (X). Post “Hello System Design”. Refresh your friend’s feed. It appears instantly. Now imagine Elon Musk posts. It appears on 100 million screens instantly. How?
2. Requirements & Goals
Functional Requirements
- Tweet: User can post a tweet (Text + Media).
- Timeline: User can view their Home Timeline (tweets from people they follow).
- Search: Users can search for tweets by keywords.
- Trends: Users can see trending topics.
Non-Functional Requirements
- Read Heavy: The read-to-write ratio is extreme (e.g., 1000:1).
- Low Latency: Timelines must load instantly (< 200ms).
- Eventual Consistency: It’s okay if a follower sees a tweet 2 seconds later.
- Availability: The system must be always available.
3. Capacity Estimation
- DAU: 300 Million.
- Tweets: 500 Million per day.
- Timeline Visits: 30 Billion per day.
- QPS:
- Writes: ~6k QPS (avg), 20k QPS (peak).
- Reads: ~350k QPS.
The massive difference between Reads and Writes dictates that we must optimize for Read.
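To make the estimates concrete, here is a minimal back-of-the-envelope calculation using the traffic figures assumed above (the 3.5x peak factor is an illustrative assumption):

```python
# Back-of-the-envelope QPS calculation for the figures assumed above.
SECONDS_PER_DAY = 86_400

tweets_per_day = 500_000_000               # writes
timeline_visits_per_day = 30_000_000_000   # reads

write_qps_avg = tweets_per_day / SECONDS_PER_DAY           # ~5.8k
write_qps_peak = write_qps_avg * 3.5                        # assumed ~3.5x burst factor -> ~20k
read_qps_avg = timeline_visits_per_day / SECONDS_PER_DAY    # ~347k

print(f"Write QPS (avg):  {write_qps_avg:,.0f}")
print(f"Write QPS (peak): {write_qps_peak:,.0f}")
print(f"Read QPS (avg):   {read_qps_avg:,.0f}")
print(f"Request ratio:    {read_qps_avg / write_qps_avg:,.0f}:1")
```

Note that the request-level ratio comes out around 60:1; each timeline visit then reads dozens of individual tweets, which is how the tweet-level read-to-write ratio reaches the order of 1000:1 quoted earlier.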
4. System APIs
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/tweets` | Create a tweet. Body: `{ text: "Hello", media_ids: [] }` |
| POST | `/v1/friendships/create` | Follow a user. |
| GET | `/v1/statuses/home_timeline` | Get the logged-in user's timeline. |
| GET | `/v1/search` | Search tweets. Params: `q`, `count`. |
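As a quick illustration, a client could call these endpoints as shown below. The host, auth token, and response field names are placeholders for this sketch, not the real Twitter API contract.

```python
import requests  # third-party HTTP client (pip install requests)

BASE_URL = "https://api.example.com"           # placeholder host
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth

# Post a tweet
resp = requests.post(
    f"{BASE_URL}/v1/tweets",
    json={"text": "Hello System Design", "media_ids": []},
    headers=HEADERS,
    timeout=5,
)
tweet_id = resp.json()["tweet_id"]  # assumed response field

# Fetch the home timeline
timeline = requests.get(
    f"{BASE_URL}/v1/statuses/home_timeline",
    params={"count": 20},
    headers=HEADERS,
    timeout=5,
).json()
```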
5. Database Design
We need a graph database for relationships and a scalable store for Tweets.
1. Tweet Store (Cassandra / Manhattan)
- Twitter built their own DB called Manhattan (labeled in diagram), but we can use Cassandra (Wide Column Store).
- Key: `tweet_id` (Snowflake).
- Columns: `user_id`, `content`, `timestamp`, `media_urls`.
- Why?: High write throughput and linear scalability.
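A minimal sketch of what this table could look like in Cassandra, using the Python driver. The keyspace name, contact point, and sample values are illustrative assumptions, not Twitter's actual schema.

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])        # assumed local node for the sketch
session = cluster.connect("twitter")    # assumed keyspace

# Wide-column tweet store keyed by the Snowflake-generated tweet_id.
session.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        tweet_id   bigint PRIMARY KEY,
        user_id    bigint,
        content    text,
        created_at timestamp,
        media_urls list<text>
    )
""")

session.execute(
    "INSERT INTO tweets (tweet_id, user_id, content, created_at, media_urls) "
    "VALUES (%s, %s, %s, toTimestamp(now()), %s)",
    (123456789012345678, 42, "Hello System Design", []),
)
```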
2. User Graph (FlockDB / MySQL)
- Stores `follower_id -> followee_id` edges.
- Needs to answer: "Who follows User A?" (for Fanout) and "Who does User A follow?" (for checking relationships).
- FlockDB: A graph DB built by Twitter to handle adjacency lists efficiently.
3. Timeline Cache (Redis)
- Stores the list of Tweet IDs for each user’s Home Timeline.
- Key:
user_id. - Value:
List<TweetID>(e.g.,[105, 102, 99...]). - Size Limit: We only keep the last 800 IDs.
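A minimal sketch of the per-timeline Redis operations using redis-py. The key naming and 800-entry cap follow the description above; the connection details are assumptions.

```python
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)  # assumed local node

TIMELINE_CAP = 800  # keep only the newest 800 tweet IDs per timeline

def push_to_timeline(follower_id: int, tweet_id: int) -> None:
    """Prepend a new tweet ID to a follower's home timeline and trim it."""
    key = f"timeline:{follower_id}"
    pipe = r.pipeline()
    pipe.lpush(key, tweet_id)              # newest first
    pipe.ltrim(key, 0, TIMELINE_CAP - 1)   # drop anything past the cap
    pipe.execute()

def read_timeline(user_id: int, count: int = 20) -> list[int]:
    """Return the newest `count` tweet IDs from the cached timeline."""
    return [int(x) for x in r.lrange(f"timeline:{user_id}", 0, count - 1)]
```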
4. Media Storage (Blobstore / S3)
- Images and Videos are NOT stored in Cassandra.
- They are stored in Blobstore (Twitter’s internal S3).
- Immutability: Media files are immutable. If a user edits a photo, a new ID is generated. This allows aggressive CDN caching.
- Tardis: A dedicated cold storage system for older media.
- CDN: All media is served via CDN to reduce latency.
6. High-Level Design
System architecture handling both Timeline Delivery and Search Indexing.

*(Architecture diagram: the write path validates & saves tweets to the Permanent Store and fans Post IDs out into per-user Home Timeline lists; the read path scatter-gathers those lists of Post IDs; the search path maintains an Inverted Index.)*
7. Deep Dive: Snowflake ID Generator
Twitter created Snowflake to generate unique IDs at scale without coordination. We cannot use auto-incrementing MySQL IDs in a distributed system (Single Point of Failure).
The 64-bit Anatomy
- Sign Bit (1 bit): Always 0 (positive numbers).
- Timestamp (41 bits): Milliseconds since custom epoch (Twitter’s epoch is late 2010).
- 2^41 milliseconds ≈ 69 years.
- Data Center ID (5 bits): Allows 32 data centers.
- Worker ID (5 bits): Allows 32 workers per DC. (Total 1024 nodes).
- Coordination: How do we assign Worker IDs? We use Zookeeper. When a Tweet Service node starts, it registers with Zookeeper to get a unique ephemeral ID (0-31). If it crashes, the ID is released.
- Sequence (12 bits): For conflict resolution within the same millisecond.
- 2^12 = 4096 IDs per millisecond per node.
Benefit: IDs are k-ordered (roughly sorted by time). We can sort by ID to sort by time!
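A minimal sketch of the 64-bit layout in Python. The lock-based sequence handling is a simplification for illustration; the epoch constant is Twitter's published Snowflake epoch.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch (Nov 2010)

class Snowflake:
    """41-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit sequence."""

    def __init__(self, datacenter_id: int, worker_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:                       # exhausted this millisecond
                    while now_ms <= self.last_ms:            # spin until the clock moves
                        now_ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now_ms
            return (
                ((now_ms - TWITTER_EPOCH_MS) << 22)
                | (self.datacenter_id << 17)
                | (self.worker_id << 12)
                | self.sequence
            )

# IDs from one node are strictly increasing, so sorting by ID roughly sorts by time.
gen = Snowflake(datacenter_id=1, worker_id=7)
print(gen.next_id())
```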
8. Component Design: The Hybrid Approach
Twitter famously moved from a pure “Pull” model to a “Push” model, and finally to a Hybrid Model.
Why Hybrid?
- Push (Fanout-on-Write) is great for 99% of users. When I tweet to my 100 followers, doing 100 Redis writes is trivial.
- Pull is better for celebrities. If Elon Musk tweets, doing 100 Million Redis writes would clog the system and delay delivery.
The Algorithm
- Tweet Arrives: System checks the author’s follower count.
- Case A: Normal User:
- Fetch all followers.
- Push Tweet ID to all their Redis timelines.
- Case B: Celebrity (e.g., >100k followers):
- Save Tweet to DB.
- Do NOT push to followers.
- Timeline Retrieval (Reader Side):
- Fetch the user’s timeline from Redis (Normal Tweets).
- Fetch the list of “Celebrities I Follow”.
- Query the Tweet DB for recent tweets from those celebrities.
- Merge the results in memory (see the sketch below).
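A minimal sketch of both paths, with Redis, the tweet store, and the social graph stubbed as in-memory structures. The 100k threshold mirrors the text; helper names and data shapes are hypothetical.

```python
from collections import defaultdict

CELEB_THRESHOLD = 100_000  # follower count above which we skip fanout

# In-memory stand-ins for Redis, the tweet store, and the social graph.
timelines: dict[int, list[int]] = defaultdict(list)          # user_id -> [tweet_id]
tweets_by_author: dict[int, list[int]] = defaultdict(list)   # author_id -> [tweet_id]
followers: dict[int, list[int]] = defaultdict(list)          # author_id -> [follower_id]
follower_count: dict[int, int] = defaultdict(int)
celebs_followed: dict[int, list[int]] = defaultdict(list)    # user_id -> [celeb_id]

def post_tweet(author_id: int, tweet_id: int) -> None:
    tweets_by_author[author_id].append(tweet_id)         # always persist
    if follower_count[author_id] <= CELEB_THRESHOLD:
        for f in followers[author_id]:                   # fanout-on-write (push)
            timelines[f].insert(0, tweet_id)
    # Celebrities: no fanout; readers pull their tweets at read time.

def home_timeline(user_id: int, count: int = 20) -> list[int]:
    ids = list(timelines[user_id])                       # pre-computed (pushed) part
    for celeb in celebs_followed[user_id]:               # pull celebrity tweets
        ids.extend(tweets_by_author[celeb][-count:])
    return sorted(ids, reverse=True)[:count]             # Snowflake IDs sort by time
```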
Thundering Herd Problem
If we used the Push model for celebrities, we would encounter the Thundering Herd problem. Imagine 100 million write operations hitting the Redis cluster simultaneously. This could saturate network bandwidth and lock up the cache nodes. By using the Hybrid approach, we shift this load to the Read path, where we can use caching and replicas more effectively.
9. Search & Trends Infrastructure
Search (Earlybird)
Twitter’s search system is a modified version of Lucene, called Earlybird (as shown in the System Architecture diagram).
- Inverted Index: Maps tokens to Tweet IDs (e.g., `"java" -> [101, 105, 109]`).
- Segments: Data is written to immutable "Segment" files.
- Writes: New tweets go into a small in-memory buffer, which is then flushed to a small segment on disk.
- Compaction: Background threads merge small segments into larger ones (Log-Structured Merge approach) to optimize read speed.
- Partitioning Strategy:
- Time-Sliced: Instead of sharding by Tweet ID, indices are split by time (e.g., “Last 4 Hours”, “Yesterday”, “Last Week”).
- Why?: Users mostly search for recent events. We can query the “Recent” shard first and ignore the “Old” shards unless necessary.
- Blender: A scatter-gather service (visualized in diagram). When you search “Super Bowl”:
- Blender queries all Earlybird shards in parallel.
- Aggregates results.
- Ranks them (Relevance + Time).
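A minimal sketch of Blender's scatter-gather step, with each time-sliced Earlybird shard stubbed as a local function and queried in parallel. The shard data and the ranking rule are toy placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for time-sliced Earlybird shards: each returns (tweet_id, score) hits.
def shard_recent(query):    return [(109, 0.9), (105, 0.7)]
def shard_yesterday(query): return [(72, 0.6)]
def shard_last_week(query): return [(15, 0.4)]

SHARDS = [shard_recent, shard_yesterday, shard_last_week]

def blender_search(query: str, limit: int = 10) -> list[int]:
    """Scatter the query to every shard in parallel, then gather and rank."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        results = pool.map(lambda shard: shard(query), SHARDS)
    hits = [hit for shard_hits in results for hit in shard_hits]
    # Rank by relevance score, breaking ties by recency (higher ID = newer).
    hits.sort(key=lambda h: (h[1], h[0]), reverse=True)
    return [tweet_id for tweet_id, _ in hits[:limit]]

print(blender_search("Super Bowl"))
```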
Trends (Streaming Aggregation)
- Stream Processing: We use Apache Storm or Flink.
- Windowing: Count occurrences of hashtags in a Sliding Window (e.g., last 15 mins).
- Ranking: Sort by velocity (acceleration of mentions), not just total volume.
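A minimal sketch of the windowed counting and velocity ranking in plain Python rather than Storm/Flink. The window size mirrors the text; the velocity formula (recent half of the window vs. older half) is an illustrative assumption.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60  # sliding window over the last 15 minutes

# hashtag -> deque of event timestamps inside the window
events: dict[str, deque] = defaultdict(deque)

def record_hashtag(tag: str, now: float | None = None) -> None:
    now = now or time.time()
    q = events[tag]
    q.append(now)
    while q and q[0] < now - WINDOW_SECONDS:  # evict expired events
        q.popleft()

def trending(top_n: int = 10, now: float | None = None) -> list[str]:
    """Rank by velocity: count in the recent half of the window vs. the older half."""
    now = now or time.time()
    scores = {}
    for tag, q in events.items():
        recent = sum(1 for t in q if t >= now - WINDOW_SECONDS / 2)
        older = len(q) - recent
        scores[tag] = (recent + 1) / (older + 1)  # acceleration, not raw volume
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```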
10. Data Partitioning
Sharding Timelines (Redis)
- Sharded by the User ID of the timeline owner; the naive scheme is `hash(user_id) % number_of_redis_nodes`.
- In practice we use Consistent Hashing so that node additions/removals don't force rehashing everything (a sketch follows below). See Consistent Hashing.
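A minimal sketch of a consistent-hash ring for the Redis shards. The virtual-node count and MD5-based hash are implementation choices for the sketch, not Twitter's actual scheme.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):                      # virtual nodes smooth the load
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, user_id: int) -> str:
        """Walk clockwise to the first virtual node at or after the key's hash."""
        idx = bisect.bisect(self.keys, self._hash(str(user_id))) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["redis-1", "redis-2", "redis-3"])
print(ring.node_for(42))  # adding "redis-4" later only remaps roughly 1/4 of the keys
```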
11. Interactive Decision Visualizer
Hybrid Delivery Simulator
This simulation shows how Twitter routes tweets differently based on the author’s influence.
- Normal User: Tweets take the “Fast Path” (Push to Cache).
- Celebrity: Tweets take the “Storage Path” (Pull from DB).
12. Requirements Traceability
| Requirement | Design Decision | Justification |
|---|---|---|
| Instant Delivery | Push Model | For 99% of users, reading from a pre-computed Redis list is fast. |
| Scalability | Hybrid Model | Pull model for celebrities prevents “Thundering Herd” on cache. |
| Searchability | Earlybird | Dedicated Search Index (Lucene) updated in near real-time. |
| Availability | Manhattan/Cassandra | Replicated, eventually consistent DB ensures no downtime. |
| Sorting | Snowflake ID | K-ordered IDs allow sorting by Time without secondary indexes. |
13. Observability (RED Method)
- Rate: QPS for Write Tweet vs Read Timeline.
- Errors: Failed Tweets (DLQ depth).
- Duration: P99 Latency for Home Timeline render.
Key Metrics:
- `delivery_latency_ms`: Time from "Tweet Sent" to "Available in Follower's Redis".
- `search_index_lag`: Time for a tweet to appear in search results.
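A minimal sketch of how these RED metrics could be exposed with the Prometheus Python client. Metric and label names are assumptions (and durations are in seconds, per Prometheus convention, rather than milliseconds).

```python
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("requests_total", "Requests by endpoint", ["endpoint"])
ERRORS = Counter("errors_total", "Failed requests by endpoint", ["endpoint"])
TIMELINE_LATENCY = Histogram("home_timeline_duration_seconds",
                             "Home timeline render time")
DELIVERY_LATENCY = Histogram("delivery_latency_seconds",
                             "Tweet sent -> available in follower's Redis")

def render_home_timeline(user_id: int) -> list[int]:
    REQUESTS.labels(endpoint="home_timeline").inc()
    with TIMELINE_LATENCY.time():          # observes duration; P99 is derived from buckets
        try:
            return []                      # placeholder for the real read path
        except Exception:
            ERRORS.labels(endpoint="home_timeline").inc()
            raise

start_http_server(9100)  # expose /metrics for scraping
```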
14. Deployment Strategy
- Dark Launching: New features (e.g., Edit Tweet) are deployed but hidden behind feature flags.
- Decoupled Release: Client apps (iOS/Android) are updated separately from Backend services. We must support backward compatibility for APIs.
- Schema Evolution: Using Protobuf/Thrift allows adding fields without breaking old clients.
15. Interview Gauntlet
Q1: Why not use SQL for Tweets?
- Answer: A single relational database struggles with the sheer write volume (20k/sec), and its rigid schema makes evolution harder. Cassandra/Manhattan offers near-linear horizontal scaling for write-heavy workloads.
Q2: How do we handle “Retweets”?
- Answer: A Retweet is a pointer. In the Redis list, we store `{ type: 'RT', original_id: 101, retweeter_id: 202 }`. The frontend renders it differently.
Q3: How do we detect Trending Topics?
- Answer: Using a Stream Processor (Flink/Storm). We use a “Sliding Window” (e.g., last 15 mins) and count hashtag frequencies. We rank by acceleration (velocity), not just total volume, to capture breaking news.
16. Whiteboard Summary
Write Path
- Fanout: Push for small users.
- Hybrid: Pull for stars (>100k).
- Storage: Manhattan (Cassandra).
Read Path
- Cache: Redis Cluster (Home).
- Merge: Cache + DB (Celeb).
- Search: Earlybird (Lucene).