Module 11 Review & Cheat Sheet
1. Key Concepts Flashcards
Test your knowledge of Social Media System Design. For detailed definitions, check the System Design Glossary.
What is Fanout-on-Write?
Also called "Push Model". When a user posts, the system pushes the post ID to all followers' home timelines (Redis) immediately. Fast reads, slow writes.
What is the "Celebrity Problem"?
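When a user with millions of followers posts, a "Push Model" system must perform millions of timeline writes for that single post (write amplification). This creates a fanout backlog that delays delivery. When those millions of followers then read the same hot post at once, the result is a related read-side "Thundering Herd".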
Why use WebSockets for Chat?
WebSockets provide a persistent, full-duplex channel over a single TCP connection. This reduces latency and overhead compared to HTTP Polling, which sends a new request, with full HTTP headers, on every poll.
What is EdgeRank?
Facebook's early feed-ranking formula, summed over a post's interactions (edges): Rank = Σ Affinity × Weight × Time Decay. It determines the order of posts in a non-chronological feed.
What is a Bloom Filter?
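A probabilistic data structure used to test set membership. It can return false positives but never false negatives. We use it to cheaply filter out posts a user has already seen (Deduplication). The cost of a false positive is occasionally skipping an unseen post.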
What is Snowflake ID?
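A 64-bit unique ID generator used by Twitter. Each ID combines a millisecond timestamp, a machine ID, and a sequence number. This makes IDs time-sortable (k-ordered) and lets them be generated in a distributed way, avoiding the bottleneck of a single DB auto-increment.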
What is Sequence ID?
A strictly increasing integer, typically assigned by the server per conversation. Chat systems use it to guarantee message order and sync state across devices, avoiding the clock-skew issues of ordering by client timestamps.
What is Vector Search (ANN)?
Approximate Nearest Neighbor (ANN) search finds semantically similar items (e.g., posts) by comparing their embedding vectors. It trades a little accuracy for speed at scale. Used for content discovery and recommendations.
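A minimal sketch of EdgeRank-style scoring. It assumes the score sums Affinity × Weight × Time Decay over each interaction (edge) on a post, as in Facebook's original description. The `EDGE_WEIGHTS` table and the exponential decay with a hypothetical `HALF_LIFE_HOURS` are illustrative choices, not Facebook's actual parameters.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical edge-type weights; a real system would tune these.
EDGE_WEIGHTS = {"comment": 4.0, "share": 3.0, "like": 1.0}
HALF_LIFE_HOURS = 24.0  # hypothetical: an edge's value halves every 24 hours


@dataclass
class Edge:
    kind: str         # interaction type, e.g. "like"
    affinity: float   # viewer's closeness to the person who interacted (0..1)
    age_hours: float  # time since the interaction happened


def time_decay(age_hours: float) -> float:
    """Exponential decay: 1.0 for a brand-new edge, 0.5 after one half-life."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def edgerank(edges: list[Edge]) -> float:
    """Sum Affinity x Weight x Time Decay over all edges attached to a post."""
    return sum(e.affinity * EDGE_WEIGHTS[e.kind] * time_decay(e.age_hours) for e in edges)


def rank_feed(posts: dict[str, list[Edge]]) -> list[str]:
    """Return post IDs ordered from highest to lowest score."""
    return sorted(posts, key=lambda pid: edgerank(posts[pid]), reverse=True)
```

For example, a fresh comment from a close friend outranks an older like from a distant acquaintance. The weight captures the comment, the affinity captures the friendship, and the decay captures the age.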
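The Sequence ID card can be sketched as a per-conversation counter on the server plus a "last seen" cursor on each device. The `ChatServer` class and its methods are hypothetical. This in-memory, single-threaded version only illustrates the idea; a real system would persist the counter and the log.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import count


class ChatServer:
    """In-memory sketch: one strictly increasing sequence per conversation."""

    def __init__(self) -> None:
        self._counters = defaultdict(lambda: count(1))  # conversation -> 1, 2, 3, ...
        self._log = defaultdict(list)                   # conversation -> [(seq, text)]

    def send(self, conversation_id: str, text: str) -> int:
        # The server assigns the order, so client clock skew cannot reorder messages.
        seq = next(self._counters[conversation_id])
        self._log[conversation_id].append((seq, text))
        return seq

    def sync(self, conversation_id: str, last_seen_seq: int) -> list[tuple[int, str]]:
        # A device reports the highest seq it holds and receives everything newer.
        return [m for m in self._log[conversation_id] if m[0] > last_seen_seq]
```

A phone that went offline after message 1 calls `sync("c1", 1)` and receives only the newer messages, in order. In a store like HBase, this could map to a range scan keyed on (conversation ID, seq).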
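To make the vector-search card concrete, here is exact (brute-force) cosine-similarity search. This is the baseline that ANN indexes such as HNSW or IVF approximate so they stay fast at millions of vectors. The 3-dimensional vectors below are made up for illustration; real embeddings come from a model and typically have hundreds of dimensions.

```python
from __future__ import annotations

import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query: list[float], items: dict[str, list[float]], k: int) -> list[str]:
    """Exact nearest-neighbor search: score every item, keep the k most similar."""
    return heapq.nlargest(k, items, key=lambda item_id: cosine(query, items[item_id]))


items = {
    "cat_post": [0.9, 0.1, 0.0],
    "dog_post": [0.8, 0.3, 0.1],
    "tax_post": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # embedding of something pet-related (toy values)
```

Scoring every item is O(N) per query. That cost is exactly why production systems switch to an approximate index once N grows large.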
2. Cheat Sheet: Fanout Strategies
| Feature | Push Model (Fanout-on-Write) | Pull Model (Fanout-on-Read) | Hybrid Model |
|---|---|---|---|
| Action | Write the post ID to every follower’s cached timeline on post creation. | Query the DB for all followees’ recent posts on feed load. | Push for normal users; pull celebrities’ posts at read time. |
| Write Load | Very High (one write per follower, N) | Low (1 DB write) | Moderate (no fanout for celebrities) |
| Read Load | Low (single Redis fetch) | High (query and merge many followees’ posts) | Low (cached timeline plus a few celebrity feeds) |
| Read Latency | Very low | Variable/Slow | Low |
| Best For | Twitter (Normal Users) | Facebook (Complex Sorting) | Instagram/Twitter |
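The hybrid column can be sketched in a few lines. On write, the post is fanned out to followers' cached timelines unless the author is a celebrity. On read, the cached timeline is merged with recent posts pulled from followed celebrities. Plain dicts stand in for Redis and the post store. `CELEBRITY_THRESHOLD` and `TIMELINE_LIMIT` are hypothetical values. Post IDs are assumed to be time-sortable (e.g., Snowflake IDs), so sorting by ID sorts by recency.

```python
from __future__ import annotations

from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; production values are far larger
TIMELINE_LIMIT = 800     # hypothetical cap per cached timeline (like a Redis LTRIM)

followers = defaultdict(set)   # author -> follower IDs
following = defaultdict(set)   # user -> followee IDs
timelines = defaultdict(list)  # user -> cached post IDs, newest first (stand-in for Redis)
posts_by = defaultdict(list)   # author -> own post IDs (stand-in for the post store)


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author: str) -> bool:
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author: str, post_id: int) -> None:
    posts_by[author].append(post_id)
    if is_celebrity(author):
        return  # skip fanout: followers pull this post at read time
    for user in followers[author]:  # push: one cache write per follower
        timelines[user].insert(0, post_id)
        del timelines[user][TIMELINE_LIMIT:]


def read_feed(user: str, limit: int = 20) -> list[int]:
    # Pull recent posts from followed celebrities, then merge with the pushed timeline.
    pulled = [pid for a in following[user] if is_celebrity(a) for pid in posts_by[a][-limit:]]
    return sorted(set(timelines[user][:limit] + pulled), reverse=True)[:limit]
```

The write cost of a celebrity post drops from N cache writes to one store write. Each reader pays a small merge over the handful of celebrities they follow.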
3. Tech Stack Summary
| Component | Technology | Why? |
|---|---|---|
| Feed Cache | Redis Cluster | In-memory speed for millions of reads/sec. Stores each timeline as a List or Sorted Set. |
| Tweet Storage | Cassandra | High write throughput, linear scalability (Wide Column). |
| User Graph | Graph DB (Neo4j) | Efficient traversal of Follows relationships (Adjacency List). |
| Chat Connection | WebSockets | Persistent connection for real-time delivery. |
| Chat History | HBase | Optimised for range scans (fetch last 50 msgs). |
| Feed Ranking | LightGBM / DNN | ML models that score the probability of engagement (e.g., a like or click). |
| Deduplication | Bloom Filter | Memory-efficient way to check “Have I seen this?”. |
| ID Generation | Snowflake | Distributed, time-sortable unique IDs. |
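A minimal single-process sketch of a Snowflake-style generator. It assumes the commonly described layout: 1 unused sign bit, a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence. `EPOCH_MS` is Twitter's custom epoch (November 2010), but any fixed past instant works. The function and class names are hypothetical.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch; any fixed past instant works
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


def now_ms() -> int:
    return time.time_ns() // 1_000_000


class Snowflake:
    """Generates IDs laid out as: timestamp | machine ID | sequence."""

    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = now_ms()
            if now < self.last_ms:
                # Issuing IDs now could repeat or reorder earlier ones.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:  # 4096 IDs used this millisecond: wait for the next
                    while now <= self.last_ms:
                        now = now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )


def timestamp_ms(snowflake_id: int) -> int:
    """Recover the Unix creation time (ms) from an ID."""
    return (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

The timestamp occupies the high bits, so sorting IDs numerically sorts them by creation time. Machines never coordinate; they only need distinct machine IDs.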
4. Protocol Battle
- HTTP Polling: Client asks “New msg?” every 1s. High overhead.
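- Long Polling: Client asks; server holds the request open until a message arrives or a timeout expires. Fewer empty responses, but every delivery still costs a new HTTP request.
- WebSockets: Full-duplex, persistent connection. Best for Chat.
- Server-Sent Events (SSE): Server pushes to client over a long-lived HTTP response. Good for Tickers/Feeds, but unidirectional (server to client only).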
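The gap between HTTP Polling and Long Polling is where the waiting happens. This asyncio sketch shows the long-polling server side: `poll()` parks the request until a message is delivered or a timeout fires. An idle client then costs one open request instead of one empty response per second. The `Mailbox` class and its 30-second default timeout are hypothetical, and the sketch assumes one polling client per mailbox.

```python
from __future__ import annotations

import asyncio


class Mailbox:
    """Per-user message queue that a long-poll handler waits on."""

    def __init__(self) -> None:
        self._messages: list[str] = []
        self._arrived = asyncio.Event()

    def deliver(self, message: str) -> None:
        self._messages.append(message)
        self._arrived.set()  # wake the request that is currently waiting, if any

    async def poll(self, timeout: float = 30.0) -> list[str]:
        """Return queued messages, waiting up to `timeout` seconds for at least one."""
        if not self._messages:
            try:
                await asyncio.wait_for(self._arrived.wait(), timeout)
            except asyncio.TimeoutError:
                return []  # empty response; the client immediately re-polls
        batch, self._messages = self._messages, []
        self._arrived.clear()
        return batch
```

A WebSocket removes even the re-poll step: the server writes each message to the already-open connection the moment it arrives.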
5. System Design Decision Matrix
| Requirement | Solution |
|---|---|
| Ultra-low Latency Feed | Push Model (Fanout-on-Write) to Redis |
| Complex Feed Sorting | Pull Model (Fanout-on-Read) from DB |
| Real-time Chat | WebSockets (Stateful Connection) |
| Massive Write Volume | Cassandra / HBase (LSM Tree) |
| Unique Sorted IDs | Snowflake ID Generator |
| Deduplication | Bloom Filter |
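A minimal Bloom filter for the "Have I seen this?" check. It derives k bit positions from one SHA-256 digest via double hashing, a common trick rather than any specific library's scheme. `might_contain` can return a false positive, which here means occasionally hiding an unseen post. It never returns a false negative, so a post that was added is never shown twice. The sizes used are illustrative; in practice the bit count and hash count are chosen from the expected number of items and the target false-positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # bit array packed into bytes

    def _positions(self, item: str) -> list[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m, using two slices of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # ensure a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: probably added.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(num_bits=8192, num_hashes=5)
for post_id in ["p1", "p2"]:
    seen.add(post_id)
unseen = [p for p in ["p1", "p2", "p3"] if not seen.might_contain(p)]
```

The filter stores only bits, never the post IDs themselves. That is why it stays small even for users who have scrolled through many thousands of posts.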
[!TIP] Final Interview Tip: Start by assuming a “Push Model” for feeds, because reads happen roughly 100x more often than writes. Then pivot to a “Hybrid” model when the interviewer asks about celebrities.