Module 11 Review & Cheat Sheet

1. Key Concepts Flashcards

Test your knowledge of Social Media System Design. For detailed definitions, check the System Design Glossary.

What is Fanout-on-Write?

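Also called the "Push Model". When a user posts, the system immediately pushes the post ID to every follower's home timeline cache (Redis). Reads are fast because the timeline is precomputed; writes are slow because their cost grows with the follower count.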
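The push flow can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the dictionaries stand in for Redis lists, and the names `follow`, `publish`, `read_timeline`, and the `TIMELINE_LIMIT` cap are assumptions made for the example.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumed cap on cached post IDs per user

followers = defaultdict(set)  # author -> set of follower IDs
# Stand-in for per-user Redis lists; the deque drops the oldest ID when full.
home_timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def follow(follower, author):
    followers[author].add(follower)


def publish(author, post_id):
    """Push the new post ID to every follower's timeline (one write each)."""
    for follower in followers[author]:
        home_timelines[follower].appendleft(post_id)
    return len(followers[author])  # number of timeline writes performed


def read_timeline(user, count=20):
    """Reading is a single lookup of precomputed IDs, newest first."""
    return list(home_timelines[user])[:count]
```

Note that `publish` returns the number of writes it performed, which makes the cost of a post with many followers easy to see.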

What is the "Celebrity Problem"?

When a user with millions of followers posts, a "Push Model" system must perform millions of timeline writes for that single post. This write amplification creates a fanout backlog and delays delivery to followers.

Why use WebSockets for Chat?

WebSockets provide a persistent, full-duplex channel over a single TCP connection. After the initial handshake, messages avoid the HTTP headers that polling resends with every request, which reduces latency and overhead.

What is EdgeRank?

A simplified ranking algorithm: Rank = Affinity × Weight × Time Decay. It determines the order of posts in a non-chronological feed.
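As a rough illustration of the formula, the sketch below scores three made-up posts. The exponential half-life form of the time decay, the `half_life_hours` parameter, and all sample values are assumptions for the example; the formula itself only says that older posts score lower.

```python
def edge_rank(affinity, weight, age_hours, half_life_hours=24.0):
    """Rank = Affinity x Weight x Time Decay (decay form is assumed)."""
    time_decay = 0.5 ** (age_hours / half_life_hours)
    return affinity * weight * time_decay


# Hypothetical candidate posts for one viewer.
posts = [
    {"id": "a", "affinity": 0.9, "weight": 1.0, "age_hours": 48},
    {"id": "b", "affinity": 0.5, "weight": 2.0, "age_hours": 0},
    {"id": "c", "affinity": 0.2, "weight": 1.0, "age_hours": 0},
]

ranked = sorted(
    posts,
    key=lambda p: edge_rank(p["affinity"], p["weight"], p["age_hours"]),
    reverse=True,
)
```

Here post `a` has the highest affinity but is two days old, so the fresher post `b` ranks above it.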

What is a Bloom Filter?

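A probabilistic data structure used to test set membership. It can return false positives but never false negatives. We use it to efficiently filter out posts a user has already seen (Deduplication), accepting that an unseen post is occasionally skipped.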
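A minimal Bloom filter can be sketched as follows. The bit-array size, the number of hashes, and the use of salted SHA-256 to derive positions are assumptions chosen for readability, not tuned values.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

In a feed, a filter per user would be checked before showing a post: `might_contain(post_id)` returning `False` guarantees the post is new to that user.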

What is a Snowflake ID?

A 64-bit unique ID produced by the Snowflake generator created at Twitter. IDs are roughly time-sortable (k-ordered) and are generated independently on each machine, avoiding the bottleneck of a single DB auto-increment.
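The sketch below shows one way to build such an ID, following the commonly described layout of a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. The class name and error handling are assumptions for illustration; a real deployment also needs a safe way to assign worker IDs.

```python
import threading
import time

EPOCH_MS = 1288834974657  # custom epoch used by Twitter's Snowflake
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts them by creation time to within a millisecond per worker.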

What is a Sequence ID?

A strictly increasing integer assigned to each chat message to guarantee message order and sync state across devices. Because ordering does not rely on device timestamps, it avoids clock skew issues.
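A small sketch of how sequence IDs support ordering and device sync. It assumes one counter per conversation held by the server; the `ChatLog` class and its methods are hypothetical names for the example.

```python
from collections import defaultdict
from itertools import count


class ChatLog:
    def __init__(self):
        # One independent counter per conversation (an assumption here).
        self._counters = defaultdict(lambda: count(1))
        self._messages = defaultdict(list)

    def append(self, conversation_id, text):
        """Assign the next sequence ID on the server and store the message."""
        seq = next(self._counters[conversation_id])
        self._messages[conversation_id].append((seq, text))
        return seq

    def sync(self, conversation_id, last_seen_seq):
        """Return every message a device is missing, in order."""
        return [
            msg for msg in self._messages[conversation_id]
            if msg[0] > last_seen_seq
        ]
```

A device that reconnects only needs to send the highest sequence ID it has stored; the gap between that value and the server's latest tells it exactly which messages to fetch.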

What is Vector Search (ANN)?

A technique to find semantically similar items (e.g., posts) by comparing their embedding vectors. Used for content discovery and recommendations.
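To make the idea concrete, the sketch below runs an exact nearest-neighbour search with cosine similarity over toy vectors. An ANN index approximates this result without scanning every vector; the three-dimensional embeddings and post names here are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Exact (brute-force) search: score every vector, keep the best k."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [post_id for post_id, _ in scored[:k]]


# Hypothetical post embeddings.
embeddings = {
    "cat_photo": [0.9, 0.1, 0.0],
    "kitten_video": [0.8, 0.2, 0.1],
    "stock_news": [0.0, 0.1, 0.9],
}

similar = top_k([1.0, 0.0, 0.0], embeddings, k=2)
```

The brute-force scan costs one similarity computation per stored vector, which is why large catalogues use an approximate index instead.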


2. Cheat Sheet: Fanout Strategies

| Feature | Push Model (Fanout-on-Write) | Pull Model (Fanout-on-Read) | Hybrid Model |
| --- | --- | --- | --- |
| Action | Write to all followers’ caches on post creation. | Query DB for all followees’ posts on feed load. | Push for normal users, Pull for celebrities. |
| Write Load | Very High (N followers) | Low (1 DB write) | Balanced |
| Read Load | Low (O(1) Redis fetch) | High (Complex DB SQL) | Low |
| Latency | Instant | Variable/Slow | Instant |
| Best For | Twitter (Normal Users) | Facebook (Complex Sorting) | Instagram/Twitter |
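The hybrid strategy can be sketched as follows: posts from normal users are pushed at write time, while posts from celebrities are pulled and merged at read time. The `HybridFeed` class, the follower-count threshold, and the in-memory stores are assumptions made for this example.

```python
import heapq
from collections import defaultdict


class HybridFeed:
    def __init__(self, celebrity_threshold=10_000):
        self.celebrity_threshold = celebrity_threshold  # assumed cutoff
        self.followers = defaultdict(set)
        self.following = defaultdict(set)
        self.pushed = defaultdict(list)    # user -> precomputed (ts, post_id)
        self.authored = defaultdict(list)  # author -> (ts, post_id) post store

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, author, post_id, timestamp):
        self.authored[author].append((timestamp, post_id))
        if not self.is_celebrity(author):
            # Push path: one timeline write per follower.
            for follower in self.followers[author]:
                self.pushed[follower].append((timestamp, post_id))

    def read(self, user, count=20):
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts had already been pushed.
        entries = set(self.pushed[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Pull path: fetch celebrity posts at read time.
                entries.update(self.authored[author])
        return [post_id for _, post_id in heapq.nlargest(count, entries)]
```

The write cost of a celebrity post stays at one store write, and the read cost grows only with the number of celebrities a user follows.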

3. Tech Stack Summary

| Component | Technology | Why? |
| --- | --- | --- |
| Feed Cache | Redis Cluster | In-memory speed for millions of reads/sec. Stores List or Sorted Set. |
| Tweet Storage | Cassandra | High write throughput, linear scalability (Wide Column). |
| User Graph | Graph DB (Neo4j) | Efficient traversal of Follows relationships (Adjacency List). |
| Chat Connection | WebSockets | Persistent connection for real-time delivery. |
| Chat History | HBase | Optimised for range scans (fetch last 50 msgs). |
| Feed Ranking | LightGBM / DNN | ML models for probability scoring. |
| Deduplication | Bloom Filter | Memory-efficient way to check “Have I seen this?”. |
| ID Generation | Snowflake | Distributed, time-sortable unique IDs. |

4. Protocol Battle

  • HTTP Polling: Client asks “New msg?” every 1s. High overhead.
  • Long Polling: Client asks, server holds the request open until data arrives. Better, but every response still costs a new request.
  • WebSockets: Bi-directional pipe. Best for Chat.
  • Server-Sent Events (SSE): Server pushes to client. Good for Tickers/Feeds, but uni-directional.
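A quick back-of-the-envelope calculation shows why 1-second polling is costly. The 500-byte header size and the one million connected clients are assumed figures for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_overhead_bytes(interval_s, header_bytes, clients):
    """Header bytes sent per day by clients that poll on a fixed interval."""
    requests_per_client = SECONDS_PER_DAY // interval_s
    return requests_per_client * header_bytes * clients


requests_per_day = SECONDS_PER_DAY // 1  # 86,400 requests per client
daily_bytes = polling_overhead_bytes(
    interval_s=1, header_bytes=500, clients=1_000_000
)
daily_terabytes = daily_bytes / 1e12  # 43.2 TB of headers per day
```

Under these assumptions most of that traffic carries no new messages, which is the overhead a persistent connection avoids.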

5. System Design Decision Matrix

| Requirement | Solution |
| --- | --- |
| Ultra-low Latency Feed | Push Model (Fanout-on-Write) to Redis |
| Complex Feed Sorting | Pull Model (Fanout-on-Read) from DB |
| Real-time Chat | WebSockets (Stateful Connection) |
| Massive Write Volume | Cassandra / HBase (LSM Tree) |
| Unique Sorted IDs | Snowflake ID Generator |
| Deduplication | Bloom Filter |

[!TIP] Final Interview Tip: Start by assuming a “Push Model” for feeds, because feed reads typically outnumber writes by a wide margin (on the order of 100x). Then pivot to “Hybrid” when the interviewer asks about celebrities.