Module 11 Review & Cheat Sheet

1. Key Concepts Flashcards

Test your knowledge of Social Media System Design. For detailed definitions, check the System Design Glossary.

What is Fanout-on-Write?

Also called "Push Model". When a user posts, the system pushes the post ID to all followers' home timelines (Redis) immediately. Fast reads, slow writes.
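
The push model can be sketched in a few lines. This is a minimal illustration, assuming in-memory dictionaries as stand-ins for the follower graph and the Redis timelines; the names `followers`, `timelines`, `publish`, and the 800-entry timeline cap are hypothetical and not taken from any real system.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins for the follower graph and Redis timelines.
followers = defaultdict(set)                        # author_id -> follower ids
timelines = defaultdict(lambda: deque(maxlen=800))  # user_id -> newest-first post ids (cap is assumed)


def follow(follower_id, author_id):
    """Record that follower_id follows author_id."""
    followers[author_id].add(follower_id)


def publish(author_id, post_id):
    """Push model: one timeline write per follower, performed at post time."""
    for follower_id in followers[author_id]:
        timelines[follower_id].appendleft(post_id)
    return len(followers[author_id])  # number of timeline writes performed


def read_home_timeline(user_id, limit=20):
    """Reads are a single lookup because the work was already done on write."""
    return list(timelines[user_id])[:limit]
```

The return value of `publish` makes the trade-off visible: the write cost grows with the follower count, while `read_home_timeline` stays a single lookup.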

What is the "Celebrity Problem"?

When a user with millions of followers posts, a "Push Model" system must perform millions of timeline writes for that single post. This write amplification backs up the fanout queue and delays delivery to followers.

Why use WebSockets for Chat?

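WebSockets provide a persistent, full-duplex channel over a single TCP connection. After the initial HTTP upgrade handshake, each message carries only a small frame header, which reduces latency and overhead compared to HTTP Polling (which resends full headers with every request).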

What is EdgeRank?

A simplified ranking formula associated with Facebook's early News Feed: Rank = Affinity × Weight × Time Decay, computed for each interaction on a post. It determines the order of posts in a non-chronological feed.
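
The formula can be turned into a small scoring function. This is a sketch under stated assumptions: it treats each post as a single interaction, and the exponential half-life decay (24 hours by default) is an illustrative choice, since the text does not specify the decay function.

```python
def edge_score(affinity, weight, age_hours, half_life_hours=24.0):
    """Affinity x Weight x Time Decay for a single interaction.

    The half-life decay is an assumed form; the score halves every
    half_life_hours.
    """
    decay = 0.5 ** (age_hours / half_life_hours)
    return affinity * weight * decay


def rank_posts(posts):
    """Order posts by score, highest first.

    posts: list of dicts with keys id, affinity, weight, age_hours.
    """
    return sorted(
        posts,
        key=lambda p: edge_score(p["affinity"], p["weight"], p["age_hours"]),
        reverse=True,
    )
```

With this decay, a two-day-old post from a close friend can rank below a fresh post from a weaker connection, which is the non-chronological behaviour the card describes.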

What is a Bloom Filter?

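A probabilistic data structure used to test set membership. It can return false positives but never false negatives. We use it to efficiently filter out posts a user has already seen (Deduplication), accepting that an unseen post is occasionally skipped.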
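
A Bloom filter is small enough to sketch directly. The version below is illustrative only: the bit-array size, the number of hashes, and the double-hashing scheme built on SHA-256 are assumptions chosen for brevity, and `unseen_posts` is a hypothetical helper.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so steps vary
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def unseen_posts(candidate_ids, seen_filter):
    """Keep only posts the filter has definitely not seen."""
    return [pid for pid in candidate_ids if pid not in seen_filter]
```

A post that was added is always reported as seen. A post that was never added is usually reported as unseen, but can occasionally be dropped because of a false positive.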

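What is a Snowflake ID?

A 64-bit unique ID produced by a distributed generator created at Twitter. The timestamp occupies the high bits, so IDs are roughly time-sortable (k-ordered), and each worker generates IDs independently, avoiding the bottleneck of a single DB auto-increment.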
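
A Snowflake-style generator can be sketched as follows. It uses the commonly documented layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits; the custom epoch `EPOCH_MS`, the class name, and the injectable `clock` parameter are assumptions made for illustration.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (any fixed past instant)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Sketch of a Snowflake-style generator: timestamp | worker | sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # returns current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Because the timestamp sits in the high bits, sorting IDs numerically sorts them by creation time, with the worker and sequence fields breaking ties within a millisecond.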

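What is a Sequence ID?

A strictly increasing integer assigned to each message within a conversation. It guarantees message order and lets devices sync state ("send me everything after sequence N") without relying on timestamps, which avoids clock skew issues.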
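
The ordering and sync behaviour can be illustrated with a small sketch. It assumes sequence numbers are allocated per conversation by the server and that each device buffers out-of-order messages until the gap is filled; `SequenceAllocator` and `DeviceState` are hypothetical names.

```python
from collections import defaultdict


class SequenceAllocator:
    """Server side: hands out 1, 2, 3, ... independently per conversation."""

    def __init__(self):
        self._last = defaultdict(int)

    def assign(self, conversation_id):
        self._last[conversation_id] += 1
        return self._last[conversation_id]


class DeviceState:
    """Client side: applies messages strictly in sequence order."""

    def __init__(self):
        self.last_seq = 0     # highest contiguous sequence applied so far
        self.pending = {}     # out-of-order messages waiting for a gap to fill
        self.delivered = []   # messages in the order shown to the user

    def receive(self, seq, text):
        if seq <= self.last_seq:
            return  # duplicate delivery, already applied
        self.pending[seq] = text
        # Drain every message that is now contiguous with last_seq.
        while self.last_seq + 1 in self.pending:
            self.last_seq += 1
            self.delivered.append(self.pending.pop(self.last_seq))
```

A reconnecting device only needs to report `last_seq` to resume, and no wall-clock timestamps are compared anywhere.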

What is Vector Search (ANN)?

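A technique to find semantically similar items (e.g., posts) by comparing their embedding vectors. Approximate nearest neighbor (ANN) indexes trade a small amount of accuracy for much faster lookups than an exact scan. Used for content discovery and recommendations.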
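
The comparison step can be shown with an exact, brute-force search using cosine similarity. Note that this is the exact baseline, not an ANN index: it scans every vector, which is the cost that approximate indexes are designed to avoid. The function names and the toy two-dimensional embeddings are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, embeddings, k=2):
    """Exact nearest neighbours by scanning every stored embedding."""
    scored = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in scored[:k]]


# Toy embeddings (assumed values, two dimensions for readability).
embeddings = {
    "cats": [1.0, 0.0],
    "kittens": [0.9, 0.1],
    "stocks": [0.0, 1.0],
}
nearest = top_k_similar([1.0, 0.05], embeddings, k=2)
```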

2. Cheat Sheet: Fanout Strategies

Feature	Push Model (Fanout-on-Write)	Pull Model (Fanout-on-Read)	Hybrid Model
Action	Write to all followers’ caches on post creation.	Query DB for all followees’ posts on feed load.	Push for normal users, Pull for celebrities.
Write Load	Very High (N cache writes for N followers)	Low (1 DB write)	Balanced
Read Load	Low (single Redis fetch)	High (complex DB query across followees)	Low
Read Latency	Low	Variable/Slow	Low
Best For	Twitter (Normal Users)	Facebook (Complex Sorting)	Instagram/Twitter
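
The hybrid column can be sketched as a push path for normal users plus a read-time merge for celebrities. This is an illustration under stated assumptions: the data lives in in-memory dictionaries, and `CELEBRITY_THRESHOLD` is a hypothetical cutoff set very low so the example stays small.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems would use a far larger value

followers = defaultdict(set)            # author -> users who follow them
following = defaultdict(set)            # user -> authors they follow
pushed_timelines = defaultdict(list)    # user -> [(timestamp, post_id)] written at post time
posts_by_author = defaultdict(list)     # author -> [(timestamp, post_id)]


def follow(follower, author):
    followers[author].add(follower)
    following[follower].add(author)


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author, post_id, ts):
    """Push to follower timelines unless the author is a celebrity."""
    posts_by_author[author].append((ts, post_id))
    if is_celebrity(author):
        return 0  # no fanout; followers pull these posts at read time
    for follower in followers[author]:
        pushed_timelines[follower].append((ts, post_id))
    return len(followers[author])


def home_timeline(user, limit=20):
    """Merge the pre-built timeline with celebrity posts pulled on read."""
    merged = set(pushed_timelines[user])  # set avoids duplicates across both paths
    for author in following[user]:
        if is_celebrity(author):
            merged.update(posts_by_author[author])
    return [post_id for _, post_id in sorted(merged, reverse=True)[:limit]]
```

The write cost of a celebrity post drops to a single store, while each reader pays for a small merge over the few celebrities they follow.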

3. Tech Stack Summary

Component	Technology	Why?
Feed Cache	Redis Cluster	In-memory speed for millions of reads/sec. Stores `List` or `Sorted Set`.
Tweet Storage	Cassandra	High write throughput, linear scalability (Wide Column).
User Graph	Graph DB (Neo4j)	Efficient traversal of `Follows` relationships (Adjacency List).
Chat Connection	WebSockets	Persistent connection for real-time delivery.
Chat History	HBase	Optimised for range scans (fetch last 50 msgs).
Feed Ranking	LightGBM / DNN	ML models for probability scoring.
Deduplication	Bloom Filter	Memory-efficient way to check “Have I seen this?”.
ID Generation	Snowflake	Distributed, time-sortable unique IDs.

4. Protocol Battle

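HTTP Polling: Client asks “New msg?” on a fixed interval (e.g., every 1s). High overhead, since most responses are empty.
Long Polling: Client asks, and the server holds the request open until a message arrives or a timeout fires. Fewer empty responses, but every message still costs a new request.
WebSockets: Bi-directional pipe over one persistent connection. Best for Chat.
Server-Sent Events (SSE): Server pushes to client over a long-lived HTTP response. Good for Tickers/Feeds, but uni-directional.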
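
A quick back-of-the-envelope calculation shows why short-interval polling is expensive. The 500-byte header size used below is an assumed round number for illustration, not a measured value, and `polling_overhead_bytes` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def polling_overhead_bytes(interval_s, header_bytes, days=1):
    """Header bytes sent by one client that polls every interval_s seconds."""
    requests = (SECONDS_PER_DAY * days) // interval_s
    return requests * header_bytes


# One client polling every second with ~500 bytes of headers (assumed size).
per_client_per_day = polling_overhead_bytes(interval_s=1, header_bytes=500)
```

Under these assumptions a single idle client sends about 43 MB of headers per day, before any message payload is delivered.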

5. System Design Decision Matrix

Requirement	Solution
Ultra-low Latency Feed	Push Model (Fanout-on-Write) to Redis
Complex Feed Sorting	Pull Model (Fanout-on-Read) from DB
Real-time Chat	WebSockets (Stateful Connection)
Massive Write Volume	Cassandra / HBase (LSM Tree)
Unique Sorted IDs	Snowflake ID Generator
Deduplication	Bloom Filter

[!TIP] Final Interview Tip: Start by assuming a “Push Model” for feeds, because feed reads typically far outnumber writes (100x is a common rule of thumb). Then pivot to “Hybrid” when the interviewer asks about celebrities.