Module Review: Database Basics
Congratulations on completing Module 4! This is one of the most technical and critical modules in System Design. Let’s solidify your knowledge.
1. Key Concepts Cheat Sheet
| Concept | Definition | Key Trade-off |
|---|---|---|
| SQL (B-Tree) | Relational, Normalized, ACID. | Great for Reads. Slow Writes (Random I/O). Hard to Shard. |
| NoSQL (LSM Tree) | Flexible, Denormalized, BASE. | Great for Writes (Sequential I/O). Weak for complex Reads/Joins. |
| ACID | Atomicity, Consistency, Isolation, Durability. | Safety vs Performance. |
| BASE | Basically Available, Soft state, Eventual consistency. | Availability vs Consistency. |
| Isolation Levels | Read Uncommitted → Serializable. | Preventing anomalies (Dirty, Phantom, Skew) costs speed. |
| WAL | Log changes to disk before RAM. | Durability. Converts Random I/O to Sequential I/O. |
| Vector Clocks | Tracks causality [A:1, B:2]. |
Detects concurrent updates (conflicts) without data loss. |
| MVCC | Multi-Version Concurrency Control. | Readers don’t block Writers. Prevents Dirty Reads implicitly. |
2. Interactive Flashcards
Test yourself. Click a card to flip it.
What is the "Golden Rule" of Distributed Consistency?
(Hint: R, W, N)
R + W > N
If Read Quorum + Write Quorum > Total Nodes, you are guaranteed Strong Consistency.
Why are LSM Trees faster for writes than B-Trees?
Sequential I/O
LSM Trees append to the end of a file (Sequential), avoiding the slow "Seek Time" (Random I/O) of B-Tree rebalancing.
What is a "Phantom Read"?
A Ghost Row
A transaction reads a set of rows matching a criteria (Age > 10). Another transaction inserts a *new* row that matches. Re-reading yields a different count.
What system call guarantees data is physically on the disk?
fsync()
It flushes the OS file system buffer to the physical disk platter, ensuring durability against power loss.
What is "Write Skew"?
Logic Bug in Snapshot Isolation
Two transactions read the same state (e.g., "2 Doctors on call"), make a decision, and update different rows, breaking a global invariant (0 Doctors on call).
How do Vector Clocks solve conflicts?
Causality Tracking
By tracking version counters per node `[A:1, B:1]`, we can detect if updates happened concurrently and ask the app to merge them (instead of blindly overwriting).
When should you use NewSQL?
Scale + ACID
When you need the horizontal scaling of NoSQL but the strict financial correctness (ACID Transactions) of SQL. E.g., Global Payments.
Why is 2PC called the "Availability Killer"?
Blocking Protocol
If the Coordinator crashes, all participants are stuck holding locks indefinitely, freezing the system. Use Sagas instead.
Why are LSM Trees slower for reads?
Read Penalty
The DB must check the MemTable (RAM) and potentially multiple SSTables (Disk) to find the key. Bloom Filters help speed this up.
What is the risk of "Group Commit"?
Data Loss
Transactions are buffered in RAM to increase throughput. If power fails before the buffer flushes to disk, those transactions are lost forever.
3. Final Checklist
Before you move to Module 5 (Load Balancing):
- Can I explain why LSM Trees (Cassandra) handle writes better than B-Trees (SQL)?
- Do I understand why 2PC blocks and is dangerous for availability?
- Can I calculate Quorum (N, R, W) for a system?
- Do I know when to use Optimistic Locking (Version Column) vs Pessimistic Locking?
- Can I explain why WAL converts Random I/O to Sequential I/O?