Module Review: Deep Dive Infra

[!TIP] The Foundation: You now understand the systems that store (GFS, HDFS), stream (Kafka), and coordinate (Chubby) the internet’s data. These are the building blocks for everything else.

1. Interactive Flashcards

Test your knowledge.

What is the "Relaxed Consistency" model in GFS?
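Writes that succeed concurrently can leave a region **consistent but undefined**: every replica holds the same bytes, but they may be a mix of fragments from different writers. A write that fails on any replica leaves the region **inconsistent**. GFS guarantees atomic **Record Appends** (at-least-once), so replicas may contain padding or duplicate records and are not guaranteed to be byte-identical.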
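Because record appends are at-least-once, readers usually have to filter padding and duplicates themselves. A minimal sketch of that reader-side filter, assuming each record carries a writer-assigned unique ID and that padding shows up as `None` (both are illustrative conventions, not GFS's actual on-disk format):

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, bytes]  # (record_id, payload); the ID scheme is an assumption


def read_unique(records: Iterable[Optional[Record]]) -> Iterator[Record]:
    """Yield each record once, skipping padding (None) and retried duplicates."""
    seen = set()
    for rec in records:
        if rec is None:  # padding left behind by a failed append
            continue
        record_id, _payload = rec
        if record_id in seen:  # duplicate written by a retried append
            continue
        seen.add(record_id)
        yield rec


# "b" was appended twice because the first attempt was retried.
chunk = [("a", b"1"), None, ("b", b"2"), ("b", b"2"), ("c", b"3")]
unique = list(read_unique(chunk))
```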
How does HDFS 3.0 reduce storage cost?
**Erasure Coding (RS(6,3))**. Instead of 3x replication (200% overhead), it splits data into 6 data chunks + 3 parity chunks (50% overhead). It uses **Reed-Solomon** coding, so any 6 of the 9 chunks are enough to reconstruct the data (up to 3 chunks can be lost).
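The overhead numbers on this card are simple ratios. A small sketch of the arithmetic, where `storage_overhead` is a helper name made up for this example:

```python
def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra storage as a fraction of the original data size."""
    return redundant_units / data_units


# 3x replication: 1 original block plus 2 extra copies.
replication_3x = storage_overhead(data_units=1, redundant_units=2)

# RS(6,3): 6 data chunks plus 3 parity chunks.
rs_6_3 = storage_overhead(data_units=6, redundant_units=3)

print(f"3x replication overhead: {replication_3x:.0%}")
print(f"RS(6,3) overhead: {rs_6_3:.0%}")
```

Note the trade-off: RS(6,3) tolerates 3 lost chunks versus 2 lost copies for 3x replication, but a reconstruction has to read from 6 surviving chunks instead of copying one replica.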
Why is Kafka's "Zero Copy" faster?
It uses the `sendfile` syscall to transfer data from the OS **Page Cache** to the **NIC Buffer** without passing through Application (JVM) memory. This reduces **Context Switches** from 4 to 2 and Data Copies from 4 to 2 (when the NIC supports scatter-gather DMA; 3 otherwise).
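Kafka itself does this on the JVM, but the same syscall is reachable from Python. A minimal sketch using `socket.sendfile`, which uses `os.sendfile` where the platform supports it and falls back to ordinary `send()` otherwise; the local socket pair and the payload are stand-ins for a real broker-to-consumer connection:

```python
import os
import socket
import tempfile


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as f:
        # The file contents are never read into a Python bytes object here.
        return sock.sendfile(f)


# Demo: a small temp file plays the role of a log segment.
payload = b"kafka-log-segment " * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

left, right = socket.socketpair()
sent = send_file_zero_copy(tmp.name, left)
left.close()  # signals end-of-stream to the reader

received = b""
while chunk := right.recv(4096):
    received += chunk
right.close()
os.unlink(tmp.name)
```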
What is a "High Watermark" in Kafka?
The offset of the last message that has been successfully replicated to all **In-Sync Replicas (ISR)**. Consumers can only read up to this point to ensure they don't see data that might be lost if the leader fails.
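A minimal sketch of how a leader could derive the high watermark. It uses the convention that each replica's log end offset is the next offset it will write, so records at offsets below the high watermark are readable. The replica names and function names are hypothetical:

```python
from typing import Dict, List, Set


def high_watermark(log_end_offsets: Dict[str, int], isr: Set[str]) -> int:
    """Smallest log end offset among the in-sync replicas."""
    return min(log_end_offsets[replica] for replica in isr)


def visible_to_consumers(log: List[str], hw: int) -> List[str]:
    """Consumers may only read records below the high watermark."""
    return log[:hw]


leader_log = ["m0", "m1", "m2", "m3", "m4"]
log_end_offsets = {"leader": 5, "follower-1": 4, "follower-2": 3}
isr = {"leader", "follower-1"}  # follower-2 has fallen out of the ISR

hw = high_watermark(log_end_offsets, isr)
readable = visible_to_consumers(leader_log, hw)
```

Here `m4` exists only on the leader, so it stays hidden until `follower-1` catches up.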
What is a "Fencing Token" in Chubby?
A monotonically increasing sequence number (e.g., Epoch ID; Chubby calls it a sequencer) issued with every lock acquisition. If a "Zombie Master" wakes up and tries to write to storage with an old token (e.g., 10), the storage rejects it because it has already seen a higher token (e.g., 11) from the new leader.
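The check lives on the storage side, not in the lock service. A minimal sketch, assuming a hypothetical `FencedStorage` class that remembers the highest token it has accepted:

```python
class FencedStorage:
    """Toy storage service that rejects writes carrying a stale fencing token."""

    def __init__(self) -> None:
        self.highest_token = -1
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale lock holder: refuse the write
        self.highest_token = token
        self.data[key] = value
        return True


storage = FencedStorage()
new_leader_ok = storage.write(11, "config", "from-new-leader")
zombie_ok = storage.write(10, "config", "from-zombie")  # old token arrives late
```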
What is "Rack Awareness"?
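A placement strategy (HDFS/Kafka) that ensures replicas are spread across different racks to survive a **Top-of-Rack Switch Failure**. For example, the default HDFS policy with 3 replicas puts the first on the writer's local node, the second on a node in a remote rack, and the third on a different node in that same remote rack.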
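A minimal sketch of the 3-replica placement described on this card. The topology map and function names are made up for illustration, and a real NameNode also weighs disk usage and node load:

```python
import random
from typing import Dict, List

Topology = Dict[str, List[str]]  # rack name -> node names


def rack_of(topology: Topology, node: str) -> str:
    for rack, nodes in topology.items():
        if node in nodes:
            return rack
    raise KeyError(node)


def place_replicas(topology: Topology, writer_node: str, rng: random.Random) -> List[str]:
    """Pick 3 replica nodes: local node, remote rack, same remote rack."""
    local_rack = rack_of(topology, writer_node)
    candidates = [r for r, nodes in topology.items()
                  if r != local_rack and len(nodes) >= 2]
    if not candidates:
        raise ValueError("need a remote rack with at least two nodes")
    remote_rack = rng.choice(candidates)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer_node, second, third]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
replicas = place_replicas(topology, "n1", random.Random(0))
```

Losing either rack's switch still leaves at least one reachable replica, while two of the three copies share a rack to keep cross-rack write traffic down.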
What is the RED Method?
A widely used method for monitoring request-driven microservices: **Rate** (Requests/sec), **Errors** (Failed requests/sec), and **Duration** (the latency distribution, e.g., P99). It focuses on the end-user experience.
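A minimal sketch of computing the three RED numbers from one window of raw request samples. The input shape (duration in milliseconds plus a success flag) and the nearest-rank percentile are assumptions of this example, and real systems usually derive these from histograms rather than raw samples:

```python
from typing import Dict, List, Tuple


def red_metrics(requests: List[Tuple[int, bool]], window_seconds: float) -> Dict[str, float]:
    """requests holds (duration_ms, succeeded) pairs observed in the window."""
    if not requests:
        raise ValueError("no requests in window")
    durations = sorted(d for d, _ in requests)
    failures = sum(1 for _, ok in requests if not ok)
    # Nearest-rank P99: ceil(0.99 * n), done in integer arithmetic.
    rank = (99 * len(durations) + 99) // 100
    return {
        "rate_per_sec": len(requests) / window_seconds,
        "errors_per_sec": failures / window_seconds,
        "p99_ms": durations[rank - 1],
    }


# 100 requests in a 10-second window; requests 0 and 50 failed.
samples = [(i + 1, i % 50 != 0) for i in range(100)]
metrics = red_metrics(samples, window_seconds=10)
```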
Paxos vs Raft?
**Paxos** is a general consensus family (hard to implement, can be leaderless). **Raft** is designed for understandability and enforces a Strong Leader model. Both guarantee the same consistency (Linearizability).
What is a Write-Ahead Log (WAL)?
The standard pattern for database **Durability**. Modifications are written to an append-only log on disk *before* they are applied to the in-memory state. If the node crashes, it replays the WAL to recover.
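A minimal sketch of the pattern, using a hypothetical `WalStore` key-value class with a JSON-lines log. It does not handle a torn final line or log truncation, which a real implementation would need:

```python
import json
import os
import tempfile


class WalStore:
    """Toy key-value store: log to disk first, then update memory."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.state = {}
        self._replay()
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.state[entry["key"]] = entry["value"]

    def put(self, key: str, value) -> None:
        self._wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())  # durable on disk before touching memory
        self.state[key] = value

    def close(self) -> None:
        self._wal.close()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.wal")
    store = WalStore(path)
    store.put("a", 1)
    store.put("b", 2)
    store.close()  # pretend the process crashed; in-memory state is gone

    recovered = WalStore(path)  # replays the log on startup
    recovered_state = dict(recovered.state)
    recovered.close()
```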
What is "Split Brain"?
A failure mode in distributed clusters where two nodes both believe they are the **Leader** (Master) due to a network partition. It leads to data corruption unless **Fencing** (e.g., Epoch IDs) is used.

2. Cheat Sheet: The Big Four

| System | Role | Key Abstraction | Consistency | Bottleneck Mitigation |
|--------|------|-----------------|-------------|-----------------------|
| GFS | File Storage | Files & Chunks (64MB) | Relaxed (Atomic Append) | Shadow Masters (Read-only) |
| HDFS | Big Data Storage | Blocks (128MB) | Strict (Write-Once) | Federation (Multiple NameNodes) |
| Kafka | Event Streaming | Log & Topic | Configurable (ISR) | Partitioning (Horizontal Scale) |
| Chubby | Distributed Lock | File & Session | Strict (Paxos) | Caching (Client-side Invalidation) |

Quick Comparisons

  • GFS vs HDFS: GFS allows random writes (dangerous); HDFS enforces Write-Once-Read-Many (safer).
  • RabbitMQ vs Kafka: RabbitMQ pushes messages (smart broker); Kafka lets clients pull (smart client, dumb broker).
  • Chubby vs ZooKeeper: Chubby is a lock service with a file-system-like interface (heavy client-side caching); ZooKeeper is a coordination service built on a hierarchy of znodes (tuned for high read throughput).

3. Interview Red Flags

[!WARNING] Don’t say these in an interview!

  1. “GFS is strongly consistent.” (No, its consistency model is relaxed. Record appends are at-least-once, so readers must tolerate duplicates and padding).
  2. “Kafka is a Queue.” (No, it’s a Log. Messages aren’t deleted after consumption).
  3. “ZooKeeper is a database.” (No, it’s a coordination service. Don’t store large data in it).
  4. “We use RAID for HDFS.” (No, HDFS provides redundancy itself through Replication or Erasure Coding across nodes, so DataNode disks are typically run as plain JBOD).
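To make the log-versus-queue point in item 2 concrete, here is a minimal sketch of a log where consuming only advances a per-group offset and never deletes anything. `MiniLog` and its methods are invented for this example and are not Kafka's client API:

```python
from typing import Dict, List


class MiniLog:
    """Toy append-only log with independent consumer-group offsets."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.offsets: Dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # reading moves the offset only
        return batch


log = MiniLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

billing_batch = log.poll("billing")
analytics_batch = log.poll("analytics")  # a second group sees the same records
billing_again = log.poll("billing")      # nothing new for billing yet
```

In a classic queue the first consumer would have removed the messages. Here retention is a separate policy (time or size based in Kafka), independent of who has read what.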
