Kafka: The Infinite Tape Recorder
1. Queue vs Stream
Most developers confuse RabbitMQ (Queue) and Kafka (Stream). They solve different problems.
- RabbitMQ (The Mailbox):
  - Model: Smart Broker, Dumb Consumer.
  - Behavior: You take a letter out, and it’s gone (deleted).
  - Use Case: Task processing, Job Queues (e.g., “Resize this image”).
- Kafka (The Tape Recorder):
  - Model: Dumb Broker, Smart Consumer.
  - Behavior: Messages are appended to a log. You can listen, rewind, and listen again. Messages stick around for days (retention).
  - Use Case: Event Sourcing, Analytics, Metrics, Change Data Capture (CDC).
[!TIP] Kafka is essentially a distributed Write-Ahead Log (WAL). If you understand how a database persists data (see WAL), you understand Kafka.
2. Architecture: The Log
Kafka’s core abstraction is the Log—an append-only sequence of records.
A. Topic & Partition
A Topic is a category (e.g., `user_clicks`). Since one server can’t hold all the data, the Topic is split into Partitions.
- Partition: An ordered, immutable sequence of messages.
- Ordering: Guaranteed only within a partition. Global ordering across partitions is NOT guaranteed.
- Hash Key: `Partition = hash(Key) % NumPartitions`. All messages for `User:123` go to the same partition (e.g., P0); see the sketch below.
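A minimal sketch of this mapping in Java. The real client hashes the serialized key bytes with murmur2 rather than calling `String.hashCode()`, but the modulo step is the same:

```java
// Illustrative key-to-partition mapping (the real Java client uses
// murmur2 over the serialized key bytes, not String.hashCode()).
public class PartitionerSketch {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        // Same key, same partition: per-key ordering is preserved.
        System.out.println(partitionFor("User:123", numPartitions));
        System.out.println(partitionFor("User:123", numPartitions)); // identical
        System.out.println(partitionFor("User:456", numPartitions)); // may differ
    }
}
```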
B. The Offset
Each message in a partition has a unique ID called the Offset.
- The Consumer tracks what it has read: “I’ve read up to Offset 500 in Partition 0.” (See the consumer sketch below.)
C. Consumer Groups (The Secret Sauce)
How do we scale processing? We add consumers to a Consumer Group.
- Rule: A partition is consumed by exactly one consumer in the group.
- Implication: If you have 10 partitions, you can have AT MOST 10 active consumers. The 11th consumer will be idle.
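A minimal consumer sketch using the official Java client (`kafka-clients`); the broker address, topic, and group name are placeholders. It shows both mechanisms at once: subscribing joins the group (the broker assigns partitions), and `commitSync()` records “I’ve read up to here”:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ClickConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "click-processors");        // the Consumer Group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // we commit offsets ourselves

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Joining the group triggers a rebalance; the broker assigns partitions.
            consumer.subscribe(List.of("user_clicks"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("P%d @ offset %d: %s%n", r.partition(), r.offset(), r.value());
                }
                consumer.commitSync(); // persist "I've read up to here" for this group
            }
        }
    }
}
```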
3. Interactive Demo: The Kafka Cluster Simulator
Visualize how Partitions distribute load and how Consumer Groups scale.
- Scenario: High-volume clickstream data.
- Components:
- Producers: Sending events (colored by Key/Partition).
- Brokers: Hosting 3 Partitions (P0, P1, P2).
- Consumer Group: Reading from partitions.
- Action: Add Consumers to see Rebalancing. Watch the “Lag” build up if consumption is slow.
4. Under the Hood: Why is Kafka Fast?
Kafka can handle millions of messages/sec on spinning hard disks. How?
A. Sequential I/O
Random Disk Access (seeking) is slow (~100 seeks/sec on a spinning disk). Sequential Access is fast (hundreds of MB/sec). Kafka only appends to the end of the log file, so the write path never seeks.
- Result: Disk speeds approach Network speeds.
B. Zero Copy
Standard File Send:
- Disk -> OS Cache (Kernel)
- OS Cache -> App Buffer (User Space)
- App Buffer -> Socket Buffer (Kernel)
- Socket Buffer -> NIC
Kafka uses the `sendfile()` system call (Zero Copy):
- Disk -> OS Cache
- OS Cache -> NIC
- Result: No user-space copies and far fewer context switches.
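In the Java world, `sendfile()` is exposed as `FileChannel.transferTo()`, which is what Kafka’s broker calls when serving fetch requests. A minimal sketch, with a placeholder segment file and destination socket:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(
                 Path.of("00000000000000000000.log"), StandardOpenOption.READ); // placeholder segment
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long remaining = log.size();
            // transferTo() maps to sendfile() on Linux:
            // bytes flow page cache -> NIC without entering user space.
            while (remaining > 0) {
                long sent = log.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```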
5. Durability & Reliability
How do we ensure we never lose data?
A. Replication (ISR)
Each partition has one Leader and multiple Followers.
- Leader: Handles all Reads and Writes.
- Follower: Passively replicates data from the Leader.
- ISR (In-Sync Replicas): The set of followers that are “caught up” with the leader. If a leader dies, only an ISR member can become the new leader.
B. Producer Acks
The producer decides how safe the write must be.
- `acks=0` (Fire & Forget): Send and don’t wait. Fastest, but data loss is possible if the broker crashes immediately.
- `acks=1` (Leader Ack): Wait for the Leader to append the record to its local log. Safe only as long as the Leader stays alive.
- `acks=all` (Strongest): Wait for the Leader AND all In-Sync Replicas to acknowledge. No data loss as long as at least one ISR member survives (pair it with `min.insync.replicas` on the topic).
[!TIP] Trade-off:
`acks=all` is slower (higher latency) but safest. Use it for Payments. Use `acks=1` for Metrics.
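A producer sketch with the durability dial set to `all`, again using the official Java client; the broker address and topic are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PaymentProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // the durability dial: "0", "1", or "all"

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is asynchronous; the callback fires once the acks requirement is met.
            producer.send(new ProducerRecord<>("payments", "User:123", "charge:9.99"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace(); // write was NOT acknowledged
                        } else {
                            System.out.printf("acked: P%d @ offset %d%n",
                                    metadata.partition(), metadata.offset());
                        }
                    });
        } // close() flushes pending sends
    }
}
```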
6. Exactly-Once Semantics (EOS)
This is the “Holy Grail” of messaging. How do we ensure a message is processed exactly once, even if the producer retries or the consumer crashes?
1. Idempotent Producer
Kafka assigns each producer a unique Producer ID (PID) and attaches a monotonically increasing Sequence Number to every message it sends to a partition.
- If the producer sends `Msg 5` twice (due to a network timeout), the Broker sees `Seq: 5` again and discards the duplicate.
- This solves duplication during sending.
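Enabling this is a single producer setting (and it defaults to true in Kafka 3.0+ clients). The snippet assumes the `props` object from the producer sketch above:

```java
// One extra setting on the producer props from the sketch above
// (defaults to true in Kafka 3.0+ clients):
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// The broker now tracks (PID, partition, sequence number) and silently
// drops retried duplicates; enabling this also forces acks=all.
```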
2. Transactional Messaging (Read-Process-Write)
What if you read from Topic A, process it, and write to Topic B? If you crash in the middle, you might re-process the message. Kafka supports Atomic Transactions across multiple partitions.
- Begin Transaction.
- Consume from A (Offset X).
- Produce to B.
- Commit Transaction.
- Only then is Offset X marked as “Consumed”, and the message in B becomes visible to consumers reading with `isolation.level=read_committed`.
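A read-process-write sketch with the Java client; the broker address, topics, group, and transactional ID are placeholders. The key call is `sendOffsetsToTransaction()`, which puts the consumed offset inside the same atomic commit as the produced messages:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;

public class ReadProcessWrite {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, "rpw-group");
        cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.StringDeserializer");
        cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.StringDeserializer");
        cProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        cProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // offsets go in the transaction

        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.StringSerializer");
        pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                   "org.apache.kafka.common.serialization.StringSerializer");
        pProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "rpw-1"); // enables transactions + idempotence

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(List.of("topic-a"));
            producer.initTransactions();
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> r : records) {
                    producer.send(new ProducerRecord<>("topic-b", r.key(), r.value().toUpperCase()));
                    offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1)); // next offset to read
                }
                // Atomic: output messages and consumed offsets commit together.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            }
        }
    }
}
```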
7. Advanced Concepts
A. Rebalancing
When a consumer joins or leaves, the group “rebalances”.
- Stop-the-World: Traditionally, all consumption stopped during rebalance.
- Incremental: Newer versions (Kafka 2.4+, via the CooperativeStickyAssignor) allow partial rebalancing, so unaffected consumers keep processing.
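Opting in is a single consumer setting (available since Kafka 2.4). The snippet assumes the `props` object from the consumer sketch in Section 2:

```java
// One extra setting on the consumer props from the sketch in Section 2
// (available since Kafka 2.4):
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
          "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
// Only the partitions that actually move are revoked; the rest
// keep being consumed while the rebalance runs.
```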
B. Log Compaction
Kafka can act as a database. With Log Compaction, Kafka keeps only the latest value for each key.
- Input: `(Key: User1, Value: Alice)`, `(Key: User1, Value: Bob)`
- Compacted: `(Key: User1, Value: Bob)`
- Use Case: Restoring application state (e.g., a Kafka Streams KTable) after a crash without replaying the entire history.
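A sketch of creating a compacted topic with the Java `Admin` client; the broker address, topic name, partition count, and replication factor are placeholders:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 3) // placeholder sizing
                    // Keep only the latest value per key instead of deleting by age.
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Set.of(topic)).all().get(); // block until the topic exists
        }
    }
}
```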
8. Summary
- Kafka is a Log: Think of it as a distributed file system, not a queue.
- Partitions: The unit of scalability.
- Zero Copy: The secret sauce behind its speed.
- Consumer Groups: Enable horizontal scaling of processing.
- Exactly-Once: Possible, but requires configuration (`acks=all`, idempotent producer, transactions).