Module Review: Producers

[!NOTE] This module explores the core principles of Kafka Producers, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. Key Takeaways

  • Asynchronous Nature: The producer’s send() method is async. It batches messages in the RecordAccumulator and a background Sender Thread transmits them.
  • Partitioning:
  • Key-Based: Hashes the key (Murmur2) to ensure ordering per key.
  • Sticky: Sticks to one partition to fill batches, improving throughput.
  • Reliability:
  • acks=all: Ensures all ISRs have the data.
  • min.insync.replicas: Enforces the minimum replication for acks=all.
  • enable.idempotence=true: Prevents duplicates and ensures ordering per partition.
  • Tuning:
  • linger.ms > 0: Wait for batches to fill (High Throughput).
  • linger.ms = 0: Send immediately (Low Latency).
  • compression.type: Use lz4 or zstd to save bandwidth.

2. Flashcards

Test your knowledge. Click a card to flip it.

What is the benefit of the Sticky Partitioner?

(Click to reveal)

It reduces latency and increases throughput by sticking to one partition until a batch is full, minimizing the number of requests sent to brokers.

What does `min.insync.replicas=2` protect against?

It ensures that a write is only successful if at least 2 replicas acknowledge it. This prevents data loss if one replica fails later.

Why should you enable Idempotence?

It guarantees exactly-once delivery per partition by assigning sequence numbers to messages, preventing duplicates during retries.

What happens if `buffer.memory` fills up?

The `send()` method blocks for `max.block.ms`. If it still can't clear space, it throws a TimeoutException.


3. Cheat Sheet

Parameter Recommended Value (General) Why?
acks all Durability.
enable.idempotence true Prevent duplicates.
retries Integer.MAX_VALUE Don’t give up on transient errors.
linger.ms 20-100 Improve throughput via batching.
batch.size 32768 (32KB) Larger batches = better compression.
compression.type lz4 or zstd Save network bandwidth.
buffer.memory 33554432 (32MB) Buffer backpressure. Increase if needed.

4. Quick Revision

  • send() is asynchronous and adds records to the Record Accumulator.
  • Sticky Partitioner improves throughput by filling batches before switching partitions.
  • acks=all guarantees data safety as long as one ISR is alive.
  • Idempotence prevents duplicates caused by retries.
  • linger.ms and batch.size are the keys to optimizing throughput.

5. Next Steps

Learn how to read data at scale with Consumer Groups.

View Full Glossary