Advanced Configuration & Tuning
This module covers advanced Kafka producer configuration: deriving settings from your workload's durability, throughput, and latency requirements rather than accepting the defaults. Once you understand the basics, you need to tune the producer for your specific workload. Are you optimizing for throughput (massive log ingestion) or latency (real-time trading)?
1. Reliability vs Availability
min.insync.replicas (Broker Setting)
This is the most critical setting for data durability. It defines the minimum number of replicas that must acknowledge a write when acks=all.
- Scenario: replication factor = 3, `min.insync.replicas` = 2.
- Success: 2 or 3 brokers alive.
- Failure: only 1 broker alive. The producer receives `NotEnoughReplicasException`.
With `min.insync.replicas=2` on a cluster where only 2 replicas remain, losing one more broker stops your producer entirely. This setting favors Consistency over Availability (CP in CAP-theorem terms).
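The producer-side half of this durability contract is `acks=all`. A minimal sketch of the matching producer properties (the keys are standard Kafka producer config names; `min.insync.replicas` itself is set on the broker or topic, not here):

```java
import java.util.Properties;

// Producer-side settings that pair with min.insync.replicas=2 on the broker.
Properties props = new Properties();
props.put("acks", "all");                 // wait for all in-sync replicas to acknowledge
props.put("retries", Integer.toString(Integer.MAX_VALUE)); // ride out transient NotEnoughReplicas errors
props.put("enable.idempotence", "true");  // retries won't create duplicate records
```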
2. Throughput vs Latency
The two main knobs are batch.size and linger.ms.
The “Bus Station” Analogy
Think of the Kafka Producer as a bus station.
- `batch.size` is the capacity of the bus (e.g., 64 seats).
- `linger.ms` is the maximum time the bus will wait at the station for passengers before leaving, even if it's not full.
If the bus fills up (batch.size reached) before the time is up (linger.ms), it leaves immediately. If the time is up (linger.ms reached), it leaves regardless of how many passengers are on board.
| Goal | linger.ms | batch.size | Result |
|---|---|---|---|
| Low Latency | 0 ms | 16 KB (default) | Send immediately. High network overhead. |
| High Throughput | 20-100 ms | 64-128 KB | Wait to fill batches. Efficient network usage. |
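The two rows of the table translate directly into producer properties. A sketch of both profiles (the values follow the table; treat them as starting points, not universal answers):

```java
import java.util.Properties;

// Low-latency profile: dispatch each record as soon as possible.
Properties lowLatency = new Properties();
lowLatency.put("linger.ms", "0");
lowLatency.put("batch.size", Integer.toString(16 * 1024)); // 16 KB (the default)

// High-throughput profile: wait briefly so batches fill up.
Properties highThroughput = new Properties();
highThroughput.put("linger.ms", "50");                         // mid-range of 20-100 ms
highThroughput.put("batch.size", Integer.toString(64 * 1024)); // 64 KB
```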
Compression (compression.type)
Compressing batches reduces network bandwidth but increases CPU usage on the Producer and Broker.
- `snappy`: low CPU, good compression (Google).
- `lz4`: extremely fast compression/decompression, low CPU.
- `zstd`: high compression (like gzip) with low CPU (like lz4) (Facebook).
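None of these codecs ship with the JDK, but `java.util.zip`'s `Deflater` (the DEFLATE algorithm behind gzip) illustrates the same trade-off on a stand-in basis: a batch of repetitive log lines shrinks dramatically once compressed, which is exactly why compressed batches save network bandwidth.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Simulate a producer batch of repetitive log lines.
byte[] batch = "2024-01-01 INFO request served in 12ms\n".repeat(1000)
        .getBytes(StandardCharsets.UTF_8);

Deflater deflater = new Deflater(Deflater.BEST_SPEED); // cheap-on-CPU setting
deflater.setInput(batch);
deflater.finish();
byte[] out = new byte[batch.length];
int compressedSize = deflater.deflate(out); // bytes actually written
deflater.end();
// compressedSize is a small fraction of batch.length for repetitive data
```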
3. Interactive: Tuning Simulator
(In the interactive version of this module, a slider shows how increasing `linger.ms` raises throughput at the cost of latency.)
4. Buffer Memory (buffer.memory)
The “Water Tank” Case Study
Imagine the producer’s buffer memory as a water tank. Your application is a hose pouring water into the tank (producing messages), and the network is a pipe draining water out of the tank (sending messages to brokers).
The producer holds messages in heap memory before sending.
- `buffer.memory`: total bytes of memory the producer can use to buffer records waiting to be sent (default: 32 MB).
- `max.block.ms`: if the buffer is full (the sender thread can't keep up), `producer.send()` will block for up to this long (default: 60 s).
Scenario: Your application produces 100 MB/sec. Your network can only send 50 MB/sec.
- The buffer fills up in ~0.64 seconds (32 MB ÷ 50 MB/s surplus).
- `send()` starts blocking.
- Application throughput drops to 50 MB/sec (backpressure).
- If the network stops completely, `send()` throws `TimeoutException` after 60 s.
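The scenario's numbers check out with a little arithmetic (the rates are the example's assumptions, not measurements):

```java
// Water-tank arithmetic for the scenario above.
double produceRate = 100.0; // MB/s flowing into the buffer
double drainRate   = 50.0;  // MB/s flowing out to the brokers
double bufferMb    = 32.0;  // default buffer.memory

double surplusRate      = produceRate - drainRate; // 50 MB/s accumulates
double secondsUntilFull = bufferMb / surplusRate;  // 32 / 50 = 0.64 s
```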
5. Configuration Code
Java

```java
import java.util.Properties;

Properties props = new Properties();

// 0. Required basics (broker address and serializers)
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// 1. Throughput tuning
props.put("linger.ms", "20");
props.put("batch.size", Integer.toString(32 * 1024)); // 32 KB
props.put("compression.type", "zstd");

// 2. Buffer memory (increase for high throughput)
props.put("buffer.memory", Integer.toString(64 * 1024 * 1024)); // 64 MB

// 3. Timeout (don't block forever)
props.put("max.block.ms", "3000"); // 3 seconds
```
Go

```go
// Using the github.com/segmentio/kafka-go client.
w := &kafka.Writer{
	Addr:  kafka.TCP("localhost:9092"),
	Topic: "analytics",

	// Throughput
	BatchTimeout: 20 * time.Millisecond, // analogous to linger.ms
	BatchSize:    1000,                  // max records per batch
	Compression:  kafka.Zstd,

	// Buffer limits (approximated in kafka-go via batch bytes)
	BatchBytes: 1024 * 1024, // 1 MB batch limit

	// With Async: false (the default), WriteMessages blocks until the
	// batch is delivered or fails; set Async: true for fire-and-forget,
	// in which case errors surface via the Completion callback instead.
	Async: false,
}
```