Kafka producer compression reduces the size of messages before sending them to brokers, improving throughput and reducing network bandwidth at the cost of CPU overhead.
Why use compression?
Compression provides several benefits in Kafka deployments:
- Reduced network usage: Smaller message sizes mean less data transmitted
- Lower storage costs: Compressed messages take less disk space on brokers
- Improved throughput: More messages can fit in each batch
- Better performance: Particularly beneficial for text-heavy payloads like JSON or XML
Compression is most effective with:
- Text-based formats (JSON, XML, CSV)
- Repetitive data patterns
- Large message payloads (> 1KB)
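The contrast between text and incompressible data can be sketched with `java.util.zip`'s GZIP stream (the same DEFLATE family Kafka's gzip codec uses); the class name and sample payload below are illustrative, not part of the Kafka API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class CompressibilityDemo {

    // Gzip a byte array and return the compressed size in bytes.
    static int gzipSize(byte[] input) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(input);
            }
            return out.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Repetitive JSON-like text: field names recur, so it compresses very well.
        StringBuilder json = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            json.append("{\"user_id\":").append(i).append(",\"event\":\"page_view\"}");
        }
        byte[] text = json.toString().getBytes(StandardCharsets.UTF_8);

        // Random bytes: no patterns to exploit, so gzip cannot shrink them.
        byte[] random = new byte[text.length];
        new Random(42).nextBytes(random);

        System.out.printf("text:   %d -> %d bytes%n", text.length, gzipSize(text));
        System.out.printf("random: %d -> %d bytes%n", random.length, gzipSize(random));
    }
}
```

Running this shows the JSON shrinking dramatically while the random payload stays at (or slightly above) its original size — which is why already-compressed or binary payloads gain little from producer compression.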
Compression algorithms
Kafka supports four compression algorithms, each with different performance characteristics:
GZIP
General-purpose compression with high compression ratios but higher CPU usage.
Characteristics:
- Highest compression ratio (smallest message size)
- Slowest compression/decompression speed
- Good for scenarios where network bandwidth is limited
- CPU-intensive
Configuration:

```properties
compression.type=gzip
```
LZ4
Fast compression with moderate compression ratios, good balance of speed and size reduction.
Characteristics:
- Very fast compression/decompression
- Moderate compression ratio
- Low CPU overhead
- Good general-purpose choice
Configuration:

```properties
compression.type=lz4
```
Snappy
Fast compression optimized for speed with reasonable compression ratios.
Characteristics:
- Fast compression/decompression (slightly slower than LZ4)
- Moderate compression ratio
- Good CPU efficiency
- Popular choice for high-throughput scenarios
Configuration:

```properties
compression.type=snappy
```
ZSTD (Zstandard)
Modern compression algorithm offering excellent compression ratios with good performance.
Characteristics:
- Excellent compression ratio (better than LZ4/Snappy, approaches GZIP)
- Good compression/decompression speed
- Relatively new (available since Kafka 2.1)
- Tunable compression levels
Configuration:

```properties
compression.type=zstd    # requires Kafka 2.1 or later
```
Compression ratio comparison
Based on typical JSON payloads:
| Algorithm | Size reduction (typical) | CPU usage | Speed |
|---|---|---|---|
| GZIP | ~75% | High | Slow |
| LZ4 | ~65% | Low | Fast |
| Snappy | ~68% | Low | Fast |
| ZSTD | ~72% | Medium | Medium |
Throughput impact
Compression affects producer throughput in different ways:
Positive impacts:
- Faster network transmission due to smaller message sizes
- Higher effective batch sizes (more messages per batch)
- Reduced broker disk I/O
Negative impacts:
- CPU overhead for compression on producer side
- Additional latency for compression process
Configuration and tuning
Basic compression configuration
```properties
# Enable compression for all messages
compression.type=lz4

# Compression happens at the batch level
batch.size=16384
linger.ms=5
```
Compression at different levels
Compression can be configured at multiple levels:
Producer level (affects all topics):
```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("compression.type", "lz4");
Producer<String, String> producer = new KafkaProducer<>(props);
```
Topic level (broker-side configuration):
```bash
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config compression.type=snappy
```
Interaction with batching
Compression works at the batch level, making batching configuration crucial:
```properties
# Larger batches = better compression ratios
batch.size=32768       # 32KB batches
linger.ms=10           # Wait up to 10ms to fill batch
compression.type=lz4   # Fast compression for batches
```
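Why batch size matters can be illustrated with plain `java.util.zip` GZIP as a stand-in for Kafka's codec: compressing fifty records as one stream is far smaller than compressing each alone, because compression headers are paid once and back-references can span records. The helper names and sample record are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class BatchCompressionDemo {

    // Gzip a byte array and return the compressed size in bytes.
    static int gzipSize(byte[] input) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(input);
            }
            return out.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each record compressed on its own: per-message headers, no shared history.
    static int individualSize(String record, int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += gzipSize(record.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // All records compressed as one batch: back-references span records.
    static int batchedSize(String record, int n) {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < n; i++) batch.append(record);
        return gzipSize(batch.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String record = "{\"sensor\":\"temp-01\",\"value\":21.5,\"unit\":\"C\"}";
        System.out.printf("50 records individually: %d bytes%n", individualSize(record, 50));
        System.out.printf("50 records as one batch: %d bytes%n", batchedSize(record, 50));
    }
}
```

The batched size is a fraction of the individual total for repetitive records, which is the same effect larger `batch.size` and longer `linger.ms` produce in a real producer.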
CPU vs network trade-offs
When to prioritize CPU savings
Choose faster algorithms (LZ4, Snappy) when:
- CPU resources are limited
- High message throughput is required
- Network bandwidth is abundant
- Low latency is critical
When to prioritize network savings
Choose higher compression (GZIP, ZSTD) when:
- Network bandwidth is expensive or limited
- Messages are stored for long periods
- CPU resources are abundant
- Storage costs are a concern
Choosing the right algorithm
Decision matrix
For high-throughput, low-latency scenarios:

```properties
compression.type=lz4     # Fast compression, low CPU overhead
```

For network-constrained environments:

```properties
compression.type=gzip    # Maximum compression, reduce network usage
```

For balanced performance:

```properties
compression.type=snappy  # Good compression ratio with reasonable speed
```

For modern deployments (Kafka 2.1+):

```properties
compression.type=zstd    # Excellent compression with good performance
```
Algorithm selection flowchart
```text
Is network bandwidth limited?
├── Yes → Is CPU abundant?
│   ├── Yes → Use GZIP
│   └── No  → Use ZSTD
└── No → Is throughput critical?
    ├── Yes → Use LZ4
    └── No  → Use Snappy (balanced choice)
```
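The flowchart can also be expressed as a small helper; the class and method names are illustrative, and the result should be treated as a starting point for benchmarking, not a final answer:

```java
public class CompressionChooser {

    // Mirrors the decision flowchart above, returning a compression.type value.
    static String choose(boolean networkLimited, boolean cpuAbundant, boolean throughputCritical) {
        if (networkLimited) {
            // Bandwidth is the bottleneck: spend CPU to shrink messages.
            return cpuAbundant ? "gzip" : "zstd";
        }
        // Bandwidth is fine: optimize for speed or balance.
        return throughputCritical ? "lz4" : "snappy";
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, false));  // network-limited, CPU-constrained
    }
}
```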
Monitoring compression effectiveness
Key metrics to track
Compression ratio:

```text
compression_ratio = uncompressed_size / compressed_size
```

Network savings:

```text
network_savings = (1 - compressed_size / uncompressed_size) × 100%
```
CPU impact:
Monitor CPU utilization on producer machines before and after enabling compression.
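The two formulas above translate directly into code; the class name and the example sizes (a 10 KB batch compressed to 3 KB) are illustrative:

```java
public class CompressionMetrics {

    // compression_ratio = uncompressed_size / compressed_size
    static double compressionRatio(long uncompressedBytes, long compressedBytes) {
        return (double) uncompressedBytes / compressedBytes;
    }

    // network_savings = (1 - compressed_size / uncompressed_size) × 100%
    static double networkSavingsPercent(long uncompressedBytes, long compressedBytes) {
        return (1.0 - (double) compressedBytes / uncompressedBytes) * 100.0;
    }

    public static void main(String[] args) {
        long uncompressed = 10_240;  // 10 KB batch before compression
        long compressed = 3_072;     // 3 KB after compression
        System.out.printf("ratio: %.2fx%n", compressionRatio(uncompressed, compressed));
        System.out.printf("network savings: %.1f%%%n", networkSavingsPercent(uncompressed, compressed));
    }
}
```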
JMX metrics
```text
kafka.producer:type=producer-metrics,client-id=<client-id>
```

- compression-rate-avg: Average ratio of compressed to uncompressed batch size (lower means better compression)
- record-size-avg: Average record size before compression
- batch-size-avg: Average batch size sent per partition per request
Best practices
Production recommendations
- Start with LZ4: Good balance of speed and compression for most use cases
- Test with your data: Compression effectiveness varies by payload type
- Monitor CPU usage: Ensure compression doesn’t become a bottleneck
- Combine with batching: Larger batches compress better
- Consider message format: JSON compresses better than binary formats
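To "test with your data", a rough benchmark along these lines can help. It uses `java.util.zip.Deflater` levels as a local proxy for the speed-versus-ratio trade-off; the class name and sample payload are illustrative, and real results should come from benchmarking the actual Kafka codecs against production messages:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionLevelBench {

    // Compress a payload at the given deflate level and return the compressed size.
    static int deflateSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical JSON-like payload standing in for production messages.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("{\"order_id\":").append(i).append(",\"status\":\"shipped\"}");
        }
        byte[] payload = sb.toString().getBytes(StandardCharsets.UTF_8);

        for (int level : new int[] {1, 6, 9}) {
            long start = System.nanoTime();
            int size = deflateSize(payload, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level %d: %d -> %d bytes (%d us)%n",
                    level, payload.length, size, micros);
        }
    }
}
```

Higher levels trade CPU time for smaller output; how steep that trade is depends entirely on the payload, which is the point of measuring with your own data.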
Configuration examples
High-throughput producer:

```properties
compression.type=lz4
batch.size=65536         # Large batches for better compression
linger.ms=5              # Short linger time for low latency
buffer.memory=67108864   # 64MB buffer for batching
```
Network-optimized producer:

```properties
compression.type=gzip
batch.size=32768          # Reasonable batch size
linger.ms=10              # Allow time for batching
buffer.memory=134217728   # 128MB buffer for larger compressed batches
```
Balanced producer (recommended starting point):

```properties
compression.type=snappy
batch.size=16384         # Default batch size
linger.ms=5              # Low latency
buffer.memory=33554432   # 32MB buffer
```
Common pitfalls
Mistakes to avoid
- Over-compressing small messages: Compression overhead may exceed benefits
- Ignoring CPU monitoring: Compression can become a producer bottleneck
- Not testing with production data: Compression ratios vary significantly by content
- Using GZIP for high-throughput: May create CPU bottlenecks
- Forgetting about decompression: Consumers also pay CPU cost for decompression
Troubleshooting compression issues
Poor compression ratios:
- Check message format (binary data compresses poorly)
- Verify batch sizes are adequate
- Consider if data is already compressed
High CPU usage:
- Switch to faster algorithm (LZ4 instead of GZIP)
- Monitor producer thread CPU utilization
- Consider reducing compression level if using ZSTD
Increased latency:
- Reduce linger.ms to decrease batching delay
- Use faster compression algorithm
- Monitor end-to-end message latency
Decompression cost: Remember that consumers must decompress messages, which also uses CPU. Consider the total system CPU cost, not just producer-side overhead.

Testing recommendation: Always benchmark compression algorithms with your actual production data. Compression effectiveness varies significantly based on message format, size, and content patterns.