Apache Kafka has default message size limits that can be configured to handle larger payloads, but there are important considerations and best practices to follow.

Default message size limits

By default, Kafka has the following message size limits:
  • Producer: 1MB (max.request.size)
  • Broker: 1MB (message.max.bytes)
  • Topic: Inherits from broker setting (max.message.bytes)
  • Consumer: 1MB (max.partition.fetch.bytes)

Configuring Kafka for large messages

To send messages larger than 1MB, you need to configure multiple components:

Producer configuration

# Maximum size of a single request (10MB); must not exceed the broker's message.max.bytes
max.request.size=10485760
# Total memory for buffering records awaiting send (64MB); increase if needed
buffer.memory=67108864
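Before relying on the broker to reject oversized requests, the producer side can pre-check payload sizes. The sketch below is a minimal illustration in Python; fits_in_request and the overhead constant are hypothetical helpers for this article, not part of any Kafka client API.

```python
# Hypothetical pre-flight check: verify a serialized payload fits under
# max.request.size before handing it to a producer. RECORD_OVERHEAD is a
# rough illustrative allowance, not an exact Kafka protocol figure.
MAX_REQUEST_SIZE = 10 * 1024 * 1024  # mirrors max.request.size=10485760
RECORD_OVERHEAD = 1024               # rough allowance for headers/metadata

def fits_in_request(payload: bytes, max_request_size: int = MAX_REQUEST_SIZE) -> bool:
    """Return True if payload (plus rough overhead) fits in one produce request."""
    return len(payload) + RECORD_OVERHEAD <= max_request_size

print(fits_in_request(b"x" * 1024))                # True: well under 10MB
print(fits_in_request(b"x" * (11 * 1024 * 1024)))  # False: 11MB exceeds the limit
```

Rejecting (or rerouting) oversized payloads in application code gives a clearer error than a broker-side RecordTooLargeException.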

Broker configuration

# Maximum record batch size the broker will accept (10MB, measured after compression)
message.max.bytes=10485760
# Maximum fetch size for replication; must be >= message.max.bytes
replica.fetch.max.bytes=10485760
# TCP socket buffers (OS-level network tuning; these are not message size limits)
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
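When these broker settings are managed across several environments, a small script can extract the keys that must stay aligned from a server.properties-style file. A minimal Python sketch, with simplified file-format handling; large_message_settings is a hypothetical helper, not a Kafka tool.

```python
# Sketch: pull the large-message-related keys out of properties-style text
# so they can be compared across brokers. Parsing is simplified for illustration.
LARGE_MESSAGE_KEYS = {"message.max.bytes", "replica.fetch.max.bytes"}

def large_message_settings(properties_text: str) -> dict:
    """Extract large-message-related keys from properties-style text."""
    settings = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        key, _, value = line.partition("=")
        if key.strip() in LARGE_MESSAGE_KEYS:
            settings[key.strip()] = int(value.strip())
    return settings

example_conf = """
# broker settings
message.max.bytes=10485760
replica.fetch.max.bytes=10485760
socket.send.buffer.bytes=1048576
"""
print(large_message_settings(example_conf))
# {'message.max.bytes': 10485760, 'replica.fetch.max.bytes': 10485760}
```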

Topic configuration

kafka-configs --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name large-topic \
  --add-config max.message.bytes=10485760

Consumer configuration

# Maximum data returned per partition per fetch (10MB)
max.partition.fetch.bytes=10485760
# Maximum data returned per fetch request across partitions (50MB, the default)
fetch.max.bytes=52428800

Note that since Kafka 0.10.1 these are soft limits: if the first record batch in the first non-empty partition is larger than the configured value, it is still returned so the consumer can make progress.

Performance implications

Sending large messages in Kafka has several performance implications:

Memory usage

  • Larger messages consume more memory on brokers, producers, and consumers
  • Can lead to increased garbage collection pressure
  • May require tuning JVM heap sizes

Network bandwidth

  • Large messages consume more network bandwidth
  • Can lead to network congestion and timeouts
  • May require adjusting network buffer sizes

Disk I/O

  • Larger messages increase the total volume of disk I/O
  • Can impact log compaction performance
  • May require faster storage systems

Throughput impact

  • Large messages generally reduce overall throughput
  • Kafka is optimized for high-throughput, small messages
  • Consider message batching strategies

Alternative approaches

Instead of sending large messages directly, consider these alternatives:

1. External storage pattern

Store large payloads in external systems and send only references:
{
  "id": "message-123",
  "timestamp": "2023-01-01T00:00:00Z",
  "data_location": "s3://bucket/path/to/large-file.json",
  "metadata": {
    "size": 50000000,
    "checksum": "abc123"
  }
}
Benefits:
  • Keeps Kafka messages small and fast
  • Allows for separate scaling of storage and messaging
  • Enables efficient caching strategies
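The external storage (or "claim check") pattern above can be sketched end to end. The example below is illustrative only: an in-memory dict stands in for external storage such as S3, and the helper names and memstore:// scheme are invented for this sketch.

```python
import hashlib
import json
import uuid

# In-memory dict standing in for external storage (e.g. S3) in this sketch.
external_store: dict[str, bytes] = {}

def store_and_reference(payload: bytes) -> str:
    """Upload the payload externally and return a small, Kafka-sized message."""
    key = f"blob/{uuid.uuid4()}"
    external_store[key] = payload
    reference = {
        "id": f"message-{uuid.uuid4().hex[:8]}",
        "data_location": f"memstore://{key}",  # would be s3://... in practice
        "metadata": {
            "size": len(payload),
            "checksum": hashlib.sha256(payload).hexdigest(),
        },
    }
    return json.dumps(reference)

def resolve(reference_json: str) -> bytes:
    """Consumer side: follow the reference and verify integrity."""
    ref = json.loads(reference_json)
    key = ref["data_location"].removeprefix("memstore://")
    payload = external_store[key]
    assert hashlib.sha256(payload).hexdigest() == ref["metadata"]["checksum"]
    return payload

msg = store_and_reference(b"a" * 50_000_000)  # 50MB payload never enters Kafka
assert len(msg) < 1024                        # the Kafka message stays tiny
assert resolve(msg) == b"a" * 50_000_000
```

The checksum lets the consumer detect a stale or corrupted blob before processing it, which matters because the external store and Kafka are updated independently.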

2. Message splitting

Break large messages into smaller chunks:
{
  "message_id": "msg-123",
  "chunk_id": "chunk-1",
  "total_chunks": 5,
  "chunk_data": "...",
  "sequence": 1
}
Benefits:
  • Works within default Kafka limits
  • Allows for parallel processing
  • Provides better error recovery
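Splitting and reassembly can be sketched with the chunk schema shown above. This is a minimal illustration, assuming chunks for one message can be collected before reassembly; the helper names and the latin-1 transport encoding are choices made for this sketch.

```python
# Sketch of splitting a large payload into Kafka-sized chunks and reassembling
# them on the consumer side; field names mirror the example chunk message above.
CHUNK_SIZE = 900_000  # stay under the default 1MB limit with headroom

def split_message(message_id: str, payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield chunk records small enough for default broker limits."""
    total = (len(payload) + chunk_size - 1) // chunk_size
    for seq in range(total):
        chunk = payload[seq * chunk_size:(seq + 1) * chunk_size]
        yield {
            "message_id": message_id,
            "chunk_id": f"chunk-{seq + 1}",
            "total_chunks": total,
            "sequence": seq + 1,
            "chunk_data": chunk.decode("latin-1"),  # illustrative transport encoding
        }

def reassemble(chunks) -> bytes:
    """Rebuild the original payload; chunks may arrive out of order."""
    ordered = sorted(chunks, key=lambda c: c["sequence"])
    assert ordered and ordered[0]["total_chunks"] == len(ordered), "missing chunks"
    return b"".join(c["chunk_data"].encode("latin-1") for c in ordered)

payload = b"x" * 4_200_000                    # ~4MB, over the default 1MB limit
chunks = list(split_message("msg-123", payload))
assert len(chunks) == 5
assert reassemble(reversed(chunks)) == payload  # order-independent reassembly
```

In practice all chunks of one message should share a partition key (e.g. message_id) so they land on the same partition and preserve relative order.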

3. Compression

Enable compression to reduce message sizes. Broker and topic size limits apply to the record batch after compression, so compression can also let larger logical payloads fit within existing limits:
# Producer compression (options: gzip, snappy, lz4, zstd)
compression.type=snappy
Benefits:
  • Reduces network bandwidth usage
  • Decreases storage requirements
  • Often improves throughput
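A quick standard-library experiment shows why compression helps so much for typical Kafka payloads: repetitive data such as JSON events or logs shrinks dramatically. Kafka applies its codec per record batch; here gzip is used directly on one payload purely for illustration.

```python
import gzip

# Highly repetitive payload, similar in spirit to a batch of JSON events.
payload = b'{"event": "page_view", "user": "u-42"}\n' * 10_000

compressed = gzip.compress(payload)
ratio = len(payload) / len(compressed)

print(f"original: {len(payload)} bytes, compressed: {len(compressed)} bytes")
assert len(compressed) < len(payload) // 10  # repetitive data compresses heavily
```

Real-world ratios depend on the payload; already-compressed data (images, video, encrypted blobs) will see little or no benefit.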

Best practices

Recommendations for large messages
  1. Avoid large messages when possible - Kafka is optimized for small, high-throughput messages
  2. Use external storage - Store large payloads externally and reference them in Kafka messages
  3. Enable compression - Always enable compression for large messages
  4. Monitor memory usage - Ensure adequate heap sizing for all components
  5. Test thoroughly - Verify performance impact in your specific environment

Configuration checklist

When configuring for large messages, ensure all these settings are aligned:
  • ✅ Producer max.request.size
  • ✅ Broker message.max.bytes
  • ✅ Topic max.message.bytes
  • ✅ Consumer max.partition.fetch.bytes
  • ✅ Consumer fetch.max.bytes
  • ✅ Broker replica.fetch.max.bytes
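The alignment rules in the checklist can be encoded as a small validation script. This is a simplified sketch: check_alignment is a hypothetical helper, and the constraint set below captures only the basic relationships between the settings listed above, not every interaction in Kafka.

```python
# Sketch: validate that the size settings from the checklist are mutually
# consistent. Simplified; real Kafka behavior has more nuances (e.g. soft
# limits on consumer fetch sizes in newer clients).
def check_alignment(cfg: dict) -> list[str]:
    """Return human-readable warnings for misaligned size settings."""
    problems = []
    if cfg["max.request.size"] > cfg["message.max.bytes"]:
        problems.append("producer max.request.size exceeds broker message.max.bytes")
    if cfg["replica.fetch.max.bytes"] < cfg["message.max.bytes"]:
        problems.append("replica.fetch.max.bytes too small; replication may fail")
    if cfg["max.partition.fetch.bytes"] < cfg["message.max.bytes"]:
        problems.append("max.partition.fetch.bytes smaller than message.max.bytes")
    return problems

aligned = {
    "max.request.size": 10_485_760,
    "message.max.bytes": 10_485_760,
    "replica.fetch.max.bytes": 10_485_760,
    "max.partition.fetch.bytes": 10_485_760,
}
assert check_alignment(aligned) == []
assert check_alignment({**aligned, "replica.fetch.max.bytes": 1_048_576})
```

Running a check like this in CI catches the common failure mode where one component is raised to 10MB while another silently stays at the 1MB default.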

Monitoring considerations

Monitor these metrics when working with large messages:
  • Memory usage on brokers, producers, and consumers
  • Network bandwidth utilization
  • Disk I/O patterns and latency
  • Garbage collection frequency and duration
  • Message throughput and latency
Performance impact

Large messages can significantly impact Kafka performance. Always test in a staging environment that mirrors your production setup before deploying large message configurations.