Beyond basic producer settings, advanced configurations provide fine-grained control over producer behavior, performance, and reliability for production deployments.
Buffer memory configuration
Producer buffer memory controls how much memory is available for batching messages before sending to brokers.
buffer.memory
Total memory allocated for producer buffering:
# Default: 33554432 (32MB)
buffer.memory=67108864 # 64MB total buffer
Impact of buffer size:
- Too small: Producer blocks frequently when buffer fills
- Too large: Excessive memory usage, potential GC pressure
- Optimal size: Balance between memory efficiency and producer throughput
max.block.ms
Maximum time send() blocks when the buffer is full or topic metadata is unavailable:
# Default: 60000 (1 minute)
max.block.ms=30000 # Block for 30 seconds maximum
Behavior:
- Producer blocks send() calls when buffer is exhausted
- After timeout, throws TimeoutException
- Prevents indefinite blocking in high-load scenarios
Memory calculation
Calculate required buffer memory:
Required buffer = batch.size × active_partitions × safety_factor
Safety factor = 1.5-2.0 (account for traffic spikes)
Example:
- 100 active partitions
- 32KB batch size
- 2x safety factor
= 32KB × 100 × 2 = 6,400KB (about 6.25MB at 1,024KB per MB) minimum
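The calculation above can be sketched in Java. This is only an illustration of the arithmetic; the class and method names are hypothetical and not part of the Kafka client.

```java
// Hypothetical helper illustrating the sizing rule; not part of the Kafka client.
final class BufferSizing {

    /** Estimates buffer.memory in bytes as batch.size x active partitions x safety factor. */
    static long requiredBufferBytes(int batchSizeBytes, int activePartitions, double safetyFactor) {
        if (batchSizeBytes <= 0 || activePartitions <= 0 || safetyFactor < 1.0) {
            throw new IllegalArgumentException(
                "batch size and partitions must be positive, safety factor must be >= 1.0");
        }
        // Round up so the estimate never undershoots the rule.
        return (long) Math.ceil((double) batchSizeBytes * activePartitions * safetyFactor);
    }

    public static void main(String[] args) {
        // 32KB batches, 100 active partitions, 2x safety factor.
        long bytes = requiredBufferBytes(32 * 1024, 100, 2.0);
        System.out.println("buffer.memory >= " + bytes + " bytes"); // prints buffer.memory >= 6553600 bytes
    }
}
```

The result is a lower bound; the 32MB default already covers this example with room to spare.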
Request timeout settings
Timeout configurations control how long producers wait for various operations.
request.timeout.ms
Timeout for individual requests:
# Default: 30000 (30 seconds)
request.timeout.ms=15000 # 15 second request timeout
Applies to:
- Message send requests
- Metadata requests
- Administrative operations
delivery.timeout.ms
Overall timeout for message delivery including retries:
# Default: 120000 (2 minutes)
delivery.timeout.ms=300000 # 5 minute total delivery timeout
Relationship:
delivery.timeout.ms >= linger.ms + request.timeout.ms
Retries and their backoff all happen inside this window; once it expires, the record fails even if retries remain.
metadata.max.age.ms
How long to cache topic metadata before forcing a refresh:
# Default: 300000 (5 minutes)
metadata.max.age.ms=60000 # Refresh metadata every minute
Considerations:
- Shorter intervals detect partition changes faster
- Longer intervals reduce metadata requests
- Balance between responsiveness and efficiency
Connection configurations
Producer connection settings affect network behavior and resource usage.
connections.max.idle.ms
Close idle connections after specified time:
# Default: 540000 (9 minutes)
connections.max.idle.ms=300000 # Close idle connections after 5 minutes
Benefits:
- Reduces connection overhead
- Prevents resource leaks
- Balances connection reuse with resource management
max.in.flight.requests.per.connection
Maximum unacknowledged requests per connection:
# Default: 5
max.in.flight.requests.per.connection=3 # More conservative
Impact on ordering:
- max.in.flight.requests.per.connection=1: Strict ordering, lower throughput
- max.in.flight.requests.per.connection>1: Higher throughput, potential reordering when a failed batch is retried
- With enable.idempotence=true: Can use up to 5 while maintaining ordering
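The reordering risk can be illustrated with a toy model. This is a deliberately simplified sketch, not the client's actual sender logic: it assumes batches go out in windows of up to maxInFlight, that one chosen batch fails on its first attempt, and that the retry is queued ahead of unsent batches. It models a producer without idempotence, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (hypothetical, not Kafka client code) of a producer without idempotence.
final class InFlightOrderingSketch {

    /** Returns the order in which batches end up written, given one batch that fails once. */
    static List<Integer> deliveryOrder(int batches, int maxInFlight, int failingBatch) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be at least 1");
        }
        Deque<Integer> pending = new ArrayDeque<>();
        for (int i = 0; i < batches; i++) {
            pending.add(i);
        }
        List<Integer> written = new ArrayList<>();
        boolean alreadyFailed = false;
        while (!pending.isEmpty()) {
            // Send up to maxInFlight batches without waiting for acknowledgements.
            List<Integer> window = new ArrayList<>();
            while (window.size() < maxInFlight && !pending.isEmpty()) {
                window.add(pending.poll());
            }
            Integer retry = null;
            for (int batch : window) {
                if (batch == failingBatch && !alreadyFailed) {
                    alreadyFailed = true; // the first attempt fails
                    retry = batch;
                } else {
                    written.add(batch);   // other batches in the window still succeed
                }
            }
            if (retry != null) {
                pending.addFirst(retry);  // the retry goes out in the next window
            }
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(deliveryOrder(4, 1, 0)); // prints [0, 1, 2, 3]
        System.out.println(deliveryOrder(4, 2, 0)); // prints [1, 0, 2, 3]
    }
}
```

With one request in flight, the retry completes before anything else is sent. With two, batch 1 lands before the retried batch 0.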
reconnect.backoff.ms
Wait time before reconnecting failed connections:
# Default: 50
reconnect.backoff.ms=100 # Wait 100ms before reconnect attempts
reconnect.backoff.max.ms
Maximum reconnect backoff time:
# Default: 1000 (1 second)
reconnect.backoff.max.ms=5000 # Cap backoff at 5 seconds
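The two reconnect settings work together as an exponential backoff: the wait starts at reconnect.backoff.ms and grows with each consecutive failure until it reaches reconnect.backoff.max.ms. The sketch below shows the resulting schedule under the assumption that the delay doubles each time. The real client also adds random jitter, which is omitted here, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a capped exponential backoff schedule (jitter omitted).
final class ReconnectBackoffSketch {

    /** Returns the wait in milliseconds before each of the first `attempts` reconnects. */
    static List<Long> schedule(long initialMs, long maxMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        long delay = initialMs;
        for (int i = 0; i < attempts; i++) {
            delays.add(Math.min(delay, maxMs));
            // Double the delay for the next failure, never exceeding the cap.
            delay = Math.min(delay * 2, maxMs);
        }
        return delays;
    }

    public static void main(String[] args) {
        // reconnect.backoff.ms=100, reconnect.backoff.max.ms=5000
        System.out.println(schedule(100, 5000, 8));
        // prints [100, 200, 400, 800, 1600, 3200, 5000, 5000]
    }
}
```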
Metadata management
Producer metadata management configurations.
bootstrap.servers
Initial broker list for cluster discovery:
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
Best practices:
- Include multiple brokers for redundancy
- Use broker DNS names rather than IP addresses
- Include brokers from different racks if possible
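A small pre-flight check can catch a malformed or single-broker list before the producer starts. The following is a minimal sketch using only the standard library; the class and method names are hypothetical, and the Kafka client performs its own parsing when the producer is created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for a bootstrap.servers value; not Kafka client code.
final class BootstrapServersCheck {

    /** Splits a comma-separated host:port list and rejects malformed entries. */
    static List<String> parse(String bootstrapServers) {
        List<String> entries = new ArrayList<>();
        for (String raw : bootstrapServers.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate trailing commas
            }
            int colon = entry.lastIndexOf(':');
            if (colon <= 0 || colon == entry.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + entry);
            }
            int port = Integer.parseInt(entry.substring(colon + 1));
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("Port out of range in: " + entry);
            }
            entries.add(entry);
        }
        return entries;
    }

    /** True when at least two brokers are listed, so one can be down at startup. */
    static boolean hasRedundancy(String bootstrapServers) {
        return parse(bootstrapServers).size() >= 2;
    }

    public static void main(String[] args) {
        String servers = "broker1:9092,broker2:9092,broker3:9092";
        System.out.println(parse(servers).size() + " brokers, redundant=" + hasRedundancy(servers));
        // prints 3 brokers, redundant=true
    }
}
```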
metadata.max.idle.ms
How long to keep cached metadata for a topic that is no longer being produced to:
# Default: 300000 (5 minutes)
metadata.max.idle.ms=180000 # 3 minutes
retry.backoff.ms
Wait time before retrying a failed request, including produce and metadata requests:
# Default: 100
retry.backoff.ms=200 # Wait 200ms between retries
Message size and batching
Advanced message size configurations.
max.request.size
Maximum size of request (batch + headers):
# Default: 1048576 (1MB)
max.request.size=5242880 # 5MB maximum request
Considerations:
- Must be less than or equal to broker’s message.max.bytes setting
- Affects largest possible batch size
- Consider network MTU and broker memory
send.buffer.bytes
TCP send buffer size:
# Default: 131072 (128KB)
send.buffer.bytes=262144 # 256KB TCP send buffer
receive.buffer.bytes
TCP receive buffer size:
# Default: 32768 (32KB) for the producer
receive.buffer.bytes=131072 # 128KB TCP receive buffer
Advanced performance-related configurations.
max.poll.interval.ms
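A consumer setting, not a producer setting. It matters only when the application produces from inside a consumer poll loop, where slow or blocked send() calls can delay the next poll():
# Set on the consumer, and leave room for producer blocking time (see max.block.ms)
max.poll.interval.ms=300000 # 5 minutes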
socket.connection.setup.timeout.ms
Timeout for socket connections:
# Default: 10000 (10 seconds)
socket.connection.setup.timeout.ms=30000 # 30 second connection timeout
socket.connection.setup.timeout.max.ms
Maximum socket connection timeout with backoff:
# Default: 30000 (30 seconds) in recent clients; earlier releases used 127000
socket.connection.setup.timeout.max.ms=60000 # 1 minute maximum
Monitoring and metrics
Producer monitoring configurations.
metrics.recording.level
Granularity of metrics collection:
# Options: INFO (default), DEBUG, TRACE
metrics.recording.level=INFO
Levels:
- INFO: Basic metrics, low overhead
- DEBUG: Detailed metrics, moderate overhead
- TRACE: All metrics, high overhead (development only)
metrics.sample.window.ms
Metrics sampling window:
# Default: 30000 (30 seconds)
metrics.sample.window.ms=60000 # 1 minute sampling windows
metrics.num.samples
Number of samples to maintain:
# Default: 2
metrics.num.samples=3 # Keep 3 sample windows
metric.reporters
Custom metrics reporters:
metric.reporters=com.example.CustomMetricsReporter
Client identification
Producer identification and naming.
client.id
Unique identifier for this producer:
client.id=order-service-producer-01
Benefits:
- Easier troubleshooting with meaningful names
- Better monitoring and alerting
- Correlation with application logs
client.dns.lookup
DNS resolution strategy:
# Options: use_all_dns_ips (default since Kafka 2.6), resolve_canonical_bootstrap_servers_only
# The legacy value "default" is deprecated
client.dns.lookup=use_all_dns_ips
Advanced reliability settings
Additional reliability configurations.
enable.idempotence
Enable idempotent producer (prevents duplicates):
# Default: true (Kafka 3.0+)
enable.idempotence=true
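Requires compatible values for these settings (the Kafka 3.0+ defaults already satisfy them):
retries greater than 0 (default Integer.MAX_VALUE)
max.in.flight.requests.per.connection of 5 or less (default 5)
acks=all
If idempotence is enabled explicitly and any of these conflict, the producer throws a ConfigException. If it is left at its default and a conflicting value is set, idempotence is disabled instead.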
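The constraints that idempotence places on retries, max.in.flight.requests.per.connection, and acks can be checked before the producer is constructed. The sketch below is a hypothetical pre-flight check using only java.util.Properties; it assumes values were stored as strings with setProperty and falls back to the Kafka 3.0+ defaults when a key is absent. The producer runs its own validation regardless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; the Kafka producer validates these itself at construction.
final class IdempotenceCheck {

    /** Lists settings that conflict with enable.idempotence=true. */
    static List<String> conflicts(Properties props) {
        List<String> problems = new ArrayList<>();
        // getProperty only sees String values, so use setProperty when building props.
        String acks = props.getProperty("acks", "all");
        long retries = Long.parseLong(
            props.getProperty("retries", String.valueOf(Integer.MAX_VALUE)));
        int maxInFlight = Integer.parseInt(
            props.getProperty("max.in.flight.requests.per.connection", "5"));

        if (!acks.equals("all") && !acks.equals("-1")) {
            problems.add("acks must be all (or -1), found " + acks);
        }
        if (retries <= 0) {
            problems.add("retries must be greater than 0, found " + retries);
        }
        if (maxInFlight > 5) {
            problems.add("max.in.flight.requests.per.connection must be <= 5, found " + maxInFlight);
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "1");
        props.setProperty("max.in.flight.requests.per.connection", "6");
        conflicts(props).forEach(System.out::println); // prints two conflict messages
    }
}
```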
transaction.timeout.ms
Timeout for transactions (when using transactions):
# Default: 60000 (1 minute)
transaction.timeout.ms=300000 # 5 minute transaction timeout
transactional.id
Unique identifier for transactional producer:
transactional.id=order-service-tx-producer
Configuration templates
High-throughput configuration
# Optimize for maximum throughput
batch.size=65536
linger.ms=20
buffer.memory=134217728
compression.type=lz4
max.in.flight.requests.per.connection=5
request.timeout.ms=30000
delivery.timeout.ms=120000
enable.idempotence=true
Low-latency configuration
# Optimize for minimum latency
batch.size=16384
linger.ms=0
buffer.memory=33554432
compression.type=none
max.in.flight.requests.per.connection=1
request.timeout.ms=10000
delivery.timeout.ms=30000
High-reliability configuration
# Optimize for maximum reliability
acks=all
retries=2147483647
enable.idempotence=true
max.in.flight.requests.per.connection=5
delivery.timeout.ms=600000
request.timeout.ms=30000
buffer.memory=67108864
batch.size=32768
Memory-constrained configuration
# Minimize memory usage
buffer.memory=16777216 # 16MB
batch.size=8192 # 8KB
max.request.size=1048576 # 1MB
connections.max.idle.ms=60000 # 1 minute
Network-optimized configuration
# Optimize for slow/unreliable networks
request.timeout.ms=45000
delivery.timeout.ms=300000
retry.backoff.ms=200
reconnect.backoff.ms=200
reconnect.backoff.max.ms=10000
send.buffer.bytes=262144
receive.buffer.bytes=131072
Environment-specific tuning
Development environment
# Fast feedback, loose reliability
batch.size=16384
linger.ms=0
request.timeout.ms=10000
delivery.timeout.ms=30000
retries=3
buffer.memory=33554432
Testing environment
# Balance speed with some reliability
batch.size=32768
linger.ms=5
request.timeout.ms=15000
delivery.timeout.ms=60000
retries=10
enable.idempotence=true
Production environment
# Full reliability and monitoring
acks=all
retries=2147483647
enable.idempotence=true
delivery.timeout.ms=300000
request.timeout.ms=30000
batch.size=32768
linger.ms=10
buffer.memory=67108864
client.id=myapp-producer-${instance.id}
metrics.recording.level=INFO
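Note that java.util.Properties does not expand the ${instance.id} placeholder used in client.id above, so the application has to substitute it before creating the producer. A minimal sketch of one way to do that follows. The class name and the INSTANCE_ID environment variable are assumptions for illustration; frameworks with their own placeholder resolution may make this unnecessary.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper that fills in the ${instance.id} placeholder in client.id.
final class ClientIdPlaceholder {

    /** Loads properties text and replaces ${instance.id} in client.id, if present. */
    static Properties load(String propertiesText, String instanceId) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String clientId = props.getProperty("client.id");
        if (clientId != null) {
            props.setProperty("client.id", clientId.replace("${instance.id}", instanceId));
        }
        return props;
    }

    public static void main(String[] args) {
        String text = "client.id=myapp-producer-${instance.id}\nacks=all\n";
        // INSTANCE_ID is an assumed variable name; fall back to "local" when it is unset.
        String instanceId = System.getenv().getOrDefault("INSTANCE_ID", "local");
        System.out.println(load(text, instanceId).getProperty("client.id"));
    }
}
```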
Monitoring key configurations
Essential metrics to track
Buffer utilization:
buffer-available-bytes: Available buffer space
buffer-exhausted-rate: Rate of buffer exhaustion events
Request performance:
request-latency-avg: Average request latency
request-rate: Requests per second
response-rate: Responses per second
Connection health:
connection-count: Number of active connections
connection-creation-rate: Rate of new connections
connection-close-rate: Rate of closed connections
Error rates:
record-error-rate: Rate of failed records
record-retry-rate: Rate of retried records
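Buffer metrics are easier to alert on as a ratio than as raw byte counts. The sketch below derives utilization from buffer-available-bytes and the configured buffer size (also reported by the producer as buffer-total-bytes). It is a hypothetical helper; the 0.8 alert threshold is an assumption, not a recommendation from the Kafka project.

```java
// Hypothetical helper turning raw buffer metrics into a utilization ratio.
final class BufferUtilization {

    /** Returns the fraction of the producer buffer currently in use, from 0.0 to 1.0. */
    static double utilization(double availableBytes, double totalBytes) {
        if (totalBytes <= 0 || availableBytes < 0 || availableBytes > totalBytes) {
            throw new IllegalArgumentException("expected 0 <= available <= total and total > 0");
        }
        return 1.0 - (availableBytes / totalBytes);
    }

    /** True when utilization has reached the given threshold, for example 0.8. */
    static boolean shouldAlert(double availableBytes, double totalBytes, double threshold) {
        return utilization(availableBytes, totalBytes) >= threshold;
    }

    public static void main(String[] args) {
        // 16MB free out of a 64MB buffer.
        double used = utilization(16_777_216, 67_108_864);
        System.out.println("utilization=" + used + ", alert=" + shouldAlert(16_777_216, 67_108_864, 0.8));
        // prints utilization=0.75, alert=false
    }
}
```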
JMX monitoring configuration
# Enable JMX for monitoring
# Add to JVM arguments:
# -Dcom.sun.management.jmxremote
# -Dcom.sun.management.jmxremote.port=9999
# -Dcom.sun.management.jmxremote.authenticate=false
# -Dcom.sun.management.jmxremote.ssl=false
Common configuration mistakes
Pitfalls to avoid
- Insufficient buffer memory: Causes frequent blocking
- Too high linger.ms: Increases latency unnecessarily
- Mismatched timeouts: delivery.timeout.ms smaller than linger.ms + request.timeout.ms, or too short to leave room for retries
- Ignoring idempotence: Missing duplicate prevention in production
- No client.id: Makes troubleshooting difficult
- Wrong max.in.flight.requests.per.connection: Affects ordering guarantees
Configuration validation
import java.util.Properties;
// Validate configuration relationships
Properties props = new Properties();
props.put("delivery.timeout.ms", 120000);
props.put("request.timeout.ms", 30000);
props.put("linger.ms", 10);
props.put("retries", Integer.MAX_VALUE);
props.put("retry.backoff.ms", 100);
// Read the values back; put() autoboxed the int literals to Integer
int deliveryTimeout = (Integer) props.get("delivery.timeout.ms");
int requestTimeout = (Integer) props.get("request.timeout.ms");
int lingerMs = (Integer) props.get("linger.ms");
// Rule: delivery.timeout.ms >= linger.ms + request.timeout.ms
// Use an explicit check, because assert does nothing unless the JVM runs with -ea
if (deliveryTimeout < lingerMs + requestTimeout) {
    throw new IllegalStateException("delivery.timeout.ms must be >= linger.ms + request.timeout.ms");
}
Troubleshooting configurations
High latency issues
# Reduce batching delay
linger.ms=0
# Reduce request timeout
request.timeout.ms=10000
# Ensure no compression delay
compression.type=none
# Reduce in-flight requests
max.in.flight.requests.per.connection=1
Memory pressure
# Reduce buffer size
buffer.memory=16777216
# Smaller batches
batch.size=8192
# Shorter connection idle time
connections.max.idle.ms=60000
# Aggressive blocking timeout
max.block.ms=5000
Connection issues
# Longer connection timeouts
socket.connection.setup.timeout.ms=30000
# Longer reconnect backoff
reconnect.backoff.max.ms=10000
# Longer retry backoff, so retries do not hammer an unstable broker
retry.backoff.ms=500
# Reduce connection reuse
connections.max.idle.ms=120000
Configuration compatibility
Some configurations are interdependent. For example, enable.idempotence=true requires retries greater than 0, max.in.flight.requests.per.connection of 5 or less, and acks=all. Explicitly set values that conflict cause a ConfigException.
Start simple
Begin with default configurations and adjust one parameter at a time based on monitoring data. Most applications work well with minimal configuration changes.