Traditional rebalancing problems
Eager rebalancing (pre-Kafka 2.4)
- Stop-the-world: All consumers stop processing during rebalance
- Complete reassignment: All partitions are revoked and reassigned
- Processing downtime: No messages processed during rebalance period
- Cascading rebalances: A single consumer failure forces the entire group to rebalance
Performance impact
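The cost of the eager protocol can be illustrated with a toy model. The sketch below is not any real Kafka assignor: the round-robin placement and the names `eager_rebalance`, `c1`, `c2`, `c3` are assumptions made for illustration. It only shows that every partition is revoked, and therefore paused, whenever group membership changes.

```python
def eager_rebalance(assignment, members):
    """Toy eager rebalance: revoke every partition, then reassign round-robin."""
    partitions = sorted(p for owned in assignment.values() for p in owned)
    paused = len(partitions)  # all partitions stop being processed
    new_assignment = {m: [] for m in members}
    for i, partition in enumerate(partitions):
        new_assignment[members[i % len(members)]].append(partition)
    return new_assignment, paused


before = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
after, paused = eager_rebalance(before, ["c1", "c2", "c3"])
print(after)   # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
print(paused)  # 6
```

Adding one consumer to a six-partition group pauses all six partitions in this model, even though only two of them needed a new owner.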
Incremental cooperative rebalancing
How it works (Kafka 2.4+)
- Minimal disruption: Only affected partitions are reassigned
- Continued processing: Unaffected partitions continue processing
- Gradual transition: Rebalance happens in multiple phases
- Reduced downtime: Significantly shorter processing interruptions
Rebalancing phases
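A cooperative rebalance can be pictured as two rounds: in the first, owners give up only the partitions that have to move and keep processing the rest; in the second, the released partitions are handed to their new owners. The following is a simplified sketch of that idea, not the CooperativeStickyAssignor algorithm. `sticky_target` and `cooperative_rebalance` are hypothetical helpers, and the fair-share rule is an assumption.

```python
def sticky_target(assignment, members, num_partitions):
    """Toy sticky target: keep current owners, capped at a fair share."""
    quota = -(-num_partitions // len(members))  # ceiling division
    target = {m: list(assignment.get(m, []))[:quota] for m in members}
    owned = {p for ps in target.values() for p in ps}
    for partition in range(num_partitions):
        if partition not in owned:
            least_loaded = min(members, key=lambda m: len(target[m]))
            target[least_loaded].append(partition)
    return target


def cooperative_rebalance(assignment, members, num_partitions):
    target = sticky_target(assignment, members, num_partitions)
    # Phase 1: owners release only the partitions that must move.
    phase1 = {m: [p for p in assignment.get(m, []) if p in target[m]]
              for m in members}
    revoked = sorted(p for m, ps in assignment.items()
                     for p in ps if p not in target.get(m, []))
    # Phase 2: a follow-up round assigns the released partitions.
    phase2 = {m: sorted(target[m]) for m in members}
    return phase1, revoked, phase2


before = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
phase1, revoked, phase2 = cooperative_rebalance(before, ["c1", "c2", "c3"], 6)
print(phase1)   # {'c1': [0, 1], 'c2': [3, 4], 'c3': []}
print(revoked)  # [2, 5]
print(phase2)   # {'c1': [0, 1], 'c2': [3, 4], 'c3': [2, 5]}
```

In this model only partitions 2 and 5 pause when a third consumer joins; partitions 0, 1, 3 and 4 are processed throughout.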
Configuration
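A minimal sketch of the consumer settings involved, written as a plain Python dictionary of Kafka property names so it can be checked without a broker. The assignor value is the Java client class name; librdkafka-based clients use the short name `cooperative-sticky` instead. The helper names, broker address, and group name are placeholders, not part of any Kafka API.

```python
# Java client assignor class; librdkafka-based clients use "cooperative-sticky".
COOPERATIVE_STICKY = "org.apache.kafka.clients.consumer.CooperativeStickyAssignor"


def cooperative_consumer_config(bootstrap_servers, group_id):
    """Return consumer properties that opt in to cooperative rebalancing."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "partition.assignment.strategy": COOPERATIVE_STICKY,
    }


def to_properties(config):
    """Render the settings in .properties syntax, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


config = cooperative_consumer_config("localhost:9092", "orders")
print(to_properties(config))
```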
Static group membership
Concept
Static group membership allows consumers to maintain stable identities across restarts, preventing unnecessary rebalances during planned maintenance or brief outages.
Benefits
- Fewer rebalances: Consumer restarts don’t trigger a rebalance as long as the consumer rejoins within session.timeout.ms
- Stable assignments: Partitions stay with the same consumer instance
- Faster recovery: Consumers can resume processing from where they left off
- Operational efficiency: Planned maintenance doesn’t disrupt other consumers
Configuration
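Static membership is enabled by giving each consumer its own `group.instance.id`. The sketch below layers that onto an existing set of properties; `static_member_config` is a hypothetical helper, and the 45-second session timeout is an illustrative value, not a recommendation. The broker also bounds the allowed session timeout through `group.min.session.timeout.ms` and `group.max.session.timeout.ms`.

```python
def static_member_config(base, instance_id, session_timeout_ms=45000):
    """Add static-membership settings to a copy of the base properties."""
    if not instance_id:
        raise ValueError("group.instance.id must be a non-empty string")
    config = dict(base)
    # Must be unique per consumer instance within the group.
    config["group.instance.id"] = instance_id
    # Should cover the longest restart you expect to ride out (assumed value).
    config["session.timeout.ms"] = session_timeout_ms
    return config


cfg = static_member_config({"group.id": "orders"}, "orders-consumer-0")
print(cfg)
```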
Consumer lifecycle
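The lifecycle rule can be reduced to one comparison: a static member keeps its place in the group while it is down, and a rebalance follows only if the session times out before it returns. The function below is a simplified model of that rule; the name `restart_triggers_rebalance` and the timings are illustrative assumptions.

```python
def restart_triggers_rebalance(downtime_ms, session_timeout_ms, static_member):
    """Return True if a stop/start cycle is expected to cause a rebalance."""
    if not static_member:
        # A dynamic member joins under a new member ID after every restart,
        # so the restart always changes group membership.
        return True
    # A static member keeps its assignment until the session times out.
    return downtime_ms >= session_timeout_ms


print(restart_triggers_rebalance(20_000, 45_000, static_member=True))   # False
print(restart_triggers_rebalance(60_000, 45_000, static_member=True))   # True
print(restart_triggers_rebalance(20_000, 45_000, static_member=False))  # True
```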
Use cases and benefits
High-availability applications
Containerized environments
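In a containerized deployment the static ID has to survive a container restart, so it is usually derived from something stable such as the pod name. The sketch below assumes the pod name is exposed through the `HOSTNAME` environment variable and stays the same across restarts (as with a Kubernetes StatefulSet); `instance_id_from_environment` and the `payments` prefix are hypothetical.

```python
import os
import socket


def instance_id_from_environment(app_name, env=None):
    """Build a group.instance.id from a stable host or pod name."""
    env = os.environ if env is None else env
    # Assumption: HOSTNAME holds a name that is stable across restarts.
    host = env.get("HOSTNAME") or socket.gethostname()
    return f"{app_name}-{host}"


print(instance_id_from_environment("payments", {"HOSTNAME": "consumer-0"}))
# payments-consumer-0
```

With randomly named pods (for example from a Deployment) this scheme would produce a new ID on every restart and lose the benefit of static membership.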
Stream processing applications
- State preservation: Local state stores remain associated with specific consumers
- Reduced reprocessing: Avoid recomputing state after rebalances
- Consistent partitioning: Same consumer always processes same partitions
Monitoring and observability
Key metrics
- Rebalance frequency: Number of rebalances per time period
- Rebalance duration: Time taken for rebalance completion
- Partition assignment stability: How often partitions change owners
- Consumer lag during rebalance: Processing delay during rebalances
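As a rough illustration of the first two metrics, the sketch below derives rebalance frequency and duration from a list of recorded rebalance start and end times. The `RebalanceEvent` type and the sample timestamps are made up for the example; in practice the events would come from a rebalance listener or from client metrics.

```python
from dataclasses import dataclass


@dataclass
class RebalanceEvent:
    start_s: float  # when partitions were revoked
    end_s: float    # when the new assignment arrived


def summarize(events, window_s):
    """Compute rebalance frequency and duration over an observation window."""
    durations = [e.end_s - e.start_s for e in events]
    return {
        "rebalances_per_hour": len(events) * 3600 / window_s,
        "avg_duration_s": sum(durations) / len(durations) if durations else 0.0,
        "max_duration_s": max(durations, default=0.0),
    }


events = [RebalanceEvent(100.0, 104.0), RebalanceEvent(2000.0, 2002.0)]
summary = summarize(events, window_s=7200)
print(summary)
# {'rebalances_per_hour': 1.0, 'avg_duration_s': 3.0, 'max_duration_s': 4.0}
```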
JMX metrics
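The Java consumer publishes rebalance statistics under its coordinator metrics group. The names below are listed as recalled from the Kafka monitoring documentation and should be checked against your client version; `coordinator_mbean` is a hypothetical helper that only formats the object name.

```python
def coordinator_mbean(client_id):
    """Format the JMX object name of the consumer coordinator metrics."""
    return f"kafka.consumer:type=consumer-coordinator-metrics,client-id={client_id}"


# Rebalance-related attributes (verify against your client version).
REBALANCE_ATTRIBUTES = [
    "rebalance-total",
    "rebalance-rate-per-hour",
    "rebalance-latency-avg",
    "rebalance-latency-max",
    "failed-rebalance-total",
    "last-rebalance-seconds-ago",
]

print(coordinator_mbean("orders-consumer-0"))
for attribute in REBALANCE_ATTRIBUTES:
    print(" ", attribute)
```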
Configuration best practices
For incremental rebalancing
For static group membership
Combined configuration
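A sketch of both features enabled together, again as a plain dictionary of Kafka property names. The heartbeat interval is set to one third of the session timeout, following the usual guidance that it be no higher than that; the concrete values and the `combined_config` helper are assumptions for illustration.

```python
# Java client assignor class; librdkafka-based clients use "cooperative-sticky".
COOPERATIVE_STICKY = "org.apache.kafka.clients.consumer.CooperativeStickyAssignor"


def combined_config(group_id, instance_id, session_timeout_ms=45000):
    """Cooperative rebalancing plus static membership in one property set."""
    return {
        "group.id": group_id,
        "partition.assignment.strategy": COOPERATIVE_STICKY,
        "group.instance.id": instance_id,        # unique per consumer
        "session.timeout.ms": session_timeout_ms,
        # Keep heartbeats at or below one third of the session timeout.
        "heartbeat.interval.ms": session_timeout_ms // 3,
    }


combined = combined_config("orders", "orders-consumer-0")
print(combined["heartbeat.interval.ms"])  # 15000
```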
Operational considerations
Deployment strategies
- Rolling updates: Use static group membership for zero-downtime deployments
- Blue-green: Static IDs help maintain partition assignments
- Canary releases: Incremental rebalancing minimizes impact on stable consumers
Maintenance windows
Troubleshooting
Common issues and solutions:
- Duplicate static IDs: Ensure unique group.instance.id per consumer
- Long session timeouts: Balance between stability and failure detection
- Assignment strategy conflicts: Ensure all consumers use compatible assignors
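Two of these problems can be caught before deployment by comparing the consumers' property sets. The checks below are a sketch under the assumption that each consumer's settings are available as a dictionary; `find_duplicate_instance_ids` and `common_assignors` are hypothetical helpers. A group needs at least one assignor that every member lists.

```python
from collections import Counter

COOPERATIVE = "org.apache.kafka.clients.consumer.CooperativeStickyAssignor"
RANGE = "org.apache.kafka.clients.consumer.RangeAssignor"


def find_duplicate_instance_ids(configs):
    """Return group.instance.id values used by more than one consumer."""
    counts = Counter(c["group.instance.id"] for c in configs
                     if c.get("group.instance.id"))
    return sorted(i for i, n in counts.items() if n > 1)


def common_assignors(configs):
    """Return the assignors that every consumer in the list supports."""
    supported = [{s.strip() for s in c["partition.assignment.strategy"].split(",")}
                 for c in configs]
    return set.intersection(*supported) if supported else set()


configs = [
    {"group.instance.id": "orders-0",
     "partition.assignment.strategy": f"{COOPERATIVE},{RANGE}"},
    {"group.instance.id": "orders-0",
     "partition.assignment.strategy": COOPERATIVE},
    {"group.instance.id": "orders-1",
     "partition.assignment.strategy": RANGE},
]
print(find_duplicate_instance_ids(configs))  # ['orders-0']
print(common_assignors(configs))             # set()
```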
Migration strategy
When migrating to incremental rebalancing and static membership:
- Start with incremental rebalancing first
- Monitor rebalance behavior and performance
- Gradually introduce static group membership
- Test failure scenarios thoroughly
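For the first step, the Java client's documented upgrade path from an eager assignor is two rolling bounces: first list the cooperative assignor alongside the existing one, then remove the old one. The sketch below spells out the property value for each bounce. `migration_bounces` is a hypothetical helper, and the path applies to the Java client; librdkafka-based clients do not allow eager and cooperative strategies in the same list.

```python
COOPERATIVE = "org.apache.kafka.clients.consumer.CooperativeStickyAssignor"
RANGE = "org.apache.kafka.clients.consumer.RangeAssignor"


def migration_bounces(current_assignor):
    """Return partition.assignment.strategy for each rolling bounce."""
    return [
        # Bounce 1: every consumer supports both; the group stays eager.
        f"{COOPERATIVE},{current_assignor}",
        # Bounce 2: drop the old assignor; the group switches to cooperative.
        COOPERATIVE,
    ]


for step, value in enumerate(migration_bounces(RANGE), start=1):
    print(f"bounce {step}: partition.assignment.strategy={value}")
```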
Static member considerations
- Static members that don’t restart within session.timeout.ms will be removed from the group
- Ensure unique group.instance.id values to avoid conflicts
- Plan for scaling scenarios where static IDs need management
Performance impact
Before (eager rebalancing)
After (incremental + static)
Measurable improvements
- 90% reduction in processing downtime during rebalances
- 50% fewer unnecessary rebalances with static membership
- Improved throughput due to reduced processing interruptions
- Better consumer utilization with sticky partition assignments