Why remove ZooKeeper from Kafka?
Kafka scaling has hit a performance bottleneck with Zookeeper, which means Kafka has the following limitations with Zookeeper:- Kafka clusters only support a limited number of partitions (up to 200,000)
- When a Kafka broker joins or leaves a cluster, a high number of leader election must happen which can overload Zookeeper and slow down the cluster temporarily
- Kafka clusters setup is difficult and depends on another component to setup
- Kafka cluster metadata is sometimes out-of-sync from Zookeeper
- ZooKeeper security is lagging behind Kafka security
Kafka KRaft mode
It has been noted as part of KIP-500 that the metadata of Kafka itself is a log and that Kafka brokers should be able to consume that metadata log as an internal metadata topic. Kafka leverages itself! Removing ZooKeeper means that Kafka must still act as a quorum to perform controller election and therefore the Kafka brokers implement the Raft protocol thus giving the name KRaft to the new Kafka Metadata Quorum mode.
- Ability to scale to millions of partitions, easier to maintain and set up
- Improved stability, easier to monitor, support, and administer
- Single process to start Kafka
- Single security model for the whole system
- Faster controller shutdown and recovery time