Learn how to replicate data across Kafka clusters in 15 minutes

Multi-cluster deployments are common for enterprises with global presence, disaster recovery requirements, or data locality needs. This guide covers cross-cluster replication architectures and tools.

What you’ll learn:
  • When you need multiple Kafka clusters
  • Cross-cluster mirroring tools and options
  • Active-active vs active-passive architectures
  • Trade-offs and considerations for each approach

Why multiple clusters?

Enterprises commonly need multiple Kafka clusters for:
  • Geographic data locality (reduce latency)
  • Disaster recovery (business continuity)
  • Regulatory compliance (data residency)
  • Environment isolation (dev/staging/prod)

Cross-cluster mirroring

Mirroring tools consume data from one cluster and produce it to another. The mechanism is simple: a consumer reads records from the source cluster and a producer writes them to the target cluster.

| Tool | Provider | Notes |
| --- | --- | --- |
| MirrorMaker 2 | Apache Kafka | Ships with Kafka, Kafka Connect based |
| Confluent Replicator | Confluent | Commercial, additional features |
| uReplicator | Uber | Open source, performance optimized |
| Custom Flink | Netflix | Custom implementation |
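
The consume-then-produce mechanism can be illustrated with a small in-memory model. This is only a sketch, not how any of the tools above are implemented: the `Log` class and `mirror` function are hypothetical stand-ins for a topic partition and a mirroring process, and no real Kafka client is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """A single-partition log; a stand-in for one Kafka topic partition."""
    base_offset: int = 0                      # offset of the first retained record
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return the offset the log assigned to it."""
        self.records.append(value)
        return self.base_offset + len(self.records) - 1

    def read_from(self, offset):
        """Yield (offset, value) pairs starting at the given offset."""
        start = max(offset - self.base_offset, 0)
        for i in range(start, len(self.records)):
            yield self.base_offset + i, self.records[i]


def mirror(source, target, position):
    """Consume from `source` starting at `position` and produce to `target`.

    Returns the next source offset to read and a source->target offset map.
    """
    offset_map = {}
    for src_offset, value in source.read_from(position):
        offset_map[src_offset] = target.append(value)
        position = src_offset + 1
    return position, offset_map


# Assumption: the source has already aged out 100 records, so it starts at offset 100.
source = Log(base_offset=100, records=["a", "b", "c"])
target = Log()

position, offset_map = mirror(source, target, position=100)
print(position)     # 103
print(offset_map)   # {100: 0, 101: 1, 102: 2}
```

The target log assigns its own offsets as records arrive, so the same record can sit at different offsets in the two clusters even though the data is identical.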

Active-active architecture

This architecture is used when two or more data centers share some or all of the data and each data center is able to both produce and consume events. Producers write to the same topic in each cluster, and mirroring runs between the two topics in the clusters.

Advantages

The advantages of this architecture are:
  • Ability to serve users from a nearby data center, which typically has performance benefits
  • Redundancy and resilience. Since every data center has all the functionality, if one data center is unavailable you can direct users to a remaining data center.

Disadvantages

The main drawback of this architecture is the difficulty of avoiding conflicts when data is read and updated asynchronously in multiple locations. You also need to decide where your consumers should start reading from (usually by timestamp), because offsets are not necessarily synchronized across clusters; this depends on the replication mechanism you use.
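
Positioning a consumer by timestamp rather than by offset can be sketched as follows. The example is an in-memory model under two assumptions: the replication tool preserves record timestamps, and the lookup follows the same rule as the Java consumer's `offsetsForTimes` (earliest offset whose timestamp is greater than or equal to the requested one). The `offset_for_time` helper and the sample data are hypothetical.

```python
def offset_for_time(records, timestamp_ms):
    """Return the earliest offset whose timestamp is >= timestamp_ms, or None.

    `records` is a list of (offset, timestamp_ms) pairs in offset order.
    """
    for offset, ts in records:
        if ts >= timestamp_ms:
            return offset
    return None


# The same three events as stored in two clusters: timestamps match, offsets do not.
cluster_a = [(500, 1_000), (501, 2_000), (502, 3_000)]
cluster_b = [(12, 1_000), (13, 2_000), (14, 3_000)]

# A consumer last processed the event stamped 2_000 on cluster A.
last_processed_ts = 2_000

# Resume on cluster B from the first event at or after that timestamp.
resume_at = offset_for_time(cluster_b, last_processed_ts)
print(resume_at)  # 13
```

Resuming at the last processed timestamp re-reads that event on the second cluster, so consumers that switch clusters this way should tolerate duplicates.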

Active-passive architecture

In some cases, the only requirement for multiple clusters is to support some kind of disaster scenario or to enable faster reads locally by mirroring an entire cluster. Perhaps you have two clusters in the same data center. You use one cluster for all the applications, but you want a second cluster that contains (almost) all the events in the original cluster that you can use if the original cluster is completely unavailable. In this architecture, producers publish data to the active cluster only. The passive cluster receives no writes from producers; it only receives the data that MirrorMaker mirrors from the active cluster.

Advantages

The advantages of this architecture are:
  • Simplicity in setup and the fact that it can be used in pretty much any use case
  • No need to worry about access to data, handling conflicts, and other architectural complexities.

Disadvantages

The disadvantages are the waste of a good cluster and the fact that it is currently not possible to perform cluster failover in Kafka without either losing data or having duplicate events.
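Offsets aren’t always preserved
Replication does not necessarily preserve offsets, only data. The data at a given offset in one cluster may not be the same as the data at the same offset in another cluster.
As of MirrorMaker 2 in Kafka 2.7, you can set sync.group.offsets.enabled to true so that MirrorMaker 2 periodically writes translated consumer group offsets to the target cluster. This applies only to consumer groups that have no active consumers on the target cluster.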
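
As an illustration, a MirrorMaker 2 configuration for a one-way (active-passive) flow with offset syncing enabled might look like the output of the sketch below. The cluster aliases `primary` and `backup` and the broker addresses are hypothetical placeholders, and `render_properties` is a small helper written for this example; the property keys follow the MirrorMaker 2 naming scheme but should be checked against the documentation for your Kafka version.

```python
def render_properties(settings):
    """Render a dict as Java-style .properties text, one key = value per line."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


# Cluster aliases and broker addresses are hypothetical placeholders.
mm2_settings = {
    "clusters": "primary, backup",
    "primary.bootstrap.servers": "primary-broker:9092",
    "backup.bootstrap.servers": "backup-broker:9092",
    # Replicate one way only: primary -> backup (active-passive).
    "primary->backup.enabled": "true",
    "primary->backup.topics": ".*",
    "backup->primary.enabled": "false",
    # Emit checkpoints and write translated group offsets to the backup cluster.
    "primary->backup.emit.checkpoints.enabled": "true",
    "primary->backup.sync.group.offsets.enabled": "true",
}

print(render_properties(mm2_settings))
```

The printed text can be saved to a file such as `mm2.properties` and passed to the `connect-mirror-maker.sh` script that ships with Kafka.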

Architecture comparison

| Aspect | Active-Active | Active-Passive |
| --- | --- | --- |
| Complexity | High | Low |
| Data availability | Both clusters | Primary only |
| Failover | Instant | Manual switchover |
| Conflict handling | Required | Not needed |
| Resource usage | Higher | Lower (standby idle) |

See it in practice with Conduktor
Conduktor Console can connect to multiple Kafka clusters simultaneously. Monitor replication lag, compare topic configurations, and manage all your clusters from a single interface.
