Preview functionality: Insights is currently a preview feature and is subject to change as we continue working on it.
Overview
Risk analysis monitors three critical aspects of topic configuration:
- Replication factor - topics with insufficient data redundancy
- Partition distribution - topics with sub-optimal partition allocation across brokers
- Partition skew - topics with uneven data distribution across partitions
Each topic is color-coded by risk severity:
- Red: critical risk requiring immediate attention
- Orange: moderate risk that should be addressed
- Green: healthy configuration
Replication factor
The replication factor graph displays topics organized by their replica count (1, 2, 3, and so on). Individual topics are listed below with their current RF settings. Low replication factors increase the risk of data loss if a broker fails. RF = 1 means no data redundancy: if the broker hosting that topic fails, all of its data becomes permanently unavailable. RF = 3 is recommended for production environments; it balances data safety against storage overhead and tolerates the loss of up to two brokers without losing data.
- Red (RF = 1): critical risk - no fault tolerance
- Orange (RF = 2): moderate risk - can tolerate only one broker failure
- Green (RF = 3+): adequate fault tolerance
How to resolve this
Kafka does not allow changing the replication factor of an existing topic through configuration updates. You must either use partition reassignment to add replicas or recreate the topic with the desired replication factor.
1. Navigate to the topic
Navigate to Topics from the main menu and select the topic shown in the Risk Analysis dashboard.
2. Review current configuration
Click the Configuration tab and note the replication factor shown at the top of the page to confirm the current value.
Replication factor is a topic-level setting that cannot be changed after topic creation through normal configuration updates.
There are two approaches: partition reassignment or recreating the topic.
Partition reassignment allows you to add replicas to existing topics without recreating them. This operation requires Kafka administrative tools external to Console. It works without downtime and preserves existing data, which makes it the best option for production environments and topics holding significant data.
1. Document current partition assignment
In Console, navigate to the topic and click the Partitions tab. Document the current replica assignments for all partitions.
2. Perform partition reassignment
Use Kafka administrative tools (such as kafka-reassign-partitions) to add replicas to the topic. This process replicates data across the additional brokers in the background.
Partition reassignment requires creating a JSON file that specifies the new replica assignments and executing the reassignment with the Kafka CLI tools. Set throttling limits to avoid impacting cluster performance during the operation.
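For illustration, here is a minimal sketch of that workflow, assuming a 3-broker cluster (broker IDs 1, 2, 3), a topic named orders with three partitions, and a bootstrap server at localhost:9092; all of these values are placeholders:

```bash
# increase-rf.json: assign every partition to all three brokers (RF = 3).
# The first broker in each replica list becomes the preferred leader,
# so the lists are rotated to spread leadership evenly.
cat > increase-rf.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "orders", "partition": 0, "replicas": [1, 2, 3] },
    { "topic": "orders", "partition": 1, "replicas": [2, 3, 1] },
    { "topic": "orders", "partition": 2, "replicas": [3, 1, 2] }
  ]
}
EOF

# Execute with a replication throttle (bytes/sec) to limit cluster impact.
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file increase-rf.json \
  --execute --throttle 50000000

# Re-run with --verify until all partitions report success;
# a successful --verify also removes the throttle.
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file increase-rf.json --verify
```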
3. Verify completion in Console
Return to the Partitions tab in Console and verify all partitions now show the increased replication factor.
To prevent recurrence, set default.replication.factor=3 in the broker configuration and configure min.insync.replicas=2 to ensure writes are acknowledged by at least two replicas.
Use RBAC permissions to prevent users from creating topics with RF < 3.
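For example, in each broker's server.properties (the values shown are the recommendations above):

```properties
# New topics created without an explicit setting default to 3 replicas.
default.replication.factor=3
# With acks=all, writes succeed only once at least 2 replicas have the data.
min.insync.replicas=2
```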
Partition distribution
What it shows
The partition distribution graph displays topics grouped by partition count (1, 3, 4, 5, 6, 8, 10, 12, etc.). Topics are listed below with their total partition count. Viewing the full list also shows the replication factor and partition skew.
Why it matters
Uneven partition distribution creates hotspots where some brokers handle a disproportionate load, leading to performance bottlenecks and reduced fault tolerance.
How to interpret
Optimal distribution spreads partitions evenly across all brokers with balanced leadership. Warning signs include partitions concentrated on specific brokers or partition counts that aren't multiples of the broker count (for example, 7 partitions on a 3-broker cluster).
How to resolve this
Analyze the current distribution:
1. Navigate to the topic
Go to Topics and select the affected topic from the Risk Analysis dashboard.
2. Review partition distribution
Click the Partitions tab and examine the distribution across brokers.
3. Switch views
Toggle between the Per partition and Per broker views to understand the distribution pattern.
The Per broker view shows:
- Which brokers lead which partitions
- Which brokers hold follower replicas
- Imbalances in partition leadership
4. Identify rebalancing needs
Look for:
- Brokers with significantly more leader partitions than others
- Brokers with no partitions for critical topics
- Uneven distribution patterns that could cause hotspots
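To cross-check the same information outside Console, kafka-topics.sh can describe per-partition leadership; the topic name and address below are placeholders:

```bash
# Prints Leader, Replicas, and Isr for every partition. Many partitions
# sharing one Leader broker ID is a sign of leadership imbalance.
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
```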
Depending on what you find, there are three remediation options: rebalance leadership, reassign replicas, or add partitions.
To rebalance leadership, use Kafka administrative tools to trigger a preferred leader election, which reassigns leadership to each partition's preferred leader without moving data. This lightweight operation is safe for production and should be run regularly.
Preferred leader election only changes which broker is the leader for each partition. It does not move data or change replica assignments.
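A minimal sketch with the stock Kafka CLI, assuming a bootstrap server at localhost:9092:

```bash
# Trigger preferred leader election for every partition in the cluster.
# Only leadership changes; no data is copied between brokers.
kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type PREFERRED --all-topic-partitions
```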
Partition skew
What it shows
The partition skew graph displays topics grouped by skew ratio ranges:
- < 0.25: Slight imbalance (green)
- 0.25 - 0.75: Moderate imbalance (orange)
- > 0.75: Severe imbalance (red)
Why it matters
High partition skew causes performance problems (hot partitions, consumer lag), resource inefficiency (wasted parallelism, uneven disk usage), and may indicate poor partition key selection or producer misconfiguration.
How to interpret
The skew ratio compares the largest partition to the smallest:
- < 0.25 (green): Acceptable variation
- 0.25 - 0.75 (orange): Monitor and investigate
- > 0.75 (red): Immediate attention required
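The exact formula is not stated here; one definition consistent with a 0-to-1 ratio built from the largest and smallest partitions (an assumption, shown for orientation only) is:

```latex
% Assumed definition: 0 = perfectly even, values near 1 = severe imbalance.
\text{skew} = 1 - \frac{\min_i \text{size}(p_i)}{\max_i \text{size}(p_i)}
```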
Root causes
- Poor partition key selection - keys with uneven distribution, too few unique keys, or values clustering around certain hot spots
- Producer configuration issues - manual partition assignment, a custom partitioner with flawed logic, or null keys
- Data model problems - business logic creating natural hotspots, temporal patterns, or geographic clustering
How to resolve this
Diagnose the skew:
1. Navigate to the topic
Go to Topics and select the affected topic from the Risk Analysis dashboard.
2. Review partition details
Click the Partitions tab and select the Per partition view.
3. Identify imbalanced partitions
Compare the following columns across all partitions:
- Total number of records - Shows message count per partition
- Partition size - Shows disk space consumed
- Begin offset and End offset - Show the range of messages in each partition
4. Document the pattern
Note which partitions are oversized and by how much. This will help identify the root cause.
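If you prefer the CLI, recent Kafka distributions ship kafka-get-offsets.sh, which prints one offset per partition; the topic name and address are placeholders:

```bash
# End offsets per partition (--time -1); use --time -2 for begin offsets.
# A partition whose end-minus-begin span dwarfs its peers is oversized.
kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic orders --time -1
```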
Next, analyze the message keys:
1. Navigate to the Consume tab
Click the Consume tab for the topic.
2. Configure consumer settings
Configure the consumer to read from All partitions to see the full data distribution.
3. Review message keys
Examine the keys in the consumed data. Look for patterns:
- Are certain keys appearing far more frequently than others?
- Are many messages using null keys?
- Is there visible clustering in key values?
4. Filter by partition
Use the partition filter to consume from specific partitions (especially the largest and smallest) to compare key distributions.
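For example, sampling one partition with the stock console consumer (topic name, partition number, and address are placeholders):

```bash
# Read the first 100 records of partition 0 with keys printed, so key
# distributions of the largest and smallest partitions can be compared.
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic orders \
  --partition 0 --offset earliest --max-messages 100 \
  --property print.key=true --property key.separator=" : "
```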
There are three remediation options: fix the partition key (recommended), increase partitions, or recreate the topic.
To fix the partition key, choose keys with high cardinality and even distribution.
Good choices:
- User ID, Order ID, Transaction ID, Device ID
- Composite keys like ${region}-${customerId}
- Any identifier with naturally even distribution
Poor choices:
- Status fields (limited values)
- Boolean values (only two values)
- Small enums (limited set of values)
- Dates without time component
- Null keys
Changing the partition key requires updating producer applications. Coordinate with your development team to implement the new key strategy.
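As a sketch of what keyed production looks like with the stock console producer (the topic name and key format are illustrative only):

```bash
# Records with the same key always land on the same partition, so a
# high-cardinality composite key spreads load evenly across partitions.
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic orders \
  --property parse.key=true --property key.separator=:
# Then enter lines such as:
#   eu-west-cust042:{"orderId": 1234}
#   us-east-cust777:{"orderId": 1235}
```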
Set up alerts for partition size differences and review the Risk Analysis dashboard regularly. Monitor consumer lag by partition to identify performance impacts.
Troubleshooting
Why does my topic show high skew even with good partition keys?
Several factors can cause skew even with well-designed partition keys:
- Time-based patterns: Temporal clustering (business hours vs. night) creates natural skew based on when data was produced
- Compaction: Log compacted topics retain more messages in partitions with higher key diversity
- Retention: Uneven produce rates over time mean partitions contain data from different periods
- Producer failures: Restarts or errors may temporarily cluster messages on specific partitions
- Natural data distribution: Some business scenarios naturally create skew (one customer generating 80% of orders)
Can I fix replication factor without recreating the topic?
Yes, use partition reassignment to add replicas to existing topics through Kafka administrative tools:
- View current replica assignments in Console’s Partitions tab
- Create a reassignment plan specifying new replica assignments with additional broker IDs
- Execute the reassignment using Kafka CLI tools (data replicates in the background)
- Monitor progress and verify completion in Console
How many partitions should my topic have?
Key considerations:
- Throughput: More partitions = more parallelism and higher potential throughput
- Consumer count: You need at least as many partitions as consumers for full parallelism
- Broker count: Choose a partition count that’s a multiple of broker count for even distribution
- Message ordering: Ordering is only guaranteed within a single partition
- Overhead: Each partition adds metadata overhead. Tens of thousands of partitions can cause performance issues
A common starting point is max(# of consumers × 2, # of brokers × 2), then adjust based on monitoring.
Examples:
- 6-broker cluster with 10 consumers: 20 partitions
- 6-broker cluster with 3 consumers: 12 partitions
- 12-broker cluster with 50 consumers: 100 partitions
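Applying the first example when creating a topic (topic name and address are placeholders):

```bash
# 6 brokers, 10 consumers: max(10 x 2, 6 x 2) = 20 partitions.
kafka-topics.sh --bootstrap-server localhost:9092 --create --topic orders \
  --partitions 20 --replication-factor 3
```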
Why do I see partition skew immediately after topic creation?
This is normal and expected for newly created topics. Initial messages create skew as partitions receive different amounts of data before distribution stabilizes.
Expected behavior:
- First 100-1000 messages: High skew is normal
- After 1000+ messages: Skew should normalize if partition keys are well-distributed
- After 24-48 hours: Skew ratios should stabilize
What's the performance impact of partition reassignment?
Partition reassignment impacts network traffic, disk I/O, client latency, and broker CPU. Duration depends on data volume: small topics (< 1 GB) complete in minutes; large topics (> 1 TB) can take hours or days.
Best practices to minimize impact:
- Schedule during low-traffic periods
- Use throttling to prevent saturating network bandwidth
- Monitor cluster metrics (CPU, disk I/O, network throughput, client latency) using Console
- For very large topics, reassign partitions in batches
- Adjust throttle dynamically based on traffic patterns
- Remove throttle after completion
Always set a throttle value when performing partition reassignment in production. Un-throttled reassignment can impact client operations and cause outages.
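One way to adjust the throttle while a reassignment is in flight is through the dynamic broker configs behind the --throttle flag; the rate values and file name below are placeholders:

```bash
# Raise (or lower) the replication throttle on all brokers to 100 MB/s
# without restarting the reassignment.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --alter \
  --add-config leader.replication.throttled.rate=100000000,follower.replication.throttled.rate=100000000

# Once --verify reports success for all partitions, it removes the throttle.
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json --verify
```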