Overview
The VIP Topics section displays a health overview graph showing topics identified as important to your infrastructure. VIP topics are determined by two key metrics:
- Consumer group count - Topics with many subscribing consumer groups (shown by bar height)
- Message volume - Topics with high throughput and data volume (shown by color intensity)
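Console's exact rule for combining these two metrics is not described here. As a rough illustration, the Python sketch below ranks topics with a hypothetical score that scales each metric to the 0-1 range and adds the two. `TopicStats`, `select_vip_topics` and the equal weighting are assumptions, not Console's implementation.

```python
from dataclasses import dataclass


@dataclass
class TopicStats:
    name: str
    consumer_groups: int     # consumer groups subscribed to the topic
    bytes_in_per_sec: float  # produce throughput


def select_vip_topics(topics, top_n=5, volume_weight=1.0):
    """Rank topics by a hypothetical score combining both VIP metrics.

    Each metric is divided by its maximum across all topics so the two are
    comparable, then the results are summed (volume optionally weighted).
    """
    # "or 1" avoids dividing by zero when every value is 0 or the list is empty.
    max_groups = max((t.consumer_groups for t in topics), default=0) or 1
    max_volume = max((t.bytes_in_per_sec for t in topics), default=0) or 1

    def vip_score(t):
        return (t.consumer_groups / max_groups
                + volume_weight * t.bytes_in_per_sec / max_volume)

    ranked = sorted(topics, key=vip_score, reverse=True)
    return [t.name for t in ranked[:top_n]]


topics = [
    TopicStats("orders", 12, 5_000_000),
    TopicStats("feature-flags", 10, 1_000),
    TopicStats("clickstream", 2, 8_000_000),
    TopicStats("debug-logs", 1, 100),
]
print(select_vip_topics(topics, top_n=3))  # ['orders', 'clickstream', 'feature-flags']
```

Raising `volume_weight` favors high-throughput topics over widely shared ones; lowering it does the opposite.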
What the graph shows
VIP topics health overview
The bar graph visualizes your most important topics using two dimensions.
Bar height represents the number of consumer groups subscribed to each topic:
- Taller bars indicate that more consumer groups depend on the topic
- Many consumers suggest the topic provides data critical to multiple applications
- Wide usage indicates potential for widespread impact if issues occur
Color intensity represents message volume:
- Darker blue indicates higher message volume and throughput
- Lighter blue indicates lower message volume
- Message volume combined with consumer count identifies truly critical topics
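To make the two encodings concrete, the sketch below maps a topic's consumer group count and message rate to the bar pattern it would produce. The fixed cut-offs and the `classify_topic` helper are hypothetical; the dashboard's own scaling is not documented here.

```python
# Hypothetical cut-offs for illustration only.
GROUP_THRESHOLD = 5         # consumer groups needed for a "tall" bar
VOLUME_THRESHOLD = 1_000.0  # messages/sec needed for a "dark" bar

# (tall, dark) -> interpretation of the bar pattern
PATTERNS = {
    (True, True): "tall, dark: business-critical pipeline",
    (True, False): "tall, light: widely shared, possibly configuration or control data",
    (False, True): "short, dark: specialized high-throughput pipeline",
    (False, False): "short, light: lower priority",
}


def classify_topic(consumer_groups, msgs_per_sec,
                   group_threshold=GROUP_THRESHOLD,
                   volume_threshold=VOLUME_THRESHOLD):
    """Map a topic's two metrics to the bar pattern it would show."""
    tall = consumer_groups >= group_threshold
    dark = msgs_per_sec >= volume_threshold
    return PATTERNS[(tall, dark)]


print(classify_topic(12, 25_000))  # tall, dark: business-critical pipeline
print(classify_topic(8, 2))
```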
Interpreting the visualization
- Tall, dark bars - Topics with many consumers and high message volume are your most business-critical data pipelines, requiring the highest monitoring priority and careful change management.
- Tall, light bars - Topics with many consumers but lower message volume may represent configuration or control topics that still require careful management despite lower throughput.
- Short, dark bars - Topics with high message volume but fewer consumers represent specialized high-throughput pipelines requiring performance optimization and capacity planning.
Why VIP topics matter
VIP topics represent the backbone of your data infrastructure. Multiple applications and consumer groups depend on these topics for critical business processes, so issues with a VIP topic cascade to downstream applications and services. High traffic makes VIP topics susceptible to performance problems, and configuration issues create a risk of data loss. Changes to VIP topics require coordination across teams and careful testing. A low replication factor, too few partitions or inadequate retention can cause widespread application failures.
Actions to take in Console
For each VIP topic identified in the dashboard, verify and optimize its configuration in the following areas.
Review topic configuration
Navigate to the topic
Open the Configuration tab
Verify critical settings
Replication factor
- Should be at least 3 for production VIP topics
- Tolerates one broker failure without blocking acks=all producers when min.insync.replicas=2
- Ensures data durability and availability
Retention
- retention.ms - Time-based retention appropriate for business needs
- retention.bytes - Size-based retention per partition, if applicable
- Consider longer retention for VIP topics to support late-arriving consumers
Partition count
- Sufficient partitions for current and projected throughput
- Ideally a multiple of broker count for even distribution
- Adequate parallelism for all consumer groups
Cleanup policy (cleanup.policy)
- delete - For time-series or event data
- compact - For state or changelog topics
- Choose the policy that fits the data model and consumption patterns
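The checks above can be expressed as a small audit function. This sketch assumes you have already read the topic's replication factor, partition count and topic-level configuration (for example from the Configuration tab); the function name, input shapes and the 7-day minimum retention are hypothetical. It relies on two Kafka defaults: `cleanup.policy` defaults to `delete`, and `retention.ms` defaults to 604800000 (7 days), with `-1` meaning unlimited.

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000  # also Kafka's default retention.ms


def audit_vip_topic(replication_factor, partitions, broker_count, config,
                    max_group_members, min_retention_ms=SEVEN_DAYS_MS):
    """Return human-readable findings for one VIP topic (empty list = OK).

    config: topic-level settings as strings, e.g. {"retention.ms": "604800000"}.
    max_group_members: size of the largest consumer group reading the topic.
    """
    findings = []
    if replication_factor < 3:
        findings.append(f"replication factor {replication_factor} is below 3")
    if broker_count > 0 and partitions % broker_count != 0:
        findings.append(f"{partitions} partitions is not a multiple of "
                        f"{broker_count} brokers")
    # A consumer group cannot use more consumers than there are partitions.
    if partitions < max_group_members:
        findings.append(f"{partitions} partitions for a group of "
                        f"{max_group_members} consumers leaves some idle")
    policy = config.get("cleanup.policy", "delete")  # Kafka default
    policies = {p.strip() for p in policy.split(",")}
    if not policies <= {"delete", "compact"}:
        findings.append(f"unexpected cleanup.policy {policy!r}")
    if "delete" in policies:
        retention_ms = int(config.get("retention.ms", SEVEN_DAYS_MS))
        # -1 means unlimited time-based retention.
        if retention_ms != -1 and retention_ms < min_retention_ms:
            findings.append(f"retention.ms={retention_ms} is below the "
                            f"{min_retention_ms} ms minimum")
    return findings


print(audit_vip_topic(2, 4, 3, {"retention.ms": "3600000"}, 6))
```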
Set up monitoring and alerts
Navigate to the VIP topic
Configure consumer lag alerts
- Define acceptable lag limits based on business requirements
- Use stricter thresholds for VIP topics than standard topics
- Alert on both absolute lag (message count) and time-based lag
Create under-replicated partition alerts
- Alert immediately if any partitions become under-replicated
- Under-replicated partitions indicate broker issues or failures
- Critical for VIP topics where data loss risk is unacceptable
Set disk usage alerts
- Alert on rapid growth that could cause disk space issues
- Track retention effectiveness
- Plan capacity expansions before reaching limits
Configure throughput alerts
- Alert on sudden drops in produce rate (possible producer failure)
- Alert on unexpected spikes that could cause performance issues
- Baseline normal throughput to detect anomalies
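Console alerts are configured in the UI; the sketch below only restates the alert rules above as code so that each threshold is explicit. All threshold values, parameter names and the `evaluate_alerts` function are hypothetical placeholders, not a Console API; replace the values with ones derived from your SLAs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VipThresholds:
    """Hypothetical limits; derive real values from your SLAs."""
    max_lag_messages: int = 10_000
    max_lag_seconds: float = 60.0
    min_hours_until_disk_full: float = 48.0
    drop_ratio: float = 0.5   # produce rate below 50% of baseline
    spike_ratio: float = 3.0  # produce rate above 3x baseline


def evaluate_alerts(lag_messages, lag_seconds, under_replicated_partitions,
                    disk_free_bytes, disk_growth_bytes_per_hour,
                    produce_rate, baseline_produce_rate,
                    t: Optional[VipThresholds] = None):
    """Return the names of alerts that should fire for one VIP topic."""
    if t is None:
        t = VipThresholds()
    alerts = []
    # Consumer lag: check both message count and time behind.
    if lag_messages > t.max_lag_messages:
        alerts.append("lag-messages")
    if lag_seconds > t.max_lag_seconds:
        alerts.append("lag-time")
    # Any under-replicated partition fires immediately.
    if under_replicated_partitions > 0:
        alerts.append("under-replicated")
    # Disk: project hours until full from the recent growth rate.
    if disk_growth_bytes_per_hour > 0:
        hours_left = disk_free_bytes / disk_growth_bytes_per_hour
        if hours_left < t.min_hours_until_disk_full:
            alerts.append("disk-growth")
    # Throughput: compare the current produce rate against its baseline.
    if baseline_produce_rate > 0:
        ratio = produce_rate / baseline_produce_rate
        if ratio < t.drop_ratio:
            alerts.append("produce-drop")
        elif ratio > t.spike_ratio:
            alerts.append("produce-spike")
    return alerts


print(evaluate_alerts(25_000, 5, 1, 100e9, 10e9, 40, 100))
# ['lag-messages', 'under-replicated', 'disk-growth', 'produce-drop']
```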
Establish ownership and governance
- Automatic ownership tracking - applications define owners and business context at topic creation
- Clear accountability - Console automatically assigns ownership to application teams
- Governance enforcement - policies ensure VIP topics meet configuration standards (RF=3, appropriate retention)
- Business context preserved - application definitions maintain documentation about purpose, dependencies and SLAs
- When issues occur, the right teams are contacted immediately through defined ownership
- Configuration changes follow approval workflows specific to business-critical topics
- Governance policies prevent VIP topics from being created with suboptimal settings
- Topic purpose and dependencies are documented in application definitions
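As an illustration of routing an incident to the owning team, the sketch below assumes each application owns topics by name prefix and resolves a topic's owner by the longest matching prefix. The prefix model, the team names and both helper functions are hypothetical; check Console's application definitions for how ownership is actually declared.

```python
def resolve_owner(topic, owners_by_prefix):
    """Return the team owning `topic`, or None if no prefix matches.

    owners_by_prefix maps a topic-name prefix to the owning team. The longest
    matching prefix wins, so a sub-team can own a narrower namespace.
    """
    matches = [p for p in owners_by_prefix if topic.startswith(p)]
    if not matches:
        return None
    return owners_by_prefix[max(matches, key=len)]


def unowned(topics, owners_by_prefix):
    """VIP topics with no owner to contact; these need an owner first."""
    return [t for t in topics if resolve_owner(t, owners_by_prefix) is None]


owners = {"payments.": "payments-team", "payments.refunds.": "refunds-team"}
print(resolve_owner("payments.refunds.v1", owners))            # refunds-team
print(unowned(["payments.orders", "legacy-events"], owners))   # ['legacy-events']
```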
Review access controls and security
Verify RBAC permissions
- Limit producer permissions to authorized applications only
- Restrict consumer access to approved teams and services
- Require elevated permissions for configuration changes
- Audit permissions regularly for VIP topics
Review security policies
- Verify encryption in transit (SSL/TLS) is enforced
- Confirm ACLs or RBAC rules restrict access appropriately
- Check for data masking or encryption requirements
- Ensure compliance with organizational security policies
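A periodic permission audit can be reduced to comparing grants against approved lists. The sketch models Kafka-style ACL bindings as (principal, operation, topic) tuples and flags the wildcard principal `User:*`, unapproved producers and consumers, and administrative operations held by non-admins. It deliberately ignores pattern types, hosts and DENY rules, and the `AclBinding` type and the approved-list inputs are hypothetical rather than an export format from Console.

```python
from typing import NamedTuple

# Operations that change or remove the topic; reserve them for admins.
ADMIN_OPERATIONS = {"ALTER", "ALTER_CONFIGS", "DELETE", "ALL"}


class AclBinding(NamedTuple):
    principal: str  # e.g. "User:orders-svc"
    operation: str  # Kafka ACL operation name, e.g. "READ", "WRITE"
    topic: str


def audit_acls(bindings, topic, producers, consumers, admins):
    """Flag grants on `topic` that exceed the approved access lists."""
    findings = []
    for b in bindings:
        if b.topic != topic:
            continue
        if b.principal == "User:*":
            findings.append(f"wildcard principal granted {b.operation}")
        elif b.operation in ADMIN_OPERATIONS and b.principal not in admins:
            findings.append(f"{b.principal} can {b.operation} without admin role")
        elif b.operation == "WRITE" and b.principal not in producers:
            findings.append(f"{b.principal} can produce but is not an approved producer")
        elif b.operation == "READ" and b.principal not in consumers:
            findings.append(f"{b.principal} can consume but is not an approved consumer")
    return findings


bindings = [
    AclBinding("User:orders-svc", "WRITE", "orders"),
    AclBinding("User:analytics", "WRITE", "orders"),
    AclBinding("User:*", "READ", "orders"),
    AclBinding("User:billing", "READ", "orders"),
    AclBinding("User:dev", "ALTER_CONFIGS", "orders"),
    AclBinding("User:dev", "WRITE", "sandbox"),
]
for finding in audit_acls(bindings, "orders", producers={"User:orders-svc"},
                          consumers={"User:billing"}, admins={"User:platform"}):
    print(finding)
```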
Monitor consumer health
Navigate to the VIP topic
Review consumer group metrics
- Lag - Current lag per partition and total lag
- State - Active consumers or empty groups
- Members - Number of active consumer instances
- Commit frequency - How often consumers commit offsets
Identify problematic consumers
- Consistently high or growing lag
- Consumers that frequently rebalance
- Groups with zero active members but unconsumed messages (non-zero lag)
- Uneven lag distribution across partitions
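The symptoms listed above can be checked mechanically from periodic snapshots of each consumer group. In this sketch the `GroupSnapshot` shape, the thresholds (more than 10,000 messages of total lag, more than 3 rebalances per hour, one partition lagging more than 3x the per-partition average) and the rule that "growing" means every sample exceeds the previous one are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class GroupSnapshot:
    group_id: str
    members: int              # active consumer instances
    lag_per_partition: dict   # partition -> current lag (messages)
    lag_history: list = field(default_factory=list)  # total lag per sample, oldest first
    rebalances_last_hour: int = 0


def diagnose_group(g, max_total_lag=10_000, max_rebalances_per_hour=3,
                   skew_factor=3.0):
    """Return the reasons a consumer group looks unhealthy (empty = healthy)."""
    reasons = []
    lags = list(g.lag_per_partition.values())
    total = sum(lags)
    if total > max_total_lag:
        reasons.append("high lag")
    h = g.lag_history
    # Growing: at least three samples, each larger than the one before it.
    if len(h) >= 3 and all(b > a for a, b in zip(h, h[1:])):
        reasons.append("growing lag")
    if g.rebalances_last_hour > max_rebalances_per_hour:
        reasons.append("frequent rebalances")
    if g.members == 0 and total > 0:
        reasons.append("no active members but lag outstanding")
    # Uneven: one partition far behind the per-partition average.
    if len(lags) > 1 and total > 0 and max(lags) > skew_factor * total / len(lags):
        reasons.append("uneven lag across partitions")
    return reasons


stuck = GroupSnapshot("billing", 0, {0: 10, 1: 10, 2: 10, 3: 500}, [100, 200, 530], 5)
print(diagnose_group(stuck))
```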
Review partition distribution
Navigate to the VIP topic
Analyze partition distribution and confirm that:
- Partitions are evenly distributed across all brokers
- Leadership is balanced (no single broker leads most partitions)
- No brokers are excluded from the topic
- Replica assignments provide proper fault tolerance
Check for partition skew by comparing:
- Partition sizes across all partitions
- Message counts per partition
- Offset ranges (begin offset to end offset)
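The distribution and skew checks can be sketched as one function over the topic's partition layout. The input dictionaries, the optional rack map and the 2x size-skew threshold are hypothetical; leadership counts as imbalanced here when two brokers differ by more than one led partition.

```python
from collections import Counter
from statistics import mean


def check_distribution(brokers, replicas, leaders, sizes, racks=None,
                       skew_factor=2.0):
    """Audit one topic's partition layout.

    brokers:  broker ids in the cluster (non-empty)
    replicas: partition -> list of broker ids holding a replica
    leaders:  partition -> broker id currently leading it
    sizes:    partition -> size in bytes
    racks:    optional broker id -> rack name
    """
    findings = []
    used = {b for rs in replicas.values() for b in rs}
    missing = sorted(set(brokers) - used)
    if missing:
        findings.append(f"brokers with no replicas: {missing}")
    # Count leaders per broker, including brokers that lead nothing.
    lead_counts = Counter(leaders.values())
    counts = {b: lead_counts.get(b, 0) for b in brokers}
    if max(counts.values()) - min(counts.values()) > 1:
        findings.append(f"leadership imbalance: {counts}")
    # Fault tolerance: replicas should span as many racks as possible.
    if racks:
        n_racks = len(set(racks.values()))
        for p, rs in sorted(replicas.items()):
            spread = len({racks[b] for b in rs})
            if spread < min(len(rs), n_racks):
                findings.append(f"partition {p} replicas span {spread} rack(s)")
    # Skew: any partition much larger than the average partition.
    avg = mean(sizes.values()) if sizes else 0
    for p, size in sorted(sizes.items()):
        if avg > 0 and size > skew_factor * avg:
            findings.append(f"partition {p} is {size / avg:.1f}x the average size")
    return findings


for f in check_distribution([1, 2, 3], {0: [1, 2], 1: [2, 3], 2: [1, 3]},
                            {0: 1, 1: 1, 2: 1}, {0: 100, 1: 100, 2: 1000},
                            racks={1: "a", 2: "b", 3: "a"}):
    print(f)
```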
Track performance metrics
Navigate to the VIP topic
Review produce metrics
- Messages in per second - Produce rate over time
- Bytes in per second - Data volume throughput
- Look for unusual spikes, drops or patterns
- Establish baseline performance for capacity planning
Review consume metrics
- Messages out per second - Consume rate across all consumer groups
- Bytes out per second - Data volume being consumed
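- Compare each consumer group's consume rate to the produce rate to identify accumulation (the topic-level consume rate sums all groups, so it is not directly comparable)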
Identify trends and anomalies
- Detect gradual throughput increases requiring capacity planning
- Identify time-of-day or day-of-week patterns
- Spot sudden changes that may indicate application issues
- Plan for peak traffic periods and scaling needs
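Two of the checks above lend themselves to code: flagging samples that break from a trailing baseline, and spotting accumulation. The anomaly test below is a plain z-score against the previous `window` samples; it is one simple technique among many and an assumption, not Console's method. Accumulation is compared per consumer group because topic-level messages out sums the consumption of every group, so a topic read by three groups that keep up sees roughly three times its produce rate going out.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=24, z_threshold=3.0):
    """Indexes of samples deviating more than z_threshold standard deviations
    from the mean of the preceding `window` samples (window must be >= 2)."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # Flat baseline: any change at all is unusual.
            if samples[i] != mu:
                anomalies.append(i)
        elif abs(samples[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


def accumulating_groups(produce_rate, consume_rate_by_group, tolerance=0.95):
    """Groups consuming noticeably slower than the topic is produced to."""
    return sorted(g for g, rate in consume_rate_by_group.items()
                  if rate < tolerance * produce_rate)


hourly_msgs_per_sec = [100, 102, 98, 101, 99] * 5 + [400]
print(find_anomalies(hourly_msgs_per_sec))  # [25]
print(accumulating_groups(1000, {"billing": 990, "analytics": 600, "search": 1000}))  # ['analytics']
```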
Troubleshooting
Why does a low-volume topic appear as a VIP topic?
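A topic can qualify through consumer group count alone. When many consumer groups subscribe to a topic, an issue with it affects many applications even at low throughput. Such topics appear as tall, light bars and may carry configuration or control data.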
Should all VIP topics have the same configuration?
What if VIP topic health score is poor?
- Identify issues - Check All recommendations in the VIP topics section and review the Risk analysis section
- Prioritize by impact - Critical (replication factor less than 3), High (under-replicated partitions), Medium (partition skew), Low (suboptimal configurations)
- Take action - For replication issues, follow replication remediation steps. For partition problems, follow partition troubleshooting steps. For performance issues, review consumer lag and throughput metrics
- Verify improvement - Monitor the health score after remediation to confirm resolution
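The prioritization order above can be applied to a list of findings as a simple sort. The issue-kind labels, the `Issue` type and `remediation_queue` are hypothetical; only the Critical, High, Medium, Low ordering comes from this page.

```python
from typing import NamedTuple

# Severity order from the prioritization list; the kind labels are hypothetical.
SEVERITY = {
    "low_replication_factor": (0, "Critical"),
    "under_replicated_partitions": (1, "High"),
    "partition_skew": (2, "Medium"),
    "suboptimal_config": (3, "Low"),
}


class Issue(NamedTuple):
    topic: str
    kind: str


def remediation_queue(issues):
    """Order issues by severity, then topic name, and label each one."""
    ordered = sorted(issues, key=lambda i: (SEVERITY[i.kind][0], i.topic))
    return [(SEVERITY[i.kind][1], i.topic, i.kind) for i in ordered]


issues = [
    Issue("clickstream", "partition_skew"),
    Issue("orders", "suboptimal_config"),
    Issue("payments", "low_replication_factor"),
    Issue("orders", "under_replicated_partitions"),
]
for severity, topic, kind in remediation_queue(issues):
    print(severity, topic, kind)
```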