Kafka data quality — validation rules and policies

Conduktor helps you detect and enforce data quality standards for messages flowing through Kafka.

How it works

You create Rules that define the expected format and content of messages, then attach them to Policies that target specific Kafka topics or topic prefixes. After a Policy is created, Conduktor evaluates its Rules against every message produced on the targeted topics. It tracks both the number of violating messages and the total number of messages evaluated.
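The relationship between Rules, Policies and the two counters can be sketched in Python. This is a minimal illustration under stated assumptions, not Conduktor's API. The Rule and Policy classes are hypothetical, and so are the use of plain Python predicates as Rules and the "exact name or prefix" topic matching. A targeted message is counted once as evaluated. It is counted once as violating if it breaks at least one Rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical Rule: a named predicate that returns True for a valid message."""
    name: str
    is_valid: Callable[[dict], bool]


@dataclass
class Policy:
    """Hypothetical Policy: a set of Rules plus the topics or prefixes it targets."""
    rules: list[Rule]
    topics: set[str] = field(default_factory=set)
    prefixes: tuple[str, ...] = ()
    evaluated: int = 0  # total targeted messages checked
    violating: int = 0  # messages that broke at least one Rule

    def targets(self, topic: str) -> bool:
        # Exact topic name, or a topic starting with one of the prefixes
        return topic in self.topics or topic.startswith(self.prefixes)

    def violations(self, message: dict) -> list[str]:
        # Names of every Rule this message fails
        return [rule.name for rule in self.rules if not rule.is_valid(message)]

    def observe(self, topic: str, message: dict) -> None:
        if not self.targets(topic):
            return  # topic not covered by this Policy
        self.evaluated += 1
        if self.violations(message):
            self.violating += 1  # counted once, however many Rules failed


policy = Policy(
    rules=[
        Rule("has-customer-id", lambda m: "customer_id" in m),
        Rule("positive-amount", lambda m: m.get("amount", 0) > 0),
    ],
    prefixes=("orders.",),
)
for msg in [{"customer_id": 1, "amount": 10}, {"amount": -5}]:
    policy.observe("orders.eu", msg)
policy.observe("payments", {"amount": -1})  # not targeted, ignored
print(policy.evaluated, policy.violating)  # 2 1
```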

Observe vs enforce

The capabilities depend on whether you’re using Conduktor Gateway.

Without Gateway (observe only)

Records are verified after they have been produced
Track violations and monitor data quality issues
No impact on message flow

With Gateway (observe and enforce)

Records are verified before production
See which service accounts produced the faulty records
Take action on faulty records:
- Block: reject the message (and the entire record batch). The producer receives a non-retriable error. Find out about handling blocked batches
- Mark: add a header to the record that lists every Rule it violated, across all Policies that use the mark action
Prevent bad data from entering your topics
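With Gateway, the block and mark actions can be pictured as a check that runs before a record batch is written. The following is a hypothetical sketch, not Gateway's implementation. Several names are invented for illustration: the Policy and Record classes, the action strings, the BatchRejected exception (a stand-in for the producer's non-retriable error) and the dq-violations header name. Gateway's actual header name and error code are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

MARK_HEADER = "dq-violations"  # hypothetical header name, not Gateway's real one


class BatchRejected(Exception):
    """Stands in for the non-retriable error the producer receives."""


@dataclass
class Record:
    value: dict
    headers: dict[str, str] = field(default_factory=dict)


@dataclass
class Policy:
    name: str
    action: str  # "block", "mark" or "none" (illustrative labels)
    rules: dict[str, Callable[[dict], bool]]  # Rule name -> True if valid


def enforce(batch: list[Record], policies: list[Policy]) -> list[Record]:
    """Validate a batch before it is written, blocking or marking as configured."""
    marks: list[list[str]] = []
    for record in batch:
        violated: list[str] = []
        for policy in policies:
            failed = [name for name, ok in policy.rules.items() if not ok(record.value)]
            if failed and policy.action == "block":
                # A single blocking violation rejects the whole record batch
                raise BatchRejected(f"{policy.name}: {', '.join(failed)}")
            if failed and policy.action == "mark":
                violated.extend(failed)
        marks.append(violated)
    # Only reached if nothing was blocked: add the header to violating records
    for record, violated in zip(batch, marks):
        if violated:
            record.headers[MARK_HEADER] = ",".join(violated)
    return batch
```

The marks are applied in a second pass, so a batch that ends up blocked is never partially annotated.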

Key constraints

Policies cannot mix different target cluster types (Gateway vs. non-Gateway)
To avoid validation conflicts, you can’t target a Gateway-backing cluster directly
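Both constraints can be expressed as a simple check on the clusters that a single Policy targets. This is an illustrative sketch only. The Cluster class and its is_gateway and backs_gateway flags are hypothetical and do not represent how Console models clusters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    is_gateway: bool = False     # accessed through Gateway
    backs_gateway: bool = False  # the physical cluster behind a Gateway


def validate_targets(clusters: list[Cluster]) -> list[str]:
    """Return the constraint errors for the clusters targeted by one Policy."""
    errors: list[str] = []
    # Constraint 1: targets must be all Gateway or all non-Gateway
    if len({c.is_gateway for c in clusters}) > 1:
        errors.append("Policy mixes Gateway and non-Gateway clusters")
    # Constraint 2: a Gateway-backing cluster cannot be targeted directly
    for c in clusters:
        if c.backs_gateway:
            errors.append(f"{c.name} backs a Gateway and cannot be targeted directly")
    return errors


print(validate_targets([Cluster("gw", is_gateway=True), Cluster("plain")]))
# ['Policy mixes Gateway and non-Gateway clusters']
```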

Data quality metrics

The quality overview dashboard provides a summary of your data quality governance across the entire ecosystem. Use it to:

Track progress: monitor how many topics are protected by Policies
Identify gaps: find high-volume topics without coverage
Measure health: see which producers need attention and how well your Policies are performing. The health score is calculated based on the number of topics that have Policies assigned, with a modifier that determines how effectively they’re enforced (for example, whether violations are blocked).
Take action: click through to add topics to Policies or investigate violations

In Console, access the overview by going to Trust > Data quality overview. You can export all data quality metrics to CSV for offline analysis or reporting.

To see metrics, you need to have at least one topic with a Policy assigned to it.

Coverage

The coverage metrics show how many topics are protected by Policies.

Topics with at least one Policy: percentage of topics with Policy coverage
Topics with multiple Policies: percentage of topics with layered validation
VIP topics at risk: highly utilized topics with no Policy coverage that should be prioritized. VIP topics are either topics that were active in the last 24 hours and contain over 500 messages, or topics that have more than 3 consumers.
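If you already have per-topic counts, you can compute the three coverage figures as shown below. The TopicInfo fields are hypothetical. The is_vip function encodes one reading of the VIP definition: active in the last 24 hours with over 500 messages, or more than 3 consumers. Console's own calculation may differ in detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TopicInfo:
    name: str
    policies: int          # number of Policies assigned to the topic
    active_last_24h: bool
    messages: int
    consumers: int


def is_vip(t: TopicInfo) -> bool:
    # Active in the last 24 hours with over 500 messages, or more than 3 consumers
    return (t.active_last_24h and t.messages > 500) or t.consumers > 3


def coverage(topics: list[TopicInfo]) -> dict:
    total = len(topics)
    if total == 0:
        return {"with_policy_pct": 0.0, "multi_policy_pct": 0.0, "vip_at_risk": []}
    return {
        "with_policy_pct": 100 * sum(t.policies >= 1 for t in topics) / total,
        "multi_policy_pct": 100 * sum(t.policies >= 2 for t in topics) / total,
        # VIP topics with no Policy at all are the ones to cover first
        "vip_at_risk": [t.name for t in topics if is_vip(t) and t.policies == 0],
    }
```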

Higher coverage means more of your data is being validated. Aim for 100% coverage of VIP topics first. To add a Policy to a topic at risk, click Add to Policy next to the topic name and select an existing Policy. Metrics are updated every 5 minutes by default, so changes may not be reflected immediately.

Health

The health metrics show how effectively your Rules are being enforced.

Health score: overall data quality based on coverage and enforcement
Action distribution: breakdown of no action, mark and block actions over time
Top violating producers: producers with the highest violation rates to help prioritize remediation
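You can derive the action distribution and the producer ranking from per-record outcomes. The following is a hypothetical sketch. The outcome labels and the (producer, outcome) event shape are assumptions. The violation rate is taken as violations divided by records for each producer. The time bucketing behind "over time" is left out. On non-Gateway topics, producer identities would not be available to feed it.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical outcome labels: "pass" means no Rule was violated; the others
# name the action taken on a violating record.
OUTCOMES = ("pass", "none", "mark", "block")


def summarize(events: list[tuple[str, str]], n: int = 3) -> tuple[Counter, list[str]]:
    """events are (producer, outcome) pairs; returns the action counts and the
    n producers with the highest violation rate (violations / records)."""
    totals: Counter = Counter()
    violations: Counter = Counter()
    actions: Counter = Counter()
    for producer, outcome in events:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        totals[producer] += 1
        if outcome != "pass":
            violations[producer] += 1
            actions[outcome] += 1
    rates = {p: violations[p] / totals[p] for p in totals}
    # Highest rate first; ties keep first-seen order because sorted() is stable
    top = sorted(rates, key=rates.get, reverse=True)[:n]
    return actions, top


events = [("billing", "pass"), ("billing", "block"), ("web", "mark"),
          ("web", "mark"), ("batch", "pass")]
actions, top = summarize(events, n=2)
print(dict(actions), top)  # {'block': 1, 'mark': 2} ['web', 'billing']
```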

Mark and block actions and producer identification require Gateway. Without Gateway, violations are reported but not enforced, and producer information is not available for violations on non-Gateway topics.
