# Kafka data quality — validation rules and policies

> Define Kafka data quality rules using CEL expressions or JSON Schema. Attach policies to topics to observe violations or enforce blocking with Conduktor.

Conduktor helps you detect and enforce data quality standards for messages flowing through Kafka.

## How it works

You create **<Tooltip tip="A Rule captures business logic that's applied to data. Rules have to be attached to Policies to take effect.">Rules</Tooltip>** that define the expected format and content of messages, then attach them to **<Tooltip tip="A collection of validation Rules that's applied to Kafka topics or prefixes.">Policies</Tooltip>** that target specific Kafka topics or topic prefixes.

Once a Policy is created, Conduktor evaluates its Rules against every message subsequently produced to the targeted topics. It tracks both the number of violating messages and the total number of messages evaluated.
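
As a rough mental model of this flow, the Python sketch below attaches Rules to a Policy, matches topics by exact name or prefix, and counts evaluated and violating messages. Every name in it (`Rule`, `Policy`, `targets`, `evaluate`) is hypothetical and exists only for illustration; it is not Conduktor's API, and real Rules are written as CEL expressions or JSON Schema rather than Python functions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    # Hypothetical stand-in for a Rule: a name plus a predicate over a decoded message.
    name: str
    check: Callable[[dict], bool]


@dataclass
class Policy:
    # Hypothetical stand-in for a Policy: Rules plus the topics or prefixes it targets.
    rules: list[Rule]
    topics: set[str] = field(default_factory=set)
    prefixes: tuple[str, ...] = ()
    evaluated: int = 0
    violations: int = 0

    def targets(self, topic: str) -> bool:
        # A topic is targeted either by exact name or by prefix.
        return topic in self.topics or any(topic.startswith(p) for p in self.prefixes)

    def evaluate(self, topic: str, message: dict) -> list[str]:
        # Return the names of violated Rules; only targeted topics update the counters.
        if not self.targets(topic):
            return []
        self.evaluated += 1
        violated = [rule.name for rule in self.rules if not rule.check(message)]
        if violated:
            self.violations += 1
        return violated


policy = Policy(
    rules=[Rule("amount-positive", lambda m: m.get("amount", 0) > 0)],
    prefixes=("orders.",),
)
policy.evaluate("orders.eu", {"amount": 10})  # passes
policy.evaluate("orders.eu", {"amount": -5})  # violates amount-positive
policy.evaluate("payments", {"amount": -5})   # topic not targeted, so not counted
print(policy.evaluated, policy.violations)    # 2 1
```

The two counters mirror the totals described above: one for messages evaluated and one for messages that violated at least one Rule.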

## Observe vs enforce

The capabilities depend on whether you're using <Tooltip tip="A Kafka proxy that deploys extensible plugins for encryption, filtering, and data processing.">Gateway</Tooltip>:

**Without Gateway (observe only)**

* Records are verified **after** they have been produced
* Track violations and monitor data quality issues
* No impact on message flow

**With Gateway (observe and enforce)**

* Records are verified **before** they are written to the topic
* See which service accounts produced the faulty records
* Take action on faulty records:
  * **Block**: reject the message (and the entire record batch). The producer receives a non-retriable error. [Find out about handling blocked batches](/guide/conduktor-in-production/admin/gateway-policies#handle-blocked-batches)
  * **Mark**: add a header to the record that lists every violated rule, across all Policies that use the mark action
* Prevent bad data from entering your topics
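
The difference between the two actions can be sketched in a few lines of Python. This is an illustration of the semantics described above, not Gateway code: the header name `x-violated-rules`, the `BatchBlocked` exception, and the `apply_action` helper are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical header name, chosen for illustration only.
VIOLATIONS_HEADER = "x-violated-rules"


class BatchBlocked(Exception):
    """Stands in for the non-retriable error a producer receives for a blocked batch."""


@dataclass
class Record:
    value: dict
    headers: dict[str, str] = field(default_factory=dict)


def apply_action(batch: list[Record], violations: dict[int, list[str]], action: str) -> list[Record]:
    """Apply 'block' or 'mark' to a batch.

    `violations` maps a record's index in the batch to the names of the rules it violated.
    """
    faulty = sorted(index for index, rules in violations.items() if rules)
    if action == "block":
        # A single faulty record is enough to reject the entire batch.
        if faulty:
            raise BatchBlocked(f"records at indexes {faulty} violated rules")
        return batch
    if action == "mark":
        # Faulty records are still delivered, but carry a header listing the violated rules.
        for index in faulty:
            batch[index].headers[VIOLATIONS_HEADER] = ",".join(violations[index])
        return batch
    raise ValueError(f"unknown action: {action}")


batch = [Record({"amount": 10}), Record({"amount": -5})]
violations = {1: ["amount-positive"]}

marked = apply_action(batch, violations, "mark")
print(marked[1].headers)  # {'x-violated-rules': 'amount-positive'}

try:
    apply_action(batch, violations, "block")
except BatchBlocked as err:
    print("batch rejected:", err)
```

Note that blocking rejects the valid first record along with the faulty one, which is why producers need a strategy for handling blocked batches.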

### Key constraints

* Policies cannot mix different target cluster types (Gateway vs. non-Gateway)
* To avoid validation conflicts, you can't directly target a cluster that backs a Gateway

<Info>
  **From our blog:** [Kafka data contracts: a schema is not a contract](https://conduktor.io/blog/kafka-data-contracts) explains why a schema alone won't stop a producer from breaking consumers, and where to enforce the contract.
</Info>

## Data quality metrics

The quality overview dashboard provides a summary of your data quality governance across the entire ecosystem.

Use it to:

* **Track progress**: monitor how many topics are protected by Policies
* **Identify gaps**: find high-volume topics without coverage
* **Measure health**: see which producers need attention and how well your Policies are performing. The health score is based on the number of topics that have Policies assigned, adjusted by a modifier that reflects how effectively those Policies are enforced (for example, whether violations are blocked).
* **Take action**: click through to add topics to Policies or investigate violations

In Console, access the overview by going to **Trust > Data quality overview**. You can export all data quality metrics to CSV for offline analysis or reporting.

<img src="https://mintcdn.com/conduktor/gulAsLZPUMqXFfoy/images/data-quality-overview.png?fit=max&auto=format&n=gulAsLZPUMqXFfoy&q=85&s=3e4bb065bdc293e93631d2df43a967ef" alt="Data Quality Overview" width="3662" height="2254" data-path="images/data-quality-overview.png" />

<Info>
  To see metrics, you need to have at least one topic with a Policy assigned to it.
</Info>

### Coverage

The coverage metrics show how many topics are protected by Policies.

* **Topics with at least one Policy**: percentage of topics with Policy coverage
* **Topics with multiple Policies**: percentage of topics with layered validation
* **VIP topics at risk**: highly utilized topics with no Policy coverage that should be prioritized.
  A topic counts as VIP if it has had activity within the last 24 hours and contains over 500 messages, or if it has more than 3 consumers.

Higher coverage means more of your data is being validated. Aim for 100% coverage of VIP topics first.
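
To make the coverage metrics concrete, here is a minimal Python sketch of how they could be derived from per-topic statistics. It is not how Console computes them: the `TopicStats` fields and function names are hypothetical, and the VIP check assumes one reading of the definition above, namely (active in the last 24 hours and over 500 messages) or more than 3 consumers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TopicStats:
    # Hypothetical per-topic statistics used only for this illustration.
    name: str
    policy_count: int
    active_last_24h: bool
    message_count: int
    consumer_count: int


def is_vip(topic: TopicStats) -> bool:
    # Assumed reading: (recently active AND > 500 messages) OR > 3 consumers.
    return (topic.active_last_24h and topic.message_count > 500) or topic.consumer_count > 3


def coverage(topics: list[TopicStats]) -> dict:
    total = len(topics)
    if total == 0:
        return {"with_policy_pct": 0.0, "multi_policy_pct": 0.0, "vip_at_risk": []}
    with_policy = sum(1 for t in topics if t.policy_count >= 1)
    multi_policy = sum(1 for t in topics if t.policy_count >= 2)
    # VIP topics at risk are VIP topics with no Policy at all.
    at_risk = [t.name for t in topics if is_vip(t) and t.policy_count == 0]
    return {
        "with_policy_pct": 100 * with_policy / total,
        "multi_policy_pct": 100 * multi_policy / total,
        "vip_at_risk": at_risk,
    }


topics = [
    TopicStats("orders", policy_count=2, active_last_24h=True, message_count=1000, consumer_count=2),
    TopicStats("payments", policy_count=1, active_last_24h=True, message_count=200, consumer_count=1),
    TopicStats("clicks", policy_count=0, active_last_24h=True, message_count=9000, consumer_count=1),
    TopicStats("audit", policy_count=0, active_last_24h=False, message_count=10, consumer_count=5),
]
result = coverage(topics)
print(result)
```

In this sample, half of the topics have at least one Policy, a quarter have multiple Policies, and `clicks` and `audit` are flagged as VIP topics at risk.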

To add a Policy to a topic at risk, click **Add to Policy** next to the topic name and select an existing Policy. Metrics refresh every 5 minutes by default, so your changes may take a few minutes to appear.

### Health

The health metrics show how effectively your Rules are being enforced.

* **Health score**: overall data quality based on coverage and enforcement
* **Action distribution**: breakdown of no action, mark and block actions over time
* **Top violating producers**: producers with the highest violation rates to help prioritize remediation

<Note>
  The mark and block actions, as well as producer identification, require Gateway. On non-Gateway topics, violations are reported but not enforced, and no information about the producer of the violating records is available.
</Note>

## Related resources

* [Observe data quality](/guide/use-cases/observe-data-quality)
* [Enforce data quality](/guide/use-cases/enforce-data-quality)
* [Give us feedback/request a feature](https://conduktor.io/roadmap)
