Data quality

Overview

Bad data breaks customer experiences, drives churn and slows growth. Conduktor Trust helps teams catch and fix data quality issues before they impact your business. You define the rules and we'll enforce them at the streaming layer.

Prerequisites

Before creating data quality rules and policies, you have to:

  • use Conduktor Console 1.34 or later
  • use Conduktor Gateway 3.9 or later
  • be logged in to the Console UI as an admin (or use an admin token for the CLI)
  • in Console, configure your Gateway cluster and fill in the Provider tab with Gateway API credentials

Rules

You create Rules with CEL expressions that capture the business logic for your data. For example: value.customerId.matches("[0-9]{8}").
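One detail worth checking in a rule like this: the CEL language definition states that a regular expression match succeeds if the pattern matches any substring, so an unanchored pattern also accepts longer values. The Python sketch below approximates that behavior with re.search. It assumes Conduktor's matches() follows the CEL definition, uses Python's re module as a stand-in for the RE2 engine, and the helper name cel_matches and the sample IDs are made up for illustration.

```python
import re


def cel_matches(value: str, pattern: str) -> bool:
    """Approximate CEL's matches(): true if the pattern occurs anywhere in value.

    Assumption: substring semantics, as described in the CEL language definition.
    """
    return re.search(pattern, value) is not None


unanchored = "[0-9]{8}"    # the pattern from the example above
anchored = "^[0-9]{8}$"    # same pattern, forced to cover the whole string

print(cel_matches("12345678", unanchored))      # exactly 8 digits
print(cel_matches("123456789012", unanchored))  # 12 digits still contain 8 digits
print(cel_matches("123456789012", anchored))    # anchors reject the longer value
```

If the ID must be exactly eight digits, anchoring the pattern with ^ and $ is the safer form.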

Rules require Policies

Rules do nothing on their own - you have to attach them to a Policy.

The Rules page lists your Rules, with a preview of their CEL expressions. Open a Rule's detail page to see its description, full CEL expression and attached Policies.

Create a Rule

You can create a data quality rule from the Console UI or the Conduktor CLI.

To create a Rule using the Console UI:

  1. In the Trust section of the Conduktor Console sidebar, go to Rules and click +New Rule.
  2. Define the Rule details:
    • Add a descriptive name for the Rule.
    • The Technical ID will be auto-populated as you type in the name. This is used to identify this Rule in CLI/API.
    • (Optional) Enter a Description to explain your Rule.
  3. Define the CEL expression for your Rule:
    • Common Expression Language (CEL) is an expression language supporting common operators like == and > as well as macros like has() to check for the presence of fields. See the CEL language definition for more details.
  4. Click Create.

Example Rules

Here are some sample data quality rules.

Amend values if using these samples

If you use these examples, change the field names to match the fields in your own data.

Email RegEx validation

Your requirements may be different from this RegEx, as email validation via RegEx is complex!

value.customer.email.matches(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
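Before relying on this pattern, it can help to run it against a few values from your own data. The following Python sketch does that, assuming Python's re module behaves like the RE2 engine for this pattern. The helper name is_email_like and the sample addresses are made up for illustration.

```python
import re

# The same pattern as in the Rule above.
EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def is_email_like(value: str) -> bool:
    """Return True if value passes the email pattern."""
    return EMAIL.search(value) is not None


samples = [
    "jane.doe@example.com",    # ordinary address
    "jane@localhost",          # no dot in the domain part
    "jane@@example.com",       # two @ signs
    "jane..doe@example.com",   # consecutive dots in the local part
]
for sample in samples:
    print(sample, is_email_like(sample))
```

The last sample passes even though many mail systems reject consecutive dots, which shows why the pattern may need adjusting for stricter requirements.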

UUID RegEx validation

value.customer.id.matches(r"^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$")
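The \b word-boundary assertions in this pattern sit between a hex digit and a dash, where a boundary always exists, so they should not change which values pass. The Python sketch below compares the pattern with a simpler variant that omits them. It assumes Python's re module behaves like RE2 here, and the names UUID_SIMPLE and passes, as well as the sample values, are hypothetical.

```python
import re

# Pattern from the Rule above.
UUID_ORIGINAL = re.compile(
    r"^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$"
)
# Hypothetical simplification without the \b assertions.
UUID_SIMPLE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def passes(pattern: re.Pattern, value: str) -> bool:
    """Return True if value passes the given compiled pattern."""
    return pattern.search(value) is not None


samples = [
    "123e4567-e89b-12d3-a456-426614174000",  # canonical lower-case form
    "123E4567-E89B-12D3-A456-426614174000",  # upper-case hex digits
    "123e4567e89b12d3a456426614174000",      # dashes missing
    "g23e4567-e89b-12d3-a456-426614174000",  # non-hex character
    "00000000-0000-0000-0000-000000000000",  # nil UUID
]
for sample in samples:
    print(sample, passes(UUID_ORIGINAL, sample), passes(UUID_SIMPLE, sample))
```

Both patterns check only the 8-4-4-4-12 hex layout. Neither inspects the UUID version or variant bits, so the nil UUID passes.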

Ensure Header is JSON

headers['Content-Type'] == 'application/json'

Policies

A Policy is made up of Rules that are applied to topics or topic prefixes.

Once created, Policies can be assigned actions that take effect when certain criteria are met (e.g., a Rule in the Policy is violated).

The Policy's detail page shows its description, linked Rules, targets, the number of messages evaluated and the number of violations since the Policy was created. You can also enable (and disable) actions for the Policy from this page.

The Policies page lists your Policies with their actions and targets.

Actions

The available actions to enable for a Policy are:

  • Report: when violations occur, log these as events in the Policy's history
  • Block: when a violation occurs, prevent data from being processed or transmitted

By default, Policies created using the Console UI don't have any actions enabled. You have to complete the Policy creation first and then enable the required actions. If there are any additional actions you'd like to see, please get in touch.
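The way the two actions combine can be pictured with a small model. The Python sketch below is not Conduktor's implementation. It is a toy Policy class, with hypothetical names throughout, that assumes a violation is counted once per message that fails at least one Rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A rule returns True when the message passes (hypothetical representation).
Rule = Callable[[dict], bool]


@dataclass
class Policy:
    rules: Dict[str, Rule]
    report: bool = False   # no actions enabled by default
    block: bool = False
    evaluated: int = 0
    violations: int = 0
    history: List[str] = field(default_factory=list)

    def handle(self, message: dict) -> bool:
        """Evaluate one message and return True if it is allowed through."""
        self.evaluated += 1
        failed = [name for name, rule in self.rules.items() if not rule(message)]
        if not failed:
            return True
        self.violations += 1  # counts are kept even with no action enabled
        if self.report:
            self.history.append("violated: " + ", ".join(failed))
        return not self.block  # only Block stops the message


policy = Policy(rules={"has-customer-id": lambda m: "customerId" in m})
policy.report = True
print(policy.handle({"customerId": "12345678"}))  # passes the rule
print(policy.handle({}))                          # violation, reported only
policy.block = True
print(policy.handle({}))                          # violation, now blocked
print(policy.evaluated, policy.violations, policy.history)
```

In this model, Report affects only what is recorded and Block affects only whether the message continues, so the two can be enabled independently.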

Create a Policy

You can create a data quality policy from the Console UI or the Conduktor CLI.

To create a Policy using the Console UI:

  1. In the Trust section of the Conduktor Console sidebar, go to Policies and click +New Policy.
  2. Define the Policy details:
    • Add a descriptive name for the Policy.
    • The Technical ID will be auto-populated as you type in the name. This is used to identify this Policy in CLI/API.
    • (Optional) Enter a Description to explain your Policy.
  3. Select Rules to be used in the Policy:
    • Every Policy must have at least one Rule
    • You can also create new Rules from this page
    • Click Continue to move to the next step.
  4. Select targets for the Policy:
    • Every Policy must have at least one target
    • A target consists of one or more topics on a specified Gateway
    • You can either select specific topics, or specify a prefix like orders-*
    • Click Continue to move to the next step.
  5. Review the policy, and if you are happy, confirm by clicking Create.
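To check which topics a set of targets would cover before creating the Policy, the selection in step 4 can be sketched in Python. This assumes that a trailing * means "any topic starting with this prefix" and that other targets must match the topic name exactly. The actual matching on Gateway may differ, and the function names and topic names are hypothetical.

```python
from typing import Iterable


def topic_matches(topic: str, target: str) -> bool:
    """Return True if topic is covered by a single target.

    Assumption: a target ending in '*' is a prefix, anything else is an exact name.
    """
    if target.endswith("*"):
        return topic.startswith(target[:-1])
    return topic == target


def policy_applies(topic: str, targets: Iterable[str]) -> bool:
    """Return True if any of the Policy's targets covers the topic."""
    return any(topic_matches(topic, target) for target in targets)


targets = ["payments", "orders-*"]
for topic in ["orders-eu", "orders", "payments", "payments-v2"]:
    print(topic, policy_applies(topic, targets))
```

Under these assumptions, orders-* covers orders-eu but not a topic named just orders, because the prefix includes the dash.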

Manage a Policy

Once a Policy is created, you can view its linked Rules and targets, and change its actions. If the Report action is enabled, you can also view individual violations as they occur. Otherwise, only the violation counts are available.

Enabling block action

Since the block action can stop data from being sent to the requested topic, you have to confirm it by entering 'BLOCK' when prompted. To disable blocking, enter 'UNBLOCK' when prompted.

Troubleshoot

What does Policy status mean?

A data quality policy has one of the following statuses:

  • Pending: the configuration isn't deployed or refreshed yet
  • Ready: the configuration is up-to-date on Gateway
  • Failed: something unexpected happened during the deployment. Check that the connected Gateway is active.

How do I handle headers with dashes?

Use bracket notation instead of dot notation. For example, use the headers['Content-Type'] format.
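The reason is that a dash is not a valid identifier character in CEL, so in dot notation it is read as a minus sign and headers.Content-Type means "headers.Content minus Type". CEL's grammar is not Python's, but both treat the dash as subtraction in this position, so Python's ast module offers a quick way to see the two readings. The helper name top_level_node is made up for illustration.

```python
import ast


def top_level_node(expr: str) -> str:
    """Return the name of the outermost syntax node Python sees in expr."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Dot notation: parsed as a subtraction (a binary operation), not a field access.
print(top_level_node("headers.Content-Type"))

# Bracket notation: parsed as a single key lookup.
print(top_level_node("headers['Content-Type']"))
```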