
What is the SQL data quality producer?

Use a SQL statement to assert the quality of records before they are produced.

View the full demo in realtime

You can either follow all the steps manually, or watch the recording

Review the docker compose environment

As can be seen from docker-compose.yaml, the demo environment consists of the following services:

  • gateway1
  • gateway2
  • kafka-client
  • kafka1
  • kafka2
  • kafka3
  • schema-registry
cat docker-compose.yaml

Starting the docker environment

Start all the Docker services in the background and wait until they are up and healthy.

  • --wait: Wait for services to be running|healthy. Implies detached mode.
  • --detach: Detached mode: Run containers in the background
docker compose up --detach --wait

Creating topic cars on gateway1

Creating on gateway1:

  • Topic cars with partitions:1 and replication-factor:1
kafka-topics \
--bootstrap-server localhost:6969 \
--replication-factor 1 \
--partitions 1 \
--create --if-not-exists \
--topic cars

Adding interceptor cars-quality

Let's create an interceptor to ensure the data produced is valid.

step-06-cars-quality-interceptor.json:

{
"kind" : "Interceptor",
"apiVersion" : "gateway/v2",
"metadata" : {
"name" : "cars-quality"
},
"spec" : {
"comment" : "Adding interceptor: cars-quality",
"pluginClass" : "io.conduktor.gateway.interceptor.safeguard.DataQualityProducerPlugin",
"priority" : 100,
"config" : {
"statement" : "SELECT * FROM cars WHERE color = 'red' and record.key.year > 2020",
"action" : "BLOCK_WHOLE_BATCH",
"deadLetterTopic" : "dead-letter-topic"
}
}
}
curl \
--silent \
--request PUT "http://localhost:8888/gateway/v2/interceptor" \
--header "Content-Type: application/json" \
--user "admin:conduktor" \
--data @step-06-cars-quality-interceptor.json | jq
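
To build intuition for what the statement accepts, its two conditions can be approximated locally with jq before anything is sent to Gateway. This is only a sketch: the helper name matches_cars_quality is hypothetical, it assumes jq 1.5 or later is installed, and Gateway's own SQL evaluation may treat missing fields or non-JSON keys differently.

```shell
# Hypothetical local pre-check that mirrors the interceptor statement:
#   color = 'red' and record.key.year > 2020
# $1 = record key as JSON (use null when there is no key)
# $2 = record value as JSON
# Exit status 0 means the record would satisfy both conditions.
matches_cars_quality() {
  jq -n -e --argjson key "$1" --argjson value "$2" \
    '$value.color == "red" and ($key.year // 0) > 2020' > /dev/null
}

# A red car from 2010 fails the year condition.
if matches_cars_quality '{"year":2010,"make":"BMW"}' '{"type":"Sports","price":1000,"color":"red"}'; then
  echo "accepted"
else
  echo "rejected"
fi
```

As the action name BLOCK_WHOLE_BATCH suggests, a single record that fails this check is expected to cause the whole produce batch to be rejected, not just the offending record.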

Producing an invalid car

Produce an invalid record to the cars topic (the record is rejected because its color is not red)

Sending 1 event

{
"type" : "SUV",
"price" : 2000,
"color" : "blue"
}
echo '{"type":"SUV","price":2000,"color":"blue"}' | \
kafka-console-producer \
--bootstrap-server localhost:6969 \
--topic cars

[!IMPORTANT] We get the following exception

org.apache.kafka.common.errors.PolicyViolationException: Request parameters do not satisfy the configured policy: Data quality policy is violated.

Producing an invalid car based on key

Produce an invalid record to the cars topic (the record is rejected because the year in its key is not greater than 2020)

Sending 1 event

{
"key" : "{\"year\":2010,\"make\":\"BMW\"}",
"value" : {
"type" : "Sports",
"price" : 1000,
"color" : "red"
}
}
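# printf turns \t into a real tab (the key separator) in every shell; bash's echo does not
printf '%s\t%s\n' '{"year":2010,"make":"BMW"}' '{"type":"Sports","price":1000,"color":"red"}' | \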
kafka-console-producer \
--bootstrap-server localhost:6969 \
--property "parse.key=true" \
--topic cars

[!IMPORTANT] We get the following exception

org.apache.kafka.common.errors.PolicyViolationException: Request parameters do not satisfy the configured policy: Data quality policy is violated.

Producing a valid car

Produce a valid record to the cars topic

Sending 1 event

{
"headers" : {
"X-HEADER-1" : "value1",
"X-HEADER-2" : "value2"
},
"key" : "{\"year\":2023,\"make\":\"Vinfast\"}",
"value" : {
"type" : "Trucks",
"price" : 2500,
"color" : "red"
}
}
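# printf turns \t into real tabs separating headers, key and value; bash's echo does not
printf '%s\t%s\t%s\n' 'X-HEADER-1:value1,X-HEADER-2:value2' '{"year":2023,"make":"Vinfast"}' '{"type":"Trucks","price":2500,"color":"red"}' | \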
kafka-console-producer \
--bootstrap-server localhost:6969 \
--property "parse.key=true" \
--property "parse.headers=true" \
--topic cars

Consuming from cars

Let's confirm just one record is there by consuming from the cars topic.

kafka-console-consumer \
--bootstrap-server localhost:6969 \
--topic cars \
--from-beginning \
--max-messages 2 \
--timeout-ms 3000 \
--property print.key=true \
--property print.headers=true | jq

returns 1 event

{
"headers" : {
"X-HEADER-1" : "value1",
"X-HEADER-2" : "value2"
},
"key" : "{\"year\":2023,\"make\":\"Vinfast\"}",
"value" : {
"type" : "Trucks",
"price" : 2500,
"color" : "red"
}
}

Confirm all invalid cars are in the dead letter topic

Let's confirm the invalid records are in the dead letter topic.

kafka-console-consumer \
--bootstrap-server localhost:9092,localhost:9093,localhost:9094 \
--topic dead-letter-topic \
--from-beginning \
--max-messages 3 \
--timeout-ms 3000 \
--property print.key=true \
--property print.headers=true | jq

returns 2 events

{
"headers" : {
"X-ERROR-MSG" : "Message does not match the statement [SELECT * FROM cars WHERE color = 'red' and record.key.year > 2020]",
"X-TOPIC" : "cars",
"X-PARTITION" : "0"
},
"key" : null,
"value" : {
"type" : "SUV",
"price" : 2000,
"color" : "blue"
}
}
{
"headers" : {
"X-ERROR-MSG" : "Message does not match the statement [SELECT * FROM cars WHERE color = 'red' and record.key.year > 2020]",
"X-TOPIC" : "cars",
"X-PARTITION" : "0"
},
"key" : "{\"year\":2010,\"make\":\"BMW\"}",
"value" : {
"type" : "Sports",
"price" : 1000,
"color" : "red"
}
}
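
When the dead letter topic holds more than a couple of records, a one-line summary per event can be easier to scan than the full JSON. The following is a minimal sketch, assuming jq is installed and that the events are available as JSON objects in the shape shown above; the helper name summarize_dead_letters is hypothetical.

```shell
# Hypothetical helper: reads dead-letter events (JSON objects with "headers",
# "key" and "value" fields) on stdin and prints one tab-separated line per event:
#   topic, partition, key (or "<no key>"), error message
summarize_dead_letters() {
  jq -r '[
      .headers["X-TOPIC"],
      .headers["X-PARTITION"],
      (.key // "<no key>"),
      .headers["X-ERROR-MSG"]
    ] | @tsv'
}
```

The X-TOPIC, X-PARTITION and X-ERROR-MSG header names are taken from the events above; the record value is left out of the summary on purpose.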

Check in the audit log that the message denials were captured

Check in the audit log that the message denials were captured in cluster kafka1

kafka-console-consumer \
--bootstrap-server localhost:9092,localhost:9093,localhost:9094 \
--topic _conduktor_gateway_auditlogs \
--from-beginning \
--timeout-ms 3000 | jq 'select(.type=="SAFEGUARD" and .eventData.plugin=="io.conduktor.gateway.interceptor.safeguard.DataQualityProducerInterceptor")'

returns 2 events

{
"id" : "90af879b-34ef-4ff2-bc7a-be047384170c",
"source" : "krn://cluster=p0KPFA_mQb2ixdPbQXPblw",
"type" : "SAFEGUARD",
"authenticationPrincipal" : "passthrough",
"userName" : "anonymous",
"connection" : {
"localAddress" : null,
"remoteAddress" : "/172.25.0.1:52792"
},
"specVersion" : "0.1.0",
"time" : "2024-11-25T21:27:11.915906419Z",
"eventData" : {
"interceptorName" : "cars-quality",
"level" : "error",
"plugin" : "io.conduktor.gateway.interceptor.safeguard.DataQualityProducerInterceptor",
"message" : "Request parameters do not satisfy the configured policy: Data quality policy is violated."
}
}
{
"id" : "72a7e9ab-5441-4a5a-9aed-39930230b159",
"source" : "krn://cluster=p0KPFA_mQb2ixdPbQXPblw",
"type" : "SAFEGUARD",
"authenticationPrincipal" : "passthrough",
"userName" : "anonymous",
"connection" : {
"localAddress" : null,
"remoteAddress" : "/172.25.0.1:52792"
},
"specVersion" : "0.1.0",
"time" : "2024-11-25T21:27:11.988664753Z",
"eventData" : {
"interceptorName" : "cars-quality",
"level" : "error",
"plugin" : "io.conduktor.gateway.interceptor.safeguard.DataQualityProducerInterceptor",
"message" : "Request parameters do not satisfy the configured policy: Data quality policy is violated."
}
}
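
To check the number of denials without reading every event, the audit log output can be aggregated per interceptor. This is a sketch under the assumption that jq is installed and that each event is a JSON object with the type and eventData.interceptorName fields shown above; the helper name count_safeguard_denials is hypothetical.

```shell
# Hypothetical helper: reads audit log events (JSON objects) on stdin and prints
# "<interceptorName> <count>" for every interceptor that has SAFEGUARD events.
count_safeguard_denials() {
  jq -r -s '
    map(select(.type == "SAFEGUARD"))
    | group_by(.eventData.interceptorName)
    | .[]
    | "\(.[0].eventData.interceptorName) \(length)"
  '
}

# Usage (not run here): pipe the audit log consumer output into the helper, e.g.
#   kafka-console-consumer ... --topic _conduktor_gateway_auditlogs ... | count_safeguard_denials
```

For the two events above, the helper would report a count of 2 for cars-quality, matching the two rejected records.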

Tearing down the docker environment

Remove all your docker processes and associated volumes

  • --volumes: Remove named volumes declared in the "volumes" section of the Compose file and anonymous volumes attached to containers.
docker compose down --volumes