
What is the SQL data quality producer?

Use a SQL statement to assert data quality before records are produced.

View the full demo in real time

You can either follow all the steps manually or watch the recording.

Review the docker compose environment

As docker-compose.yaml shows, the demo environment consists of the following services:

  • gateway1
  • gateway2
  • kafka-client
  • kafka1
  • kafka2
  • kafka3
  • schema-registry
  • zookeeper

cat docker-compose.yaml

Starting the docker environment

Start all the Docker services in the background and wait until they are up and ready.

  • --wait: Wait for services to be running|healthy. Implies detached mode.
  • --detach: Detached mode: Run containers in the background

docker compose up --detach --wait

Creating virtual cluster teamA

Create virtual cluster teamA on gateway gateway1, then review the configuration file used to access it.

# Generate virtual cluster teamA with service account sa
token=$(curl \
--request POST "http://localhost:8888/admin/vclusters/v1/vcluster/teamA/username/sa" \
--header 'Content-Type: application/json' \
--user 'admin:conduktor' \
--silent \
--data-raw '{"lifeTimeSeconds": 7776000}' | jq -r ".token")

# Create access file
echo """
bootstrap.servers=localhost:6969
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='sa' password='$token';
""" > teamA-sa.properties

# Review file
cat teamA-sa.properties
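
If the token request fails, `jq -r ".token"` prints `null` and the file ends up with an unusable password. The following is a minimal sketch of a sanity check for the generated file; the function name `check_client_properties` is hypothetical and not part of the demo.

```shell
# check_client_properties FILE
# Exit 0 when FILE contains the four settings written above and a non-empty token.
# Hypothetical helper for illustration only.
check_client_properties() {
  file=$1
  for key in bootstrap.servers security.protocol sasl.mechanism sasl.jaas.config; do
    grep -q "^$key=" "$file" || { echo "missing $key in $file" >&2; return 1; }
  done
  # jq -r prints the literal string "null" when the response has no .token field
  if grep -Eq "password='(null)?'" "$file"; then
    echo "empty or null token in $file" >&2
    return 1
  fi
}
```

It can be called as `check_client_properties teamA-sa.properties` right after the file is created.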

Creating topic cars on teamA

Create the following on teamA:

  • Topic cars with 1 partition and a replication factor of 1

kafka-topics \
--bootstrap-server localhost:6969 \
--command-config teamA-sa.properties \
--replication-factor 1 \
--partitions 1 \
--create --if-not-exists \
--topic cars

Adding interceptor cars-quality

Let's create an interceptor to ensure the data produced is valid.

Create the interceptor named cars-quality, backed by the plugin io.conduktor.gateway.interceptor.DataQualityProducerPlugin, using the following payload:

{
  "pluginClass" : "io.conduktor.gateway.interceptor.DataQualityProducerPlugin",
  "priority" : 100,
  "config" : {
    "statement" : "SELECT * FROM cars WHERE color = 'red' and record.key.year > 2020",
    "action" : "BLOCK_WHOLE_BATCH",
    "deadLetterTopic" : "dead-letter-topic"
  }
}
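
The curl command in this step reads the payload from step-07-cars-quality.json. A minimal sketch for saving the payload to that file follows; the function name `write_cars_quality_payload` is hypothetical, and the optional argument only exists so the target path can be overridden.

```shell
# write_cars_quality_payload [FILE]
# Write the interceptor payload shown above to FILE
# (default: step-07-cars-quality.json, the name the curl command expects).
# Hypothetical helper for illustration only.
write_cars_quality_payload() {
  cat > "${1:-step-07-cars-quality.json}" <<'EOF'
{
  "pluginClass" : "io.conduktor.gateway.interceptor.DataQualityProducerPlugin",
  "priority" : 100,
  "config" : {
    "statement" : "SELECT * FROM cars WHERE color = 'red' and record.key.year > 2020",
    "action" : "BLOCK_WHOLE_BATCH",
    "deadLetterTopic" : "dead-letter-topic"
  }
}
EOF
}
```

Running `write_cars_quality_payload` with no argument creates the file in the current directory. The quoted `'EOF'` delimiter keeps the shell from touching the single quotes inside the SQL statement.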

Here's how to send it:

curl \
--request POST "http://localhost:8888/admin/interceptors/v1/vcluster/teamA/interceptor/cars-quality" \
--header 'Content-Type: application/json' \
--user 'admin:conduktor' \
--silent \
--data @step-07-cars-quality.json | jq

Producing an invalid car

Produce an invalid record to the cars topic. The record is rejected because its color is not red.

Sending 1 event

{
  "type" : "SUV",
  "price" : 2000,
  "color" : "blue"
}

with

echo '{"type":"SUV","price":2000,"color":"blue"}' | \
kafka-console-producer \
--bootstrap-server localhost:6969 \
--producer.config teamA-sa.properties \
--topic cars

Important: we get the following exception:

org.apache.kafka.common.errors.PolicyViolationException:
> Request parameters do not satisfy the configured policy: Data quality policy is violated.

Producing an invalid car based on key

Produce an invalid record to the cars topic. The record is rejected because the year in its key is not greater than 2020.

Sending 1 event

{
  "headers" : { },
  "key" : "{\"year\":2010,\"make\":\"BMW\"}",
  "value" : "{\"type\":\"Sports\",\"price\":1000,\"color\":\"red\"}"
}

with

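# printf emits a real tab between key and value in every shell; echo expands \t only in some
printf '%s\t%s\n' '{"year":2010,"make":"BMW"}' '{"type":"Sports","price":1000,"color":"red"}' | \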
kafka-console-producer \
--bootstrap-server localhost:6969 \
--producer.config teamA-sa.properties \
--property "parse.key=true" \
--topic cars

Important: we get the following exception:

org.apache.kafka.common.errors.PolicyViolationException:
> Request parameters do not satisfy the configured policy: Data quality policy is violated.
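
Both rejections follow from the interceptor's statement: the value's color must be red and the key's year must be greater than 2020. The sketch below approximates that rule locally, which can help when reasoning about which records will pass. It is not how the gateway evaluates the statement. The helper names `json_field` and `is_valid_car` are hypothetical, and the sed-based extraction assumes compact, flat JSON like the records in this demo; a real check should use a JSON parser.

```shell
# json_field JSON NAME
# Print the value of a top-level field, with string quotes removed.
# Assumption: compact, flat JSON objects as used in this demo.
json_field() {
  printf '%s\n' "$1" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\",}]*\)\"\{0,1\}.*/\1/p"
}

# is_valid_car KEY VALUE
# Exit 0 when the record satisfies: color = 'red' and record.key.year > 2020.
is_valid_car() {
  color=$(json_field "$2" color)
  year=$(json_field "$1" year)
  # A missing key or year counts as 0, so the record is rejected
  [ "$color" = "red" ] && [ "${year:-0}" -gt 2020 ] 2>/dev/null
}

# The second invalid car from this demo: red, but the year is 2010
if is_valid_car '{"year":2010,"make":"BMW"}' '{"type":"Sports","price":1000,"color":"red"}'; then
  echo "accepted"
else
  echo "rejected"   # this branch runs
fi
```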

Producing a valid car

Produce a valid record to the cars topic.

Sending 1 event

{
  "headers" : {
    "X-HEADER-1" : "value1",
    "X-HEADER-2" : "value2"
  },
  "key" : "{\"year\":2023,\"make\":\"Vinfast\"}",
  "value" : "{\"type\":\"Trucks\",\"price\":2500,\"color\":\"red\"}"
}

with

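# printf emits real tabs between headers, key and value in every shell; echo expands \t only in some
printf '%s\t%s\t%s\n' 'X-HEADER-1:value1,X-HEADER-2:value2' '{"year":2023,"make":"Vinfast"}' '{"type":"Trucks","price":2500,"color":"red"}' | \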
kafka-console-producer \
--bootstrap-server localhost:6969 \
--producer.config teamA-sa.properties \
--property "parse.key=true" \
--property "parse.headers=true" \
--topic cars
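
With parse.key and parse.headers enabled, the console producer expects each input line to hold the headers, the key and the value, in that order. The sketch below builds such a line. It assumes the producer's default delimiters are in effect (a tab between the three parts, a comma between headers, a colon between header name and value); the function name `format_record` is hypothetical.

```shell
# format_record HEADERS KEY VALUE
# Print one producer input line: headers, key and value separated by tabs.
# Assumption: the console producer's default delimiters are in effect.
# Hypothetical helper for illustration only.
format_record() {
  # printf writes real tab characters regardless of which shell runs it
  printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}
```

Its output can be piped into the kafka-console-producer command above in place of the hand-written line, for example `format_record 'X-HEADER-1:value1' '{"year":2023,"make":"Vinfast"}' '{"type":"Trucks","price":2500,"color":"red"}'`.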

Consuming from cars

Let's confirm just one record is there by consuming from the cars topic.

kafka-console-consumer \
--bootstrap-server localhost:6969 \
--consumer.config teamA-sa.properties \
--topic cars \
--from-beginning \
--max-messages 1 \
--timeout-ms 10000 \
--property print.key=true \
--property print.headers=true | jq

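returns the following. jq reports a parse error, most likely because with print.key and print.headers enabled each output line holds the headers and the key ahead of the JSON value: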

jq: parse error: Invalid numeric literal at line 1, column 11
Processed a total of 1 messages
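
To get output that jq can parse, one option is to keep only the value column before piping to jq. This is a minimal sketch that assumes each consumer line is laid out as headers, key and value separated by tabs (the default separators); the function name `record_values` is hypothetical.

```shell
# record_values
# Read console-consumer lines on stdin and print only the value column.
# Assumption: each line is "headers<TAB>key<TAB>value".
# Hypothetical helper for illustration only.
record_values() {
  cut -f3
}
```

Placing `| record_values` between the kafka-console-consumer command and `| jq` passes only the JSON value to jq.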

Confirm all invalid cars are in the dead letter topic

Let's confirm the invalid records are in the dead letter topic.

kafka-console-consumer \
--bootstrap-server localhost:19092,localhost:29093,localhost:29094 \
--topic dead-letter-topic \
--from-beginning \
--max-messages 2 \
--timeout-ms 10000 \
--property print.key=true \
--property print.headers=true | jq

returns

[2024-02-14 04:40:08,192] WARN [Consumer clientId=console-consumer, groupId=console-consumer-77413] Connection to node -3 (localhost/127.0.0.1:29094) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2024-02-14 04:40:08,192] WARN [Consumer clientId=console-consumer, groupId=console-consumer-77413] Bootstrap broker localhost:29094 (id: -3 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2024-02-14 04:40:08,294] WARN [Consumer clientId=console-consumer, groupId=console-consumer-77413] Connection to node -2 (localhost/127.0.0.1:29093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2024-02-14 04:40:08,294] WARN [Consumer clientId=console-consumer, groupId=console-consumer-77413] Bootstrap broker localhost:29093 (id: -2 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
jq: parse error: Invalid numeric literal at line 1, column 12
Processed a total of 2 messages

Check in the audit log that message denials were captured

Check in the audit log that the message denials were captured in cluster kafka1.

kafka-console-consumer \
--bootstrap-server localhost:19092,localhost:29093,localhost:29094 \
--topic _auditLogs \
--from-beginning \
--timeout-ms 3000 \
| jq 'select(.type=="SAFEGUARD" and .eventData.plugin=="io.conduktor.gateway.interceptor.DataQualityProducerInterceptor")'

returns

[2024-02-14 04:40:10,285] WARN [Consumer clientId=console-consumer, groupId=console-consumer-79560] Connection to node -3 (localhost/127.0.0.1:29094) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2024-02-14 04:40:10,285] WARN [Consumer clientId=console-consumer, groupId=console-consumer-79560] Bootstrap broker localhost:29094 (id: -3 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
Processed a total of 15 messages
{
  "id": "b614ab23-d5bb-4ce9-b4dc-2f11945df53c",
  "source": "krn://cluster=i8_E-a17RG6Aat716F1btw",
  "type": "SAFEGUARD",
  "authenticationPrincipal": "teamA",
  "userName": "sa",
  "connection": {
    "localAddress": null,
    "remoteAddress": "/192.168.65.1:32870"
  },
  "specVersion": "0.1.0",
  "time": "2024-02-14T03:40:02.319943378Z",
  "eventData": {
    "level": "error",
    "plugin": "io.conduktor.gateway.interceptor.DataQualityProducerInterceptor",
    "message": "Request parameters do not satisfy the configured policy: Data quality policy is violated."
  }
}
{
  "id": "18ffee64-e7b2-4daa-8875-519090c990b7",
  "source": "krn://cluster=i8_E-a17RG6Aat716F1btw",
  "type": "SAFEGUARD",
  "authenticationPrincipal": "teamA",
  "userName": "sa",
  "connection": {
    "localAddress": null,
    "remoteAddress": "/192.168.65.1:32872"
  },
  "specVersion": "0.1.0",
  "time": "2024-02-14T03:40:03.656667170Z",
  "eventData": {
    "level": "error",
    "plugin": "io.conduktor.gateway.interceptor.DataQualityProducerInterceptor",
    "message": "Request parameters do not satisfy the configured policy: Data quality policy is violated."
  }
}

Tearing down the docker environment

Remove all the Docker containers and their associated volumes.

  • --volumes: Remove named volumes declared in the "volumes" section of the Compose file and anonymous volumes attached to containers.

docker compose down --volumes