What is a Schema Payload Validation Policy Interceptor?

Avoid outages caused by missing or badly formatted records by ensuring that all messages adhere to a schema.

This interceptor also supports validating payloads against specific constraints for Avro and Protobuf schemas.

These constraints are similar to the validations provided by JSON Schema, such as:

  • Number: minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf
  • String: minLength, maxLength, pattern, format
  • Collections: maxItems, minItems
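
To make these constraint kinds concrete, here is a minimal shell sketch of three such checks. The helper names (check_length, check_maximum, check_min_items), the sample values, and the upper bound of 50 are made up for illustration. The message wording imitates the errors shown later in this demo; this is not the interceptor's actual implementation.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (not Gateway code).

# check_length FIELD VALUE MIN MAX: report a string that is too short or too long.
check_length() {
  len=${#2}
  if [ "$len" -lt "$3" ]; then
    echo "$1 is too short ($len < $3)"
  elif [ "$len" -gt "$4" ]; then
    echo "$1 is too long ($len > $4)"
  fi
}

# check_maximum FIELD VALUE MAX: report an integer above its maximum.
check_maximum() {
  if [ "$2" -gt "$3" ]; then
    echo "$1 is greater than $3"
  fi
}

# check_min_items FIELD MIN ITEM...: report a collection with too few items.
check_min_items() {
  field=$1
  min=$2
  shift 2
  if [ "$#" -lt "$min" ]; then
    echo "$field has too few items ($# < $min)"
  fi
}

check_length name "A" 3 50          # prints: name is too short (1 < 3)
check_maximum age 17 10             # prints: age is greater than 10
check_min_items hobbies 2 "reading" # prints: hobbies has too few items (1 < 2)
```

A value that satisfies its constraint prints nothing, so an empty output means the record would pass these three checks.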

View the full demo in real time

You can either follow all the steps manually, or watch the recording

Review the docker compose environment

As docker-compose.yaml shows, the demo environment consists of the following services:

  • gateway1
  • gateway2
  • kafka-client
  • kafka1
  • kafka2
  • kafka3
  • schema-registry
  • zookeeper
cat docker-compose.yaml

Starting the docker environment

Start all the Docker services in the background and wait for them to be up and ready

  • --wait: Wait for services to be running|healthy. Implies detached mode.
  • --detach: Detached mode: Run containers in the background
docker compose up --detach --wait
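
If a later step still runs before a service answers, an extra readiness probe can help. The following is a minimal sketch of a generic retry helper; the name retry and its argument order are assumptions made for this example and are not part of the demo.

```shell
#!/bin/sh
# Hypothetical helper, not part of the demo.
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds
# or ATTEMPTS runs have failed; return 0 on success, 1 otherwise.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    retry_i=$((retry_i + 1))
    # Sleep only when another attempt follows.
    if [ "$retry_i" -le "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
  done
  return 1
}
```

For example, retry 30 2 curl --silent --fail --output /dev/null http://localhost:8081/subjects would poll the Schema Registry for up to about a minute before giving up.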

Creating virtual cluster teamA

Creating virtual cluster teamA on gateway gateway1 and reviewing the configuration file to access it

# Generate virtual cluster teamA with service account sa
token=$(curl \
--request POST "http://localhost:8888/admin/vclusters/v1/vcluster/teamA/username/sa" \
--header 'Content-Type: application/json' \
--user 'admin:conduktor' \
--silent \
--data-raw '{"lifeTimeSeconds": 7776000}' | jq -r ".token")

# Create access file
echo """
bootstrap.servers=localhost:6969
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='sa' password='$token';
""" > teamA-sa.properties

# Review file
cat teamA-sa.properties
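
One pitfall in the step above: when the admin call fails or returns no token field, jq -r prints the literal string null, which then ends up as the password in teamA-sa.properties. A small guard can catch this before the file is used. This is a sketch; the function name require_token and the error text are assumptions.

```shell
#!/bin/sh
# Hypothetical guard, not part of the demo.
# require_token TOKEN: fail when TOKEN is empty or the literal "null"
# (which is what `jq -r ".token"` prints when the field is missing).
require_token() {
  case "$1" in
    "" | null)
      echo "no token returned; check the Gateway admin URL and credentials" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage, right after the curl call that sets $token:
#   require_token "$token" || exit 1
```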

Creating topic topic-avro on teamA

Creating on teamA:

  • Topic topic-avro with partitions:1 and replication-factor:1
kafka-topics \
--bootstrap-server localhost:6969 \
--command-config teamA-sa.properties \
--replication-factor 1 \
--partitions 1 \
--create --if-not-exists \
--topic topic-avro

Review the example avro schema

cat user-schema.avsc

Let's register it to the Schema Registry

curl -s \
http://localhost:8081/subjects/topic-avro/versions \
-X POST \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data "{\"schemaType\": \"AVRO\", \"schema\": $(cat user-schema.avsc | jq tostring)}"

Review invalid payload

cat invalid-payload.json

Let's send invalid data

cat invalid-payload.json | jq -c | \
kafka-avro-console-producer \
--bootstrap-server localhost:6969 \
--producer.config teamA-sa.properties \
--topic topic-avro \
--property schema.registry.url=http://localhost:8081 \
--property value.schema.id=1

Let's consume it back

That's pretty bad: the invalid record was accepted, so wrong data is going to propagate through your system!

kafka-avro-console-consumer \
--bootstrap-server localhost:6969 \
--consumer.config teamA-sa.properties \
--topic topic-avro \
--from-beginning \
--timeout-ms 3000

Adding interceptor guard-schema-payload-validate

Add Schema Payload Validation Policy Interceptor

Create the interceptor named guard-schema-payload-validate from the plugin io.conduktor.gateway.interceptor.safeguard.SchemaPayloadValidationPolicyPlugin using the following payload:

{
"pluginClass" : "io.conduktor.gateway.interceptor.safeguard.SchemaPayloadValidationPolicyPlugin",
"priority" : 100,
"config" : {
"schemaRegistryConfig" : {
"host" : "http://schema-registry:8081"
},
"topic" : "topic-.*",
"schemaIdRequired" : true,
"validateSchema" : true,
"action" : "BLOCK"
}
}

Here's how to send it:

curl \
--request POST "http://localhost:8888/admin/interceptors/v1/vcluster/teamA/interceptor/guard-schema-payload-validate" \
--header 'Content-Type: application/json' \
--user 'admin:conduktor' \
--silent \
--data @step-12-guard-schema-payload-validate.json | jq
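
The topic setting in the payload, topic-.*, is a regular expression that selects which topics the interceptor applies to. The sketch below shows which topic names such a pattern would select, assuming the pattern has to match the whole topic name. That full-match behavior is an assumption of this example, as are the helper name topic_matches and the sample topic names. grep -E is used as a stand-in, and its syntax can differ from the Gateway's regex engine for more complex patterns.

```shell
#!/bin/sh
# Hypothetical helper, not part of the demo.
# topic_matches PATTERN TOPIC: succeed if TOPIC matches PATTERN as a whole
# (assumed full-match semantics; -x makes grep match the entire line).
topic_matches() {
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

for t in topic-avro topic-protobuf my-topic-avro orders; do
  if topic_matches 'topic-.*' "$t"; then
    echo "$t: validated"
  else
    echo "$t: ignored"
  fi
done
# prints:
#   topic-avro: validated
#   topic-protobuf: validated
#   my-topic-avro: ignored
#   orders: ignored
```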

Listing interceptors for teamA

Listing interceptors on gateway1 for virtual cluster teamA

curl \
--request GET 'http://localhost:8888/admin/interceptors/v1/vcluster/teamA' \
--header 'Content-Type: application/json' \
--user 'admin:conduktor' \
--silent | jq

Review the avro schema with validation rules

cat user-schema-with-validation-rules.avsc

Let's update the schema with our validation rules

curl -s \
http://localhost:8081/subjects/topic-avro/versions \
-X POST \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data "{\"schemaType\": \"AVRO\", \"schema\": $(cat user-schema-with-validation-rules.avsc | jq tostring)}"

Let's check the number of registered schema versions

curl -s http://localhost:8081/subjects/topic-avro/versions
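
The versions endpoint answers with a JSON array of version numbers, which after the two registrations above would typically be [1,2]. To turn the check into an actual assertion, the array can be counted. This is a minimal sketch; the helper name count_versions is an assumption, and where jq is installed, jq length does the same job.

```shell
#!/bin/sh
# Hypothetical helper, not part of the demo.
# count_versions: read a JSON array of integers (e.g. [1,2]) on stdin
# and print the number of elements it holds.
count_versions() {
  awk '{ s = s $0 }
       END {
         gsub(/[^0-9,]/, "", s)            # keep only digits and commas
         print (s == "" ? 0 : split(s, parts, ","))
       }'
}

# Usage against the demo's Schema Registry:
#   n=$(curl -s http://localhost:8081/subjects/topic-avro/versions | count_versions)
#   [ "$n" -eq 2 ] || echo "expected 2 versions, found $n" >&2
```

Note that version numbers are per subject, while value.schema.id refers to the registry-wide schema ID. They happen to coincide in a fresh environment like this one, but that is not guaranteed in general.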

Let's produce the same invalid payload again

The payload has been rejected with useful errors

cat invalid-payload.json | jq -c | \
kafka-avro-console-producer \
--bootstrap-server localhost:6969 \
--producer.config teamA-sa.properties \
--topic topic-avro \
--property schema.registry.url=http://localhost:8081 \
--property value.schema.id=2

returns

org.apache.kafka.common.errors.PolicyViolationException: Request parameters do not satisfy the configured policy. Topic 'topic-avro' has invalid avro schema payload: name is too short (1 < 3), email does not match format 'email', street is too long (56 > 15), city is too short (0 < 2), hobbies has too few items (1 < 2), age is greater than 10, age is greater than 10
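
The rejection message packs every violation into a single line, and some of them (such as the age violation) appear twice. For easier reading, the message can be split into one distinct violation per line. This is a sketch under two assumptions: the helper name list_violations is made up, and no individual violation text contains a comma.

```shell
#!/bin/sh
# Hypothetical helper, not part of the demo.
# list_violations: read the rejection message on stdin and print each
# distinct violation on its own line, in order of first appearance.
list_violations() {
  sed -n 's/.*invalid avro schema payload: //p' |
    tr ',' '\n' |
    awk '{ sub(/^ */, "") } !seen[$0]++'
}

printf '%s\n' "Topic 'topic-avro' has invalid avro schema payload: name is too short (1 < 3), age is greater than 10, age is greater than 10" |
  list_violations
# prints:
#   name is too short (1 < 3)
#   age is greater than 10
```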

Check in the audit log that the message was denied

Check in the audit log that the message was denied in cluster kafka1

kafka-console-consumer \
--bootstrap-server localhost:19092,localhost:19093,localhost:19094 \
--topic _auditLogs \
--from-beginning \
--timeout-ms 3000 \
| jq 'select(.type=="SAFEGUARD" and .eventData.plugin=="io.conduktor.gateway.interceptor.safeguard.SchemaPayloadValidationPolicyPlugin")'

returns

Processed a total of 13 messages
{
"id": "8598a296-4e50-4fb6-b1c4-b0ccd314529b",
"source": "krn://cluster=uoYQCU0nSMSu47Q3_eO5Rw",
"type": "SAFEGUARD",
"authenticationPrincipal": "teamA",
"userName": "sa",
"connection": {
"localAddress": null,
"remoteAddress": "/192.168.65.1:62994"
},
"specVersion": "0.1.0",
"time": "2024-02-14T03:18:16.600811135Z",
"eventData": {
"level": "error",
"plugin": "io.conduktor.gateway.interceptor.safeguard.SchemaPayloadValidationPolicyPlugin",
"message": "Request parameters do not satisfy the configured policy. Topic 'topic-avro' has invalid avro schema payload: name is too short (1 < 3), email does not match format 'email', street is too long (56 > 15), city is too short (0 < 2), hobbies has too few items (1 < 2), age is greater than 10, age is greater than 10"
}
}

Let's now produce a valid payload

cat valid-payload.json | jq -c | \
kafka-avro-console-producer \
--bootstrap-server localhost:6969 \
--producer.config teamA-sa.properties \
--topic topic-avro \
--property schema.registry.url=http://localhost:8081 \
--property value.schema.id=2

And consume it back

kafka-avro-console-consumer \
--bootstrap-server localhost:6969 \
--consumer.config teamA-sa.properties \
--topic topic-avro \
--from-beginning \
--timeout-ms 3000

Tearing down the docker environment

Remove all your docker processes and associated volumes

  • --volumes: Remove named volumes declared in the "volumes" section of the Compose file and anonymous volumes attached to containers.
docker compose down --volumes

Conclusion

You can enrich your existing schemas with validation rules to bring even more data quality to your systems!