Encryption - FAQ
Introduction
The Encryption interceptor is a robust and versatile solution for securing data within Kafka records. Its primary function is to safeguard sensitive information from unauthorized access, thereby enhancing data security both in transit and at rest.
Key Features
Field-Level Encryption: Encrypts specific fields within Kafka records, such as passwords or Personally Identifiable Information (PII). This feature is ideal for scenarios where only certain parts of a message contain sensitive data.
Full Message Encryption: Encrypts the entire Kafka record, ensuring that all contents of the message are secured. This is particularly useful when the entire message is sensitive.
You can find more details about the encryption types here.
How It Works
Once configured, the encryption and decryption processes are seamlessly managed by the interceptor.
Encryption: The interceptor identifies the data that needs to be encrypted and the KMS details used to protect the encryption key; Gateway then encrypts the data and produces the message.
Decryption: Similarly, the interceptor can decrypt the entire message, specific fields, or all the fields, based on your configuration.
See Encryption and Decryption Processes below for more details.
Flexibility and Compatibility
You can refine how data is encrypted with your choice of algorithm and KMS provider.
Multiple Encryption Algorithms: The interceptor supports a variety of encryption algorithms, allowing you to choose the one that best meets your security requirements.
KMS Integration: It integrates with various Key Management Services (KMS), providing flexibility in how you manage and store encryption keys.
Encryption and Decryption Processes
Details on how the encryption takes place, step by step. Jump to the diagram below if you only want the simplified steps involving the keys.
How Does Gateway Encrypt Data?
1. Data Identification: The interceptor first determines, based on its configuration, what data needs to be encrypted. This may include the entire message, specific fields, or all the fields within the message. For example, if you have configured the interceptor to encrypt a `password` field, it will target this field within the incoming Kafka record for encryption.
2. Key Retrieval: The interceptor then generates a key and shares it with the configured Key Management Service (KMS), or retrieves it if it already exists. Supported KMS options include Vault, Azure, AWS, GCP, or an in-memory service for local development only. The key is fetched using the `keySecretId` specified in your configuration to ensure the correct key is utilized. You can find more details about the key retrieval here.
3. Encryption: Once the key is generated or retrieved, the interceptor encrypts the identified data using the specified encryption algorithm. The original data within the message is replaced with the encrypted version.
4. Transmission: Finally, the encrypted data is either:
   - Stored as is if it is an Avro record
   - Converted into a JSON format
   It is then transmitted as a string to the designated destination. You can find more details about this last point here.
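The produce-side flow above can be sketched in a few lines of Python. This is a minimal illustration only, not Gateway's implementation: the XOR-with-keystream "cipher" is a toy stand-in for whichever real algorithm and KMS-managed key you configure.

```python
import base64
import hashlib
import json
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream from key+nonce (toy stand-in for a real cipher)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_field(value: str, key: bytes) -> str:
    nonce = secrets.token_bytes(12)
    data = value.encode()
    ct = bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))
    # The ciphertext is serialized as a string so it can replace the field value.
    return base64.b64encode(nonce + ct).decode()

def decrypt_field(token: str, key: bytes) -> str:
    raw = base64.b64decode(token)
    nonce, ct = raw[:12], raw[12:]
    return bytes(a ^ b for a, b in zip(ct, keystream(key, nonce, len(ct)))).decode()

# 1. Data identification: the interceptor is configured to target `password`.
record = {"user": "alice", "password": "hunter2"}
key = secrets.token_bytes(16)                                # 2. key retrieval (from the KMS in reality)
record["password"] = encrypt_field(record["password"], key)  # 3. encryption, in place
wire = json.dumps(record)                                    # 4. transmission as a string payload
```

The field travels as an opaque string, which is why non-string fields need the special handling described in the FAQ below.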
How Does Gateway Decrypt Data?
1. Data Identification: The interceptor first determines, based on its configuration, which data needs to be decrypted. This may include the entire message, specific fields, or all the fields within the message.
2. Key Retrieval: The interceptor retrieves the decryption key from the Key Management Service (KMS). Typically, this is the same key that was used during encryption. The correct key is obtained using the `keySecretId` provided in your interceptor configuration, which is stored in the header of the record on the backing Kafka. You can find more details about the key retrieval here.
3. Decryption: The interceptor then decrypts the identified data using the retrieved key and the specified encryption algorithm. The decrypted data replaces the encrypted data within the message.
4. Consumption: Once decrypted, the message is ready for consumption by the end-user or application. The interceptor ensures that the decrypted data is correctly formatted and fully compatible with the Kafka record structure.
Note: The encryption and decryption processes are fully transparent to the end-user or application. The interceptor manages all these operations, allowing you to concentrate on your core business logic.
Key Management
The interceptor uses the envelope encryption technique to encrypt the data.
Let's define some key terms to better understand the section below:
| Term | Definition |
|---|---|
| KMS | Key Management Service: A system responsible for managing and storing cryptographic keys, including the KEK. |
| KEK | Key Encryption Key: A key stored in the KMS, used to encrypt the DEK. Notably, the KEK is never exposed to or known by the interceptor. |
| DEK | Data Encryption Key: A key generated by the interceptor, used to encrypt the actual data. |
| EDEK | Encrypted Data Encryption Key: The DEK that has been encrypted by the KEK, ensuring that the DEK remains secure when stored or transmitted. |
To encrypt the data, the Gateway:
- Generates a DEK that is used to encrypt the data
- Sends the DEK to the KMS, which encrypts it using its KEK and returns the EDEK to the Gateway
- Caches the DEK & EDEK in memory for a configurable Time to Live (TTL)
- Encrypts the data using the DEK
- Stores the EDEK alongside the encrypted data; both are sent to the backing Kafka
To decrypt the data, the Gateway:
- Retrieves the EDEK that's stored with the encrypted data
- Sends the EDEK to the KMS, which decrypts it (using the KEK) and returns the DEK to Gateway
- Decrypts the data using the DEK
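The DEK/KEK flow above can be sketched as follows. This is a schematic, stdlib-only illustration: `FakeKms` stands in for a real KMS, and an XOR-with-keystream function stands in for the real ciphers.

```python
import hashlib
import secrets

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (stand-in for a real one): XOR with a key-derived stream."""
    stream, i = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + i.to_bytes(4, "big")).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class FakeKms:
    """Holds the KEK; the interceptor never sees it."""
    def __init__(self):
        self._kek = secrets.token_bytes(32)
    def encrypt_dek(self, dek: bytes) -> bytes:    # DEK -> EDEK
        return xor_cipher(self._kek, dek)
    def decrypt_edek(self, edek: bytes) -> bytes:  # EDEK -> DEK
        return xor_cipher(self._kek, edek)

kms = FakeKms()

# --- Produce path ---
dek = secrets.token_bytes(32)                  # 1. generate a DEK
edek = kms.encrypt_dek(dek)                    # 2. KMS wraps it with the KEK
ciphertext = xor_cipher(dek, b"secret value")  # 3. encrypt the data with the DEK
stored = (edek, ciphertext)                    # 4. EDEK travels with the data

# --- Consume path ---
stored_edek, ct = stored                       # retrieve the EDEK from the record
recovered_dek = kms.decrypt_edek(stored_edek)  # KMS unwraps it using the KEK
plaintext = xor_cipher(recovered_dek, ct)      # decrypt with the recovered DEK
```

The key property is that only the KMS ever holds the KEK; the Gateway only handles DEKs and EDEKs.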
Optimizing Performance with Caching
To reduce the number of calls to the KMS and avoid some of the steps detailed above, the interceptor caches the DEK in memory. The cache has a configurable Time to Live (TTL); if the DEK is not in the cache, the interceptor calls the KMS to decrypt the EDEK, as detailed in steps 1 and 2 above.
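A DEK cache with TTL might look like the following sketch. This is a hypothetical illustration of the idea, not Gateway's actual cache; the `unwrap` callback stands in for the KMS round-trip.

```python
import time

class DekCache:
    """Caches EDEK -> DEK mappings so the KMS is only called on a miss or expiry."""
    def __init__(self, ttl_seconds: float, unwrap):
        self.ttl = ttl_seconds
        self.unwrap = unwrap   # callback that asks the KMS to decrypt an EDEK
        self._entries = {}     # edek -> (dek, expiry timestamp)
        self.kms_calls = 0

    def get_dek(self, edek: bytes) -> bytes:
        entry = self._entries.get(edek)
        if entry and entry[1] > time.monotonic():
            return entry[0]    # fresh cache hit: no KMS round-trip
        self.kms_calls += 1
        dek = self.unwrap(edek)  # miss or expired: call the KMS
        self._entries[edek] = (dek, time.monotonic() + self.ttl)
        return dek

# Usage: a fake unwrap (reversing the bytes) stands in for the KMS call.
cache = DekCache(ttl_seconds=60.0, unwrap=lambda edek: bytes(reversed(edek)))
d1 = cache.get_dek(b"edek-1")  # miss -> KMS called
d2 = cache.get_dek(b"edek-1")  # hit  -> served from memory
```

With a 60-second TTL, repeated records carrying the same EDEK cost only one KMS call per minute instead of one per record.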
FAQ
How does encryption work with Avro, JSON Schema, and Protocol Buffers records?
- Gateway >= 3.3.0 and Avro format: The record is stored in Avro in the backing Kafka, and the encrypted non-string fields are stored in the headers of the record.
- Gateway < 3.3.0 or Protobuf / JSON Schema: The record is stored in JSON in the backing Kafka and will not be compatible with the schema if the encrypted fields are not strings.
From 3.3.0 (Avro only)
This applies only to field-level encryption of Avro records; for formats like Protobuf and JSON Schema, the following has no effect.
Gateway runs with `schemaDataMode` set to `preserve_avro`, to preserve the original record type. The plugin maintains the Avro format of the record rather than converting it to JSON, as was the previous behavior.
When the field type isn't a string, the interceptor will set it to the minimum value of its type (`-2147483648` for integers, `1.4e-45` for floats, etc.). Its encrypted value will be a string stored in the headers of the record.
If you want to fall back to the legacy behavior of converting to JSON, you can explicitly set `"schemaDataMode": "convert_json"`.
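The sentinel-value behavior can be illustrated with a plain-Python sketch. This mimics the idea only: real Gateway operates on Avro records, and the header name and "encryption" used here are invented for the example.

```python
import base64

# Minimum sentinel per type, as described above (assumed mapping for this sketch).
SENTINELS = {int: -2147483648, float: 1.4e-45}

def encrypt(value) -> str:
    """Toy 'encryption': base64 of the repr, standing in for the real cipher."""
    return base64.b64encode(repr(value).encode()).decode()

def encrypt_field_preserve_avro(record: dict, headers: dict, field: str) -> None:
    value = record[field]
    if isinstance(value, str):
        record[field] = encrypt(value)  # string fields stay in place
    else:
        # Non-string field: keep the schema valid by writing the type's
        # minimum value, and park the encrypted string in the record headers.
        headers[f"encrypted.{field}"] = encrypt(value)
        record[field] = SENTINELS[type(value)]

record = {"name": "alice", "salary": 2000}
headers = {}
encrypt_field_preserve_avro(record, headers, "salary")
```

After this step the record still matches an `int` salary schema, while the actual ciphertext travels in the headers.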
Before 3.3.0 (and later for Protobuf and JSON Schema)
In legacy versions of Gateway we store all encrypted data in a JSON format in the backing Kafka, and we restore it to its original format during decryption. If a field cannot be decrypted due to a lack of permissions, it is replaced with a default value to maintain schema compatibility.
Example: Consider the case of a salary field:
- Original value: `2000` (integer)
- Encrypted value: `XQS213KKDK2Q` (string)
When decrypting:
- If decryption is successful and the user has the necessary permissions, the salary is restored to its original numeric value.
- If decryption fails due to insufficient permissions, the salary is set to a default value (e.g., 0) instead of the encrypted string.
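That fallback can be sketched as follows. This is a toy example: the default-value table, the permission check, and the base64 "decryption" are all assumptions made for illustration.

```python
import base64

DEFAULTS = {"int": 0, "string": ""}

def try_decrypt(token: str, authorized: bool) -> int:
    """Toy decryption: base64-decode, but only when the caller is authorized."""
    if not authorized:
        raise PermissionError("caller may not decrypt this field")
    return int(base64.b64decode(token).decode())

def decrypt_salary(encrypted: str, authorized: bool) -> int:
    try:
        return try_decrypt(encrypted, authorized)
    except PermissionError:
        # Keep the record schema-compatible: emit the type's default value,
        # never the raw encrypted string.
        return DEFAULTS["int"]

token = base64.b64encode(b"2000").decode()
ok = decrypt_salary(token, authorized=True)        # restores the original value
denied = decrypt_salary(token, authorized=False)   # falls back to the default
```

Either way the consumer receives an integer, so the schema contract is never broken by a permission failure.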
This design had its issues, which is why we changed it to the approach above:
- Since the data pushed to Gateway is in Avro format (for instance), and the consumers expect Avro too, the data must be decrypted to get back to its expected format.
- The Decryption plugin cannot be applied without decrypting a field, which means your consumers cannot consume data in its original format without decrypting at least one field.
- Even if the encrypted field is already a string, the record is still stored as JSON, even though that is not necessary.