Use Kafka Connect in standalone mode (connect-standalone) to write data into Kafka.
In this tutorial, we will stream Wikipedia's recent changes into a Kafka topic.
You can see the live stream of Wikipedia changes here.
What is Kafka Connect?
In order to get data into Apache Kafka, we have seen that we need to leverage Kafka producers. Over time, it became clear that many companies share the same types of data sources (databases, systems, etc.), so writing standardized open-source code for them benefits everyone. The same reasoning applies to Kafka consumers. Kafka Connect is a tool that allows us to integrate popular systems with Kafka. It lets us reuse existing components to source data into Kafka and to sink data out of Kafka into other data stores. Examples of popular Kafka connectors include:
- Kafka Connect source connectors (producers): Databases (through the Debezium connector), JDBC, Couchbase, GoldenGate, SAP HANA, Blockchain, Cassandra, DynamoDB, FTP, IoT, MongoDB, MQTT, RethinkDB, Salesforce, Solr, SQS, Twitter, etc.
- Kafka Connect sink connectors (consumers): S3, Elasticsearch, HDFS, JDBC, SAP HANA, DocumentDB, Cassandra, DynamoDB, HBase, MongoDB, Redis, Solr, Splunk, Twitter, etc.
How to use Kafka Connect in standalone mode?
To use Kafka Connect in standalone mode, we need to:
- Download a Kafka Connect connector, either from GitHub or from Confluent Hub
- Create a configuration file for your connector
- Use the connect-standalone.sh CLI to start the connector
Example: Kafka Connect standalone with Wikipedia data
Create the Kafka topic wikipedia.recentchange with 3 partitions.
Create a second Kafka topic, wikipedia.dlq (a dead letter queue), for catching any errors.
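A minimal sketch of creating both topics, assuming you run it from the Kafka installation folder (kafka_2.13-2.8.1) against a single local broker on localhost:9092, hence a replication factor of 1. The partition count for wikipedia.dlq is not given above, so using 3 for it as well is an assumption; create_tutorial_topics and BOOTSTRAP_SERVER are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create the two topics used in this tutorial.
# ASSUMPTIONS: run from the Kafka installation folder (e.g. kafka_2.13-2.8.1),
# one local broker on localhost:9092 (so replication factor 1),
# and 3 partitions for wikipedia.dlq too (not specified in the tutorial).
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Hypothetical helper: creates wikipedia.recentchange and wikipedia.dlq.
create_tutorial_topics() {
  local topic
  for topic in wikipedia.recentchange wikipedia.dlq; do
    bin/kafka-topics.sh --bootstrap-server "$BOOTSTRAP_SERVER" \
      --create --topic "$topic" --partitions 3 --replication-factor 1 || return 1
  done
}
```

Paste or source the snippet, then call create_tutorial_topics. Running bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic wikipedia.recentchange afterwards should list 3 partitions.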
Download the kafka-connect-sse connector and place it in the folder kafka_2.13-2.8.1/connectors/kafka-connect-sse.
Create the file connectors/kafka-connect-sse/connector.properties with the following properties:
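One way to create this file from the Kafka folder is a shell heredoc. The values below are a sketch, not the tutorial's own listing: the connector-specific keys (connector.class, topic, sse.uri) are our assumption about the kafka-connect-sse connector and should be checked against its README, the connector name wikipedia-sse-source is hypothetical, and the converter choices are assumptions too. The remaining keys (name, tasks.max, key.converter, value.converter, errors.tolerance, errors.deadletterqueue.topic.name) are standard Kafka Connect connector settings.

```shell
#!/usr/bin/env bash
# Sketch: write the connector configuration, run from the Kafka installation folder.
# ASSUMPTION: connector.class, topic and sse.uri are the keys used by the
# kafka-connect-sse connector; verify them against the README of your download.
mkdir -p connectors/kafka-connect-sse
cat > connectors/kafka-connect-sse/connector.properties <<'EOF'
# Generic Kafka Connect settings (the name is a hypothetical choice)
name=wikipedia-sse-source
tasks.max=1
# Connector-specific settings (assumed, check the connector README)
connector.class=com.github.cjmatta.kafka.connect.sse.ServerSentEventsSourceConnector
topic=wikipedia.recentchange
sse.uri=https://stream.wikimedia.org/v2/stream/recentchange
# Converter choice is an assumption: schemaless JSON for keys and values
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
# Error handling; Kafka Connect documents the errors.deadletterqueue.*
# settings for sink connectors, so check whether they apply to this source
errors.tolerance=all
errors.deadletterqueue.topic.name=wikipedia.dlq
EOF
```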
Go to your Kafka folder (where the bin and config folders are) and edit the content of the config/connect-standalone.properties file.
Set the plugin.path config: this is where you indicate the folder where you store the Kafka connectors you downloaded before.
This must be an absolute path to your connectors directory (not a relative path, and no shortcut with ~).
If you skip this step or get the path wrong, Kafka Connect will stop right after it starts.
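To keep relative paths and ~ out of plugin.path, we can let the shell compute the absolute path. A minimal sketch, assuming config/connect-standalone.properties already exists (it ships with the Kafka download); set_plugin_path is a hypothetical helper, not part of Kafka.

```shell
#!/usr/bin/env bash
# Sketch: set plugin.path in a Kafka Connect worker config to an absolute path.
# set_plugin_path is a hypothetical helper, not part of the Kafka distribution.
# Usage: set_plugin_path <worker-properties-file> <connectors-dir>
set_plugin_path() {
  local props="$1" dir
  if [ ! -f "$props" ]; then
    echo "worker config not found: $props" >&2
    return 1
  fi
  if [ ! -d "$2" ]; then
    echo "connectors directory not found: $2" >&2
    return 1
  fi
  # Resolve to an absolute path (cd runs in a subshell, so the caller's
  # working directory is unchanged). Kafka does not expand ~ itself.
  dir="$(cd "$2" && pwd -P)" || return 1
  # Keep every line except existing uncommented plugin.path settings...
  { grep -v '^[[:space:]]*plugin\.path[[:space:]]*[=:]' "$props" || true; } > "$props.tmp"
  # ...then append the absolute path and replace the original file.
  printf 'plugin.path=%s\n' "$dir" >> "$props.tmp"
  mv "$props.tmp" "$props"
}
```

For example, from the Kafka folder: set_plugin_path config/connect-standalone.properties connectors. Because the shell expands ~ before the call and pwd -P prints a full path, the value written to the file never contains ~ or a relative path.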
Next, we can start our Kafka Connect standalone connector and check that Wikipedia changes are being written to the wikipedia.recentchange topic.
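A minimal sketch of this step, assuming you run it from the Kafka installation folder with a broker on localhost:9092. connect-standalone.sh takes the worker config first and then one or more connector configs; the function names below are hypothetical helpers for this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: start the standalone worker and read the topic it writes to.
# ASSUMPTIONS: run from the Kafka installation folder, broker on localhost:9092;
# start_wikipedia_connector and consume_recentchanges are hypothetical names.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Worker config first, then the connector config; runs in the foreground.
start_wikipedia_connector() {
  bin/connect-standalone.sh config/connect-standalone.properties \
    connectors/kafka-connect-sse/connector.properties
}

# Run in a second terminal to watch Wikipedia changes arrive in the topic.
consume_recentchanges() {
  bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic wikipedia.recentchange --from-beginning
}
```

Call start_wikipedia_connector in one terminal and consume_recentchanges in another; both keep running until you stop them with Ctrl+C.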