Learn how to use Kafka Connect in standalone mode in 20 minutes

Kafka Connect provides a scalable way to move data between Kafka and external systems. This tutorial demonstrates running a connector in standalone mode using the real-time stream of Wikipedia changes.

What you’ll learn:
  • What Kafka Connect is and when to use it
  • How to configure a standalone connector
  • How to set up the required properties files
  • How to verify data flowing into Kafka
Live data stream: this tutorial uses the Wikipedia recent changes stream, a real-time feed of edits happening across Wikipedia.

What is Kafka Connect?

Kafka Connect is a framework for streaming data between Kafka and external systems using reusable connectors. Instead of writing custom producer/consumer code for common integrations, you can use pre-built connectors.
| Connector type | Direction | Examples |
| --- | --- | --- |
| Source | External → Kafka | Debezium, JDBC, S3, MongoDB, Twitter |
| Sink | Kafka → External | Elasticsearch, S3, JDBC, HDFS, Splunk |
Find connectors on Confluent Hub.

How to use Kafka Connect in standalone mode?

To use Kafka Connect in standalone mode, follow these steps:
  • Download a Kafka Connect connector, either from GitHub or from Confluent Hub
  • Create a configuration file for your connector
  • Use the connect-standalone.sh CLI to start the connector
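Put together, the steps above boil down to a single command of this shape (the file names here are placeholders, not files from this tutorial):

```shell
# First argument: the worker configuration file
# Remaining arguments: one or more connector configuration files
connect-standalone worker.properties my-connector.properties
```

The concrete command for the Wikipedia example is shown at the end of the walkthrough below.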

Example: Kafka Connect standalone with Wikipedia data

Create the Kafka topic wikipedia.recentchange with 3 partitions:
kafka-topics --bootstrap-server localhost:9092 --topic wikipedia.recentchange --create --partitions 3 --replication-factor 1
Then create the dead letter queue topic wikipedia.dlq, which will catch any records that fail processing:
kafka-topics --bootstrap-server localhost:9092 --topic wikipedia.dlq --create --partitions 3 --replication-factor 1
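Before going further, you can confirm that both topics were created:

```shell
# Should list wikipedia.recentchange and wikipedia.dlq among your topics
kafka-topics --bootstrap-server localhost:9092 --list
```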
Download the release JAR and configuration from here and unzip the archive into kafka_2.13-2.8.1/connectors/kafka-connect-sse:
cd ~/kafka_2.13-2.8.1/connectors && ls -R
kafka-connect-sse

./kafka-connect-sse:
connector.properties                            kafka-connect-sse-1.0-jar-with-dependencies.jar
Edit the configuration file connectors/kafka-connect-sse/connector.properties with the following properties:
name=sse-source-connector
tasks.max=1
connector.class=com.github.cjmatta.kafka.connect.sse.ServerSentEventsSourceConnector
topic=wikipedia.recentchange
sse.uri=https://stream.wikimedia.org/v2/stream/recentchange
errors.tolerance=all
errors.deadletterqueue.topic.name=wikipedia.dlq
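If you want to preview the data the connector will consume, the Wikimedia SSE endpoint can be sampled directly with curl before starting anything (press Ctrl+C to stop):

```shell
# Sample the first few server-sent events from the Wikipedia recent changes stream
curl -s -H "Accept: text/event-stream" \
  https://stream.wikimedia.org/v2/stream/recentchange | head -n 20
```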
Look into your Kafka installation directory (where your bin and config folders are) and edit the content of the config/connect-standalone.properties file:
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.flush.interval.ms=10000

# EDIT BELOW IF NEEDED
bootstrap.servers=localhost:9092
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=/Users/stephanemaarek/kafka_2.13-2.8.1/connectors
The last three lines are the most important to make everything work. In particular, the plugin.path setting must point to the folder where you stored the connectors downloaded earlier, and it must be an absolute path (not a relative one, and no ~ shortcut). If this path is wrong, Kafka Connect will exit shortly after starting. Next, start the standalone connector:
connect-standalone ~/kafka_2.13-2.8.1/config/connect-standalone.properties ~/kafka_2.13-2.8.1/connectors/kafka-connect-sse/connector.properties
You can then verify that data is flowing into the wikipedia.recentchange topic:
kafka-console-consumer --bootstrap-server localhost:9092 --topic wikipedia.recentchange
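Since we configured errors.tolerance=all with a dead letter queue, records that fail conversion are routed to wikipedia.dlq rather than stopping the connector. It is worth checking that this topic stays empty (or inspecting what went wrong if it doesn't):

```shell
# Read the DLQ from the beginning; exits after 10 seconds without new messages
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic wikipedia.dlq --from-beginning --timeout-ms 10000
```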

Standalone vs distributed mode

| Mode | Use case | Scalability |
| --- | --- | --- |
| Standalone | Development, testing, single tasks | Single worker |
| Distributed | Production, high availability | Multiple workers |
Standalone mode runs on a single machine with no fault tolerance. For production deployments, use distributed mode with multiple workers.
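For comparison, distributed mode starts one or more workers and then receives connector configurations over a REST API (port 8083 by default) instead of a properties file on the command line. A minimal sketch, reusing the same SSE connector settings as above:

```shell
# Start a distributed worker (uses the sample config shipped with Kafka)
connect-distributed config/connect-distributed.properties

# In another terminal, submit the connector configuration as JSON over the REST API
curl -X POST -H "Content-Type: application/json" \
  --data '{"name":"sse-source-connector","config":{
    "connector.class":"com.github.cjmatta.kafka.connect.sse.ServerSentEventsSourceConnector",
    "tasks.max":"1",
    "topic":"wikipedia.recentchange",
    "sse.uri":"https://stream.wikimedia.org/v2/stream/recentchange"}}' \
  http://localhost:8083/connectors
```

Because connector configs live in Kafka itself in distributed mode, any worker in the cluster can pick up the tasks if one fails.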
See it in practice with Conduktor: Conduktor Console provides a visual interface for managing Kafka Connect clusters, deploying connectors, and monitoring connector health and throughput.

Next steps