Hello Apache Kafka

Tiago Albuquerque
4 min read · Dec 20, 2019

First steps implementing message streams in Apache Kafka

Consuming messages from where, sir?

This article gives a very brief introduction to Apache Kafka, referencing its key concepts from the official documentation, and then focuses on implementing a basic message flow from producer to consumer using the Java programming language.

Logo from the official site

Apache Kafka is a distributed streaming platform in which you can produce a message, attach it to a topic, and then consume that message; in other words, yet another solution to the classic producer-consumer problem, using streams.

It is typically used to move and transform huge amounts of data between applications in real time. And it apparently handles real-time data pipelines and event streams quite well, given how widely it is used in industry.

The official site has a list of use cases that better describes its real-world usage.

The key concepts of Apache Kafka are:

- Kafka is run as a cluster on one or more servers that can span multiple datacenters.

- The Kafka cluster stores streams of records in categories called topics, so a topic is a category or feed name to which records are published.

- Topics can have multiple subscribers. For each topic, the Kafka cluster maintains a partitioned log (partitions).

- Each record consists of a key, a value, and a timestamp.

First steps (using the terminal)

To get familiar with the concepts of Kafka, it is highly recommended to read and follow the quick start guide, whose steps are listed below.

First of all, download the Kafka server, decompress it, and navigate to its directory.

Then start the ZooKeeper server using the script provided with Kafka (this will block the terminal):

$ cd kafka_2.12-2.4.0
$ ./bin/zookeeper-server-start.sh config/zookeeper.properties
...

In another terminal window, start the Kafka server (this will block the terminal too):

$ ./bin/kafka-server-start.sh config/server.properties
...

Always keep both the ZooKeeper and Kafka servers running while following any of the examples in this article.

Next, create a ‘test’ topic using the proper script, and then list the existing topics to verify the result:

$ ./bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
$ ./bin/kafka-topics.sh --list --bootstrap-server localhost:9092
test

If necessary, it is possible to manually delete a topic using the same script:

$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test

Produce some messages using the proper script:

$ ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
>my-msg-1
>my-msg-2
^C (Control+C to exit when done creating messages)

And finally start a consumer to get the messages:

$ ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
my-msg-1
my-msg-2

^C
Processed a total of 2 messages

The quick start tutorial goes on to cover multi-broker clusters and getting data from files, which I recommend reading and following. For the purposes of this article, understanding the message flow is enough.

Let's code!

Two projects are implemented here to produce and consume messages from the Kafka server: the first is a plain Java project, and the other uses the Spring Boot framework.

Example 01: Kafka client Java project.

Maven dependency:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.4.0</version>
</dependency>

Producer and Consumer classes implementation:

SimpleProducer.java
SimpleConsumer.java
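
The embedded source files are not reproduced in this post, so here is a minimal sketch of what such classes could look like, consistent with the console output shown further down (topic ::simple-demo-test::, 20 records with numeric keys). Method names like sendMessages and consumeMessages are assumptions, not the original code.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Minimal producer sketch (assumed structure, not the original gist)
public class SimpleProducer {

    private static final String TOPIC = "simple-demo-test";

    public void sendMessages(int amount) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 1; i <= amount; i++) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>(TOPIC, String.valueOf(i), "data #" + i);
                // send() is asynchronous; the callback runs when the broker acknowledges
                producer.send(record, (metadata, exception) -> {
                    if (exception == null) {
                        System.out.println("Sent message to topic ::" + metadata.topic()
                                + ":: | timestamp " + metadata.timestamp());
                    } else {
                        exception.printStackTrace();
                    }
                });
            }
            producer.flush(); // block until buffered records are actually sent
        }
        System.out.println("All messages sent by producer");
    }
}

And a matching consumer sketch that subscribes to the same topic and prints each record:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Minimal consumer sketch (assumed structure, not the original gist)
public class SimpleConsumer {

    private static final String TOPIC = "simple-demo-test";

    public void consumeMessages(int expected) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "simple-demo-group"); // hypothetical group id
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // read from the beginning
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(TOPIC));
            int received = 0;
            while (received < expected) { // stop after the demo records arrive
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset = %d, key = %s, value = %s%n",
                            record.offset(), record.key(), record.value());
                    received++;
                }
            }
        }
    }
}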

Main class implementation to produce/consume records from the classes above:

SimpleKafkaDemo.java
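
Again, the gist is not reproduced here; a minimal sketch of the demo class, assuming the producer and consumer expose the methods sketched above:

// Minimal demo sketch: produce 20 records, then read them back
public class SimpleKafkaDemo {

    public static void main(String[] args) {
        new SimpleProducer().sendMessages(20);
        new SimpleConsumer().consumeMessages(20);
        System.out.println("The end!");
    }
}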

Console output to be displayed:

Sent message to topic ::simple-demo-test:: | timestamp 1576771850940
Sent message to topic ::simple-demo-test:: | timestamp 1576771850959
Sent message to topic ::simple-demo-test:: | timestamp 1576771850963
Sent message to topic ::simple-demo-test:: | timestamp 1576771850965
... ...
All messages sent by producer
offset = 0, key = 1, value = data #1
offset = 1, key = 2, value = data #2
offset = 2, key = 3, value = data #3
offset = 3, key = 4, value = data #4
... ...
offset = 17, key = 18, value = data #18
offset = 18, key = 19, value = data #19
offset = 19, key = 20, value = data #20
The end!

Full project implementation available here.

Example 02: Kafka client Spring Boot project.

Maven dependency:

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>

Spring Boot configuration file (application.properties):

server.port=9000
spring.kafka.consumer.bootstrap-servers=localhost:9092
spring.kafka.consumer.group-id=group_id
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.producer.bootstrap-servers=localhost:9092
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer

Producer and Consumer classes implementation:

Producer.java (spring bean)
Consumer.java (spring bean)
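
The gists do not render here either, so below is a minimal sketch of what such Spring beans could look like, given the application.properties above. The topic name my-spring-topic is a hypothetical placeholder; only the group id group_id comes from the configuration.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// Minimal producer bean sketch (assumed structure, not the original gist)
@Service
public class Producer {

    private static final Logger logger = LoggerFactory.getLogger(Producer.class);
    private static final String TOPIC = "my-spring-topic"; // hypothetical topic name

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate; // auto-configured by Spring Boot

    public void sendMessage(String message) {
        logger.info("===> Producing message = {}", message);
        kafkaTemplate.send(TOPIC, message);
    }
}

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

// Minimal consumer bean sketch (assumed structure, not the original gist)
@Service
public class Consumer {

    private static final Logger logger = LoggerFactory.getLogger(Consumer.class);

    @KafkaListener(topics = "my-spring-topic", groupId = "group_id")
    public void consume(String message) {
        logger.info("<=== Consuming message = {}", message);
    }
}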

REST endpoint to send messages to:

MyKafkaController.java (REST Controller)
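
A possible shape for the controller, delegating to the Producer bean (the /kafka/publish path and the message parameter are assumptions):

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Minimal REST controller sketch (assumed endpoint, not the original gist)
@RestController
@RequestMapping("/kafka")
public class MyKafkaController {

    private final Producer producer;

    public MyKafkaController(Producer producer) {
        this.producer = producer;
    }

    @PostMapping("/publish")
    public void publish(@RequestParam String message) {
        producer.sendMessage(message);
    }
}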

The application can be run as a REST web service or as a plain application:

Main Class
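
A minimal sketch of the main class; the CommandLineRunner bean is an assumption that would explain the startup log shown below, where a message is produced on the [main] thread:

import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

// Minimal main class sketch (class name and runner are assumptions)
@SpringBootApplication
public class SpringBootKafkaApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBootKafkaApplication.class, args);
    }

    // Hypothetical runner: sends a demo message on startup
    @Bean
    CommandLineRunner demo(Producer producer) {
        return args -> producer.sendMessage("my demo message1");
    }
}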

Console output to be displayed (the Kafka server must be running and the topic created):

... ...
INFO 31274 --- [main] com.tiagoamp.springbootkafka.Producer: ===> Producing message = my demo message1
... ...

Full project implementation available here.

Github repository linked here.
