Kafka 101 Study Notes & Small Experiment

Yunus Kılıç
3 min read · Apr 10, 2023

I have started a post series on Kafka. This first post covers the fundamentals of Kafka in a nutshell, followed by a small experiment to demonstrate them.

Topic

A particular stream of data

Similar to a table in a database or a folder in a file system

You can have as many topics as you want

Partitions

Topics are split into partitions

Messages within each partition are ordered.

Kafka maintains a numerical offset for each record in a partition
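As an illustrative sketch (not the Kafka API), a partition can be modeled as an append-only log where each record's offset is simply its index:

```python
# Illustrative model of a Kafka partition (not the real API):
# an append-only log where each record's offset is its index.
class Partition:
    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)
        return len(self.log) - 1  # the record's offset

    def read(self, offset):
        return self.log[offset]

p = Partition()
assert p.append("first") == 0   # offsets start at 0
assert p.append("second") == 1  # and increase monotonically
assert p.read(0) == "first"     # records are ordered within the partition
```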

Producers

Write data to topics

Message Key

Producers can choose to send a key

If the key is null, messages are distributed across partitions in a round-robin fashion (newer Kafka versions default to a sticky partitioner)

If the key is not null, all messages with the same key go to the same partition, determined by hashing the key
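This routing logic can be sketched roughly as follows (Kafka actually uses the murmur2 hash; here `zlib.crc32` stands in, and the round-robin counter is simplified):

```python
import itertools
import zlib

NUM_PARTITIONS = 3
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def choose_partition(key):
    if key is None:
        return next(_round_robin)                      # no key: round-robin
    return zlib.crc32(key.encode()) % NUM_PARTITIONS   # key: hash-based

# Messages with the same key always land in the same partition.
assert choose_partition("user-42") == choose_partition("user-42")
# Keyless messages rotate over partitions 0, 1, 2, 0, ...
assert [choose_partition(None) for _ in range(4)] == [0, 1, 2, 0]
```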

Consumer

Read data from topic partitions

Consumer Group

When multiple consumers in the same application perform a similar “logical job,” they can be organized as a Kafka consumer group.

Each consumer within the group reads from different partitions. This means that if your consumer group has more consumers than the topic has partitions, some consumers will be idle.

Multiple consumer groups can read from the same topic
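A hypothetical sketch of how partitions might be spread over a group (real Kafka uses pluggable assignors; this is a plain round-robin assignment):

```python
def assign(partitions, consumers):
    # Spread partitions over consumers round-robin; consumers beyond
    # the partition count end up with nothing to do (idle).
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 4 consumers in the same group -> one consumer is idle.
result = assign([0, 1, 2], ["c1", "c2", "c3", "c4"])
assert result == {"c1": [0], "c2": [1], "c3": [2], "c4": []}
```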

Delivery Guarantees

  • At Least Once

The offset is committed after the message is processed. If processing fails before the commit, the message is read again, so duplicates are possible

  • At Most Once

The offset is committed as soon as the message is received, before it is processed. If processing fails, the message is lost

  • Exactly Once

Achievable for Kafka-to-Kafka workflows via the transactional API (e.g., Kafka Streams)
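The difference between the first two guarantees comes down to where the commit happens relative to processing. A toy model (not the real consumer API) makes this concrete:

```python
def consume(messages, commit_before_processing, crash_on):
    """Toy consumer: returns (processed, committed_offset).
    A 'crash' while handling message `crash_on` stops the loop."""
    processed, committed = [], -1
    for offset, msg in enumerate(messages):
        if commit_before_processing:
            committed = offset          # at-most-once: commit first
        if msg == crash_on:
            break                       # crash before processing finishes
        processed.append(msg)
        if not commit_before_processing:
            committed = offset          # at-least-once: commit after
    return processed, committed

msgs = ["a", "b", "c"]
# At-most-once: "b" was committed but never processed -> lost on restart.
assert consume(msgs, True, "b") == (["a"], 1)
# At-least-once: the commit for "b" never happened -> "b" is re-read
# after restart, so it may be processed twice (duplicate risk).
assert consume(msgs, False, "b") == (["a"], 0)
```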

Broker

A Kafka cluster is composed of multiple brokers

Topic Replication Factor

To increase availability

When the replication factor is greater than one, each partition has one leader and one or more replica (follower) copies

By default, producers write to the leader, and consumers read from the leader

Producer Acks

acks=0 — the producer does not wait for any acknowledgment

acks=1 — the producer waits for an acknowledgment from the leader only

acks=all — the producer waits for acknowledgment from the leader and all in-sync replicas

Consumer Groups and Rebalance

A rebalance is triggered when a consumer joins or leaves the group,

or when new partitions are added to a topic

  • Eager Rebalance

Default

All consumers give up their partition assignments and stop consuming ("stop the world")

All consumers rejoin and receive a new assignment

The assigned partitions may change

  • Cooperative Rebalance

“Complete and global load balancing does not need to be completed in a single round of rebalancing. Instead, it’s sufficient if the clients converge shortly to a state of balanced load after just a few consecutive rebalances.

The world should not be stopped. Resources that don’t need to change hands should not stop being utilized.”(confluent.io)
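The contrast can be sketched with a toy model (hypothetical, not Kafka's actual assignor logic): eager revokes everything and redistributes from scratch, while cooperative keeps current owners where possible and moves only the surplus.

```python
def rebalance(old_assignment, consumers, eager):
    partitions = sorted(p for ps in old_assignment.values() for p in ps)
    new = {c: [] for c in consumers}
    if eager:
        # Eager: everyone gives up everything, then all partitions
        # are redistributed from scratch ("stop the world").
        for i, p in enumerate(partitions):
            new[consumers[i % len(consumers)]].append(p)
        return new
    # Cooperative (sticky-ish): keep current owners up to a fair quota,
    # hand over only the surplus partitions to under-loaded consumers.
    quota = -(-len(partitions) // len(consumers))  # ceiling division
    surplus = []
    for c in consumers:
        new[c] = old_assignment.get(c, [])[:quota]
        surplus += old_assignment.get(c, [])[quota:]
    for c in consumers:
        while len(new[c]) < quota and surplus:
            new[c].append(surplus.pop(0))
    return new

old = {"c1": [0, 1, 2]}  # c1 owned everything; c2 joins the group
# Cooperative: c1 keeps partitions 0 and 1; only partition 2 moves.
assert rebalance(old, ["c1", "c2"], eager=False) == {"c1": [0, 1], "c2": [2]}
# Eager: all partitions are revoked and reassigned, even partition 1,
# which did not need to change hands.
assert rebalance(old, ["c1", "c2"], eager=True) == {"c1": [0, 2], "c2": [1]}
```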

Experiment

Install Kafka

On macOS and Linux, you can use Homebrew to install Kafka.

On Windows, install WSL2 first, then install Homebrew inside the Linux instance.

brew install kafka

Start ZooKeeper. You can define an alias to make repeated use easier:

$ alias zk='/home/linuxbrew/.linuxbrew/opt/kafka/bin/zookeeper-server-start /home/linuxbrew/.linuxbrew/etc/kafka/zookeeper.properties'
$ zk

Start Kafka, also with an alias, in a different tab:

$ alias ks='/home/linuxbrew/.linuxbrew/opt/kafka/bin/kafka-server-start /home/linuxbrew/.linuxbrew/etc/kafka/server.properties'
$ ks

Also, add the Kafka bin directory to your PATH:

$ export PATH="/home/linuxbrew/.linuxbrew/opt/kafka/bin/:$PATH"

Create a topic with 3 partitions:

$ kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic test

Create a console producer that uses the round-robin partitioner:

$ kafka-console-producer --bootstrap-server localhost:9092 --producer-property partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner --topic test
>

Create 3 consumers with the same consumer group in different tabs

$ kafka-console-consumer --bootstrap-server localhost:9092 --topic test --group myapp

Enter some messages in the producer tab. They will be distributed across the consumers within the consumer group.

That is all for the first post of the series.

Thank you.

Credit: I am following Stephane Maarek's Udemy courses. He is an excellent instructor for both AWS and Kafka.

Also, https://www.confluent.io/ is a resource for some of this information.
