Kafka 101 Study Notes & Small Experiment
I am starting a post series on Kafka. In this first post, the fundamentals of Kafka are summarized in a nutshell, followed by a small experiment that demonstrates them.
Topic
A particular stream of data
like a table in a database or a folder in a file system
You can have as many topics as you want
Partitions
Topics are split into partitions
Messages within each partition are ordered.
Kafka maintains a numerical offset for each record in a partition
Producers
Write data to topics
Message Key
Producers can choose to send a key with each message
If the key is null, round-robin partitioning is used
If the key is not null, all messages with the same key go to the same partition (the key is hashed to select a partition)
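As a quick illustration, here is a minimal sketch of a keyed producer using the official Java client (kafka-clients). The topic name test and the broker address localhost:9092 are taken from the experiment below; the class name is just illustrative.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition (the key is hashed to pick the partition)
            producer.send(new ProducerRecord<>("test", "user-42", "first message"));
            producer.send(new ProducerRecord<>("test", "user-42", "second message"));
            // Null key => the partitioner decides which partition to use
            producer.send(new ProducerRecord<>("test", null, "unkeyed message"));
        }
    }
}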
Consumer
Read data from the partitions of a topic
Consumer Group
When multiple consumers in the same application perform a similar “logical job,” they can be organized as a Kafka consumer group.
Each consumer within the group reads from different partitions. This means that if the consumer group has more consumers than the topic has partitions, some consumers will be inactive.
Multiple consumer groups can read the same topic
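For reference, a minimal sketch of a Java consumer that joins a consumer group and prints which partition and offset each record came from. The group id myapp and topic test mirror the experiment below; everything else is illustrative.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "myapp"); // all consumers with this id share the topic's partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Offsets are per-partition; ordering is guaranteed only within a partition
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}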
Delivery Guarantees
- At Least Once
Commits the offset after the message is processed; a failure before the commit means the message may be read again (duplicates possible)
- At Most Once
Commits the offset as soon as the message is received, before it is processed; a failure during processing means the message is lost
- Exactly Once
Achievable for Kafka-to-Kafka workflows using the transactional API (e.g., Kafka Streams)
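A minimal sketch of how the first two guarantees map to commit timing with the Java client. It assumes a consumer configured with enable.auto.commit=false and already subscribed to a topic; process() is just a stand-in for real business logic.

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DeliveryGuaranteeSketch {

    // At least once: process first, commit after.
    // A crash between processing and committing causes the batch to be re-read (possible duplicates).
    static void atLeastOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
        consumer.commitSync();
    }

    // At most once: commit first, process after.
    // A crash during processing loses the uncommitted work (no duplicates, possible data loss).
    static void atMostOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        consumer.commitSync();
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.println("processing " + record.value()); // stand-in for real business logic
    }
}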
Broker
A Kafka cluster is composed of multiple brokers
Topic Replication Factor
To increase availability
When the replication factor is more than one, there is one leader and one or more replicas (followers)
By default, producers write to the leader, and consumers read from the leader
Producer Acks
acks=0: the producer does not wait for an acknowledgment
acks=1: acknowledgment from the leader only
acks=all: acknowledgment from the leader and all in-sync replicas
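With the Java producer this is a single configuration property. A minimal sketch, reusing the broker address and topic from the experiment below:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksConfigDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // "0": fire and forget, "1": wait for the leader, "all": wait for leader + in-sync replicas
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "hello with acks=all"));
        }
    }
}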
Consumer Groups and Rebalance
A rebalance happens when a consumer leaves or joins the group,
or when new partitions are added to a topic
- Eager Rebalance
Default
All consumers give up membership
All consumers rejoin and get partitions assigned again
The assigned partitions may change (consumption stops while the rebalance is in progress)
- Cooperative Rebalance
“Complete and global load balancing does not need to be completed in a single round of rebalancing. Instead, it’s sufficient if the clients converge shortly to a state of balanced load after just a few consecutive rebalances.
The world should not be stopped. Resources that don’t need to change hands should not stop being utilized.”(confluent.io)
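On the client side, cooperative rebalancing can be enabled by choosing a cooperative partition assignor. A minimal sketch of the relevant consumer setting; the rest of the setup follows the consumer group example above.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;

public class CooperativeRebalanceConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "myapp");
        // Cooperative-sticky assignor: partitions that do not move keep being consumed during a rebalance
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());
        // ...key/value deserializers and the poll loop as in the earlier consumer example
    }
}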
Experiment
Install Kafka
On macOS and Linux, you can use Homebrew to install Kafka
On Windows, install WSL2 first and then install Homebrew inside the Linux instance
brew install kafka
Start Zookeeper; you can define an alias to make further usage easier
$ alias zs='/home/linuxbrew/.linuxbrew/opt/kafka/bin/zookeeper-server-start /home/linuxbrew/.linuxbrew/etc/kafka/zookeeper.properties'
$ zs
Then start the Kafka server, again using an alias, in a different tab
$ alias ks='/home/linuxbrew/.linuxbrew/opt/kafka/bin/kafka-server-start /home/linuxbrew/.linuxbrew/etc/kafka/server.properties'
$ ks
Also, add the Kafka bin directory to your PATH
$ export PATH="/home/linuxbrew/.linuxbrew/opt/kafka/bin/:$PATH"
Create a topic with 3 partitions
$ kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic test
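The same topic could also be created programmatically; a minimal sketch using the Java AdminClient, with the same topic name, partition count, and replication factor as the command above.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // topic "test" with 3 partitions and a replication factor of 1
            NewTopic topic = new NewTopic("test", 3, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}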
Create a console producer that uses the RoundRobinPartitioner
$ kafka-console-producer --bootstrap-server localhost:9092 --producer-property partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner --topic test
>
Create 3 consumers with the same consumer group in different tabs
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic test --group myapp
Enter some messages in the producer tab. The messages will be distributed across the consumers within the consumer group
That is all for the first post of the series.
Thank you.
Credit: I am following Stephane Maarek's Udemy courses. He is an excellent instructor for both AWS and Kafka.
Also, https://www.confluent.io/ was a resource for some of the information here.