Kafka tips

Kafka: A quick view

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

Kafka model and processes
Kafka model

A little vocabulary:

  • A producer is a process that publishes messages to a Kafka topic.
  • A consumer is a process that subscribes to topics and processes the feed of published messages.
  • A Kafka cluster is made up of one or more servers, each of which is called a broker.
  • A topic is a category or feed name to which messages are published. For each topic, the Kafka cluster maintains a partitioned log with one or more partitions.
  • The partitions of the log are distributed over the servers in the Kafka cluster (Kafka uses ZooKeeper to manage partition assignment), with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
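
As an illustration of these notions, here is a quick sketch using the console tools shipped with Kafka 0.8.x (the hostnames, topic name and counts are examples, adapt them to your cluster):

# create a topic with 3 partitions, each replicated on 2 brokers
./bin/kafka-topics.sh --create --zookeeper hostname:2181 --replication-factor 2 --partitions 3 --topic topicName

# a producer publishing messages typed on stdin to the topic
./bin/kafka-console-producer.sh --broker-list hostname:9092 --topic topicName

# a consumer reading the feed from the beginning
./bin/kafka-console-consumer.sh --zookeeper hostname:2181 --topic topicName --from-beginning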

Start Kafka as a service (with initctl)

Create the following two configuration files:

  • /etc/init/kafka.conf with:
description "Kafka Broker"
limit nofile 32768 32768

start on runlevel [2345]
stop on runlevel [!2345]

respawn
respawn limit 2 5

env CONFIG_HOME=/etc/kafka
env KAFKA_HOME=/usr/lib/kafka
umask 007

kill timeout 300

setuid kafka
setgid kafka

script
  . /etc/default/kafka-broker

  if [ "x$ENABLE" = "xyes" ]; then
    exec $KAFKA_HOME/bin/kafka-server-start.sh $CONFIG_HOME/server.properties
  fi
end script
  • /etc/init/kafka.zookeeper.conf with:
description "Kafka Broker"
start on runlevel [2345]
stop on starting rc RUNLEVEL=[016]
respawn
respawn limit 2 5
env KAFKA_HOME=/usr/lib/kafka
env CONFIG_HOME=/etc/kafka
umask 007
kill timeout 300
setuid kafka
setgid kafka
exec $KAFKA_HOME/bin/zookeeper-server-start.sh $CONFIG_HOME/zookeeper.properties

These paths may change depending on the installation mode.
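
The broker script above sources /etc/default/kafka-broker and only starts the broker when ENABLE is set to yes. A minimal sketch of that defaults file (the variable name comes from the script above; the file content itself is an assumption of this setup):

# /etc/default/kafka-broker
ENABLE=yes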

Then start the services in the following order:

sudo initctl start kafka.zookeeper
sudo initctl start kafka
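
To check that both jobs are up, query their status (a quick sketch, using the same job names as above):

sudo initctl status kafka.zookeeper
sudo initctl status kafka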

OS change

This is well described in the Kafka documentation.

To check the maximum number of file descriptors allowed on the system, run ‘cat /proc/sys/fs/file-max’:

jabberwock@cassiopee:~$ cat /proc/sys/fs/file-max
1631132
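
If the broker is not started through the Upstart job above (which already raises the limit with ‘limit nofile 32768 32768’), the per-user limit can be raised in /etc/security/limits.conf. A sketch, assuming the broker runs as the kafka user:

# /etc/security/limits.conf
kafka  soft  nofile  32768
kafka  hard  nofile  32768

The limit currently in effect for a shell can be checked with ‘ulimit -n’.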

Kafka in Docker

A very good tutorial is available on the wurstmeister GitHub page.
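
For reference, a minimal sketch of the same setup with plain docker run commands (the image names and environment variables are those documented by the wurstmeister/kafka-docker project; replace the advertised host name with the IP of your Docker host):

docker run -d --name zookeeper -p 2181:2181 wurstmeister/zookeeper
docker run -d --name kafka -p 9092:9092 --link zookeeper:zookeeper \
  -e KAFKA_ADVERTISED_HOST_NAME=192.168.0.10 \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  wurstmeister/kafka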

Kafka offset commands

Offset for topic

This command prints the latest offset of each partition of a given topic (--time -1 returns the latest offset, --time -2 the earliest).

./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hostname:port --topic topicName --time -1
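
The output contains one line per partition in the form topic:partition:offset; an illustrative result (values are made up):

topicName:0:1234567
topicName:1:1230045
topicName:2:1229876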

Export Offset

This command retrieves the partition offsets stored in ZooKeeper for a consumer group and writes them to an output file.

./bin/kafka-run-class.sh kafka.tools.ExportZkOffsets --zkconnect hostname:port --group consumerGroup --output-file ~/output.txt 
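
The resulting file maps the consumer group’s ZooKeeper offset paths to their current offsets, one partition per line; an illustrative example (values are made up):

/consumers/consumerGroup/offsets/topicName/0:1234000
/consumers/consumerGroup/offsets/topicName/1:1229500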

Import Offset

This command imports offsets for a topic's partitions. The file format is the same as the export. Be careful: this command affects consumers (messages may be re-processed). All consumers of the group must be stopped before changing the offsets.

./bin/kafka-run-class.sh kafka.tools.ImportZkOffsets --zkconnect hostname:port --input-file ~/output.txt
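
A sketch of the full workflow (the file name, group and topic are the same examples as above):

# 1. stop all consumers of the group
# 2. edit ~/output.txt and set the offsets the group should restart from
# 3. push the new offsets back into ZooKeeper
./bin/kafka-run-class.sh kafka.tools.ImportZkOffsets --zkconnect hostname:port --input-file ~/output.txt
# 4. restart the consumers; they resume from the imported offsets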

Get the lag

This command prints the consumer offset, log size, and consumer lag for the named topic; these values are numbers of messages, not bytes.
The lag is the difference between log size and offset:
Lag = log size - offset

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect hostname:port --group consumerGroup --topic topic1
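
An illustrative output (column names from the 0.8.x tool, values are made up):

Group          Topic   Pid  Offset    logSize   Lag   Owner
consumerGroup  topic1  0    1234000   1234100   100   consumerGroup_host1-1
consumerGroup  topic1  1    1229500   1229650   150   consumerGroup_host1-2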

You can also monitor the MaxLag and MinFetchRate JMX beans on the consumer (see http://kafka.apache.org/documentation.html#monitoring).

Delete a topic

Deleting a topic does not always work in 0.8.1.1. It should work in the next release, 0.8.2 (see KAFKA-1397).
But there is a manual workaround:
- Stop the brokers.
- Delete the partition folders of this topic on all brokers.
- Delete the topic-partition znode in ZooKeeper under the /brokers path.
- Restart the brokers.
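
A sketch of those steps, assuming the default log directory /tmp/kafka-logs, the Upstart jobs defined above and the ZooKeeper shell shipped with Kafka (adapt paths, hostnames and the topic name to your setup):

# 1. stop every broker
sudo initctl stop kafka

# 2. on each broker, remove the partition folders of the topic
rm -rf /tmp/kafka-logs/topicName-*

# 3. remove the topic znode from ZooKeeper (rmr does a recursive delete; on older ZooKeeper CLIs, delete the child znodes first)
./bin/zookeeper-shell.sh hostname:2181 rmr /brokers/topics/topicName

# 4. restart the brokers
sudo initctl start kafka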