Sunday, January 17, 2016

Apache Kafka & Zookeeper - Multi-Broker Apache Kafka Cluster on a Single Node
Apache Kafka is an open source, distributed publish-subscribe messaging system. Compared to most messaging systems, Kafka offers higher throughput along with built-in partitioning, replication, and fault tolerance, which makes it a good fit for large-scale message processing applications.

Kafka is built as a modern distributed system. Data is replicated and partitioned over a cluster of machines that can grow and shrink transparently to the applications using the cluster. Consumers of data can be scaled out over a pool of machines as well and automatically adapt to failures in the consuming processes.


A key aspect of Kafka's design is that it handles large volumes of data with ease. A single Kafka broker can store many terabytes of data, which makes workloads practical that would be impossible in a traditional database.
Kafka thus provides a real-time publish-subscribe solution that copes with real-time data consumption even as data volumes grow by orders of magnitude. Kafka also supports parallel data loading into Hadoop systems.




Apache Kafka is designed with the following characteristics:
Fast - A single Kafka broker can serve thousands of clients while handling megabytes of reads and writes per second.
Scalable - Data is partitioned and spread over a cluster of machines so that it can grow beyond a single machine.
Durable - Messages are persisted on disk and replicated within the cluster to prevent data loss.
Distributed by Design - Kafka provides fault-tolerance and durability guarantees.
Five Components of Apache Kafka
Topic: A topic is a category or feed name to which messages are published by the message producers. Topics are partitioned, and each partition is an ordered, immutable sequence of messages. Each message in a partition is assigned a unique sequential ID called its offset.
Broker: A Kafka cluster consists of one or more servers, each of which may run one or more server processes called brokers. Topics are created within the context of broker processes.
Zookeeper: Zookeeper serves as the coordinator between the Kafka brokers and consumers.
Producers: Producers publish data to topics, choosing the appropriate partition within the topic for each message.
Consumers: Consumers are the applications or processes that subscribe to topics and process the feed of published messages.

Zookeeper is an open source, high-performance coordination service for distributed applications, and Kafka is built on top of it. It coordinates and synchronizes configuration information across distributed nodes. It is not possible to bypass Zookeeper and connect directly to the Kafka brokers: if Zookeeper is down, Kafka cannot serve client requests. Zookeeper is essentially used for communication between the different nodes in a cluster. In Kafka, it is used to commit consumer offsets, so that if a node fails, consumption can resume from the previously committed offset. Beyond that, it also handles leader detection, distributed synchronization, configuration management, detecting when a node joins or leaves the cluster, real-time node status, and so on. In short, it manages and coordinates the Kafka brokers. In the Hadoop ecosystem, Zookeeper is also used for Hadoop cluster management. Zookeeper is, in essence, solving the problem of reliable distributed coordination.
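
As an illustration, once the cluster described below is running, you can peek at the metadata Kafka keeps in Zookeeper using the zookeeper-shell.sh utility that ships with Kafka (the znode paths shown are the standard Kafka ones; the exact output depends on your setup):

$ bin/zookeeper-shell.sh localhost:2181
ls /brokers/ids      # IDs of the brokers currently registered
ls /brokers/topics   # topics known to the cluster
ls /consumers        # consumer groups and their committed offsets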

Apache Kafka Installation
Once we install Kafka, we can use the Zookeeper instance that comes with the Kafka bundle.

Download Apache Kafka

Download the Apache Kafka tar file, then extract it and move it to the installation location:

$ tar -zxvf kafka_2.10-0.8.2.0.tgz
$ mv kafka_2.10-0.8.2.0 /usr/local/kafka


Start Zookeeper

$ bin/zookeeper-server-start.sh config/zookeeper.properties

By default the Zookeeper server will listen on *:2181/tcp
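
To confirm that Zookeeper is actually up, you can send it the standard "ruok" four-letter command (this sketch assumes the nc utility is available; a healthy server replies imok):

$ echo ruok | nc localhost 2181
imok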


Configure & Start Kafka Brokers

Here we will create three Kafka brokers whose configurations are based on the default config/server.properties. Apart from the settings below, the configurations of the brokers are identical.

Create the config file for the first broker

$ cp config/server.properties config/server1.properties

Edit config/server1.properties and set the following values:

broker.id=1
port=9092
log.dir=/tmp/kafka-logs-1

Create the config file for the second broker

$ cp config/server.properties config/server2.properties

Edit config/server2.properties and set the following values:

broker.id=2
port=9093
log.dir=/tmp/kafka-logs-2


Create the config file for the third broker

$ cp config/server.properties config/server3.properties

Edit config/server3.properties and set the following values:

broker.id=3
port=9094
log.dir=/tmp/kafka-logs-3
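
If you would rather not edit each file by hand, the three config files can also be generated with a small shell loop. This is just a sketch, assuming GNU sed and that the default config/server.properties contains the properties edited above:

$ for i in 1 2 3; do
    cp config/server.properties config/server$i.properties
    # give each broker a unique id, port and log directory
    sed -i "s/^broker.id=.*/broker.id=$i/" config/server$i.properties
    sed -i "s/^port=.*/port=$((9091 + i))/" config/server$i.properties
    sed -i "s|^log.dirs\?=.*|log.dir=/tmp/kafka-logs-$i|" config/server$i.properties
  done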

Now you need to start each Kafka broker in a separate console.

Start the first broker

$ env JMX_PORT=9999 bin/kafka-server-start.sh config/server1.properties

Start the second broker

$ env JMX_PORT=10000 bin/kafka-server-start.sh config/server2.properties

Start the third broker

$ env JMX_PORT=10001 bin/kafka-server-start.sh config/server3.properties


Summary of the configuration

            Broker 1       Broker 2       Broker 3
------------------------------------------------------
Kafka       *:9092/tcp     *:9093/tcp     *:9094/tcp
JMX         *:9999/tcp     *:10000/tcp    *:10001/tcp
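
To verify that Zookeeper and all three brokers are listening on these ports, a quick check helps (a sketch using netstat; ss or lsof would work just as well):

$ netstat -tln | grep -E ':(2181|9092|9093|9094|9999|10000|10001)'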


Create a Kafka topic

In Kafka 0.8, there are two ways to create a new topic:
The first is to turn on the auto.create.topics.enable option on the broker. When the broker receives the first message for a new topic, it creates that topic with num.partitions partitions and default.replication.factor replicas (see the sketch below).
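
As a sketch, the relevant broker settings for this first approach would look like the following in config/server.properties (the values here are illustrative):

# create topics automatically on first use
auto.create.topics.enable=true
# defaults applied to auto-created topics
num.partitions=3
default.replication.factor=2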

The second way is to use the admin command bin/kafka-topics.sh.

Note: In the Kafka 0.8.0 release there is no kafka-topics.sh; topic administration is split across bin/kafka-create-topic.sh and bin/kafka-list-topic.sh, which were consolidated into kafka-topics.sh in later releases.

$ bin/kafka-topics.sh --zookeeper localhost:2181 \
 --create --topic iqubal.kafka.topic --partitions 3 --replication-factor 2

Here Kafka will create three logical partitions and two replicas per partition for the topic. For each partition it will pick two brokers that will host those replicas, and from those it will elect one broker as the partition's "leader".
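
To see which broker was elected leader for each partition, the same admin tool can describe the topic (in the 0.8.0 release use bin/kafka-list-topic.sh instead; the output format varies by version):

$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic iqubal.kafka.topic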

Note: For the Kafka 0.8.0 release you must use the command:

$ bin/kafka-create-topic.sh --zookeeper localhost:2181 --partition 3 --replica 2 --topic iqubal.kafka.topic

To get a list of topics, we can use the --list option:

$ bin/kafka-topics.sh --list --zookeeper localhost:2181
Kafkatopic
iqubal.kafka.topic

Note: For the Kafka 0.8.0 release you must use the command:

$ bin/kafka-list-topic.sh --zookeeper localhost:2181 --topic iqubal.kafka.topic

Start the Producer - Sending Messages

The producer client accepts input from the command line and publishes each line as a message to the Kafka cluster. By default, each new line entered is a new message, as shown below.

$ bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic iqubal.kafka.topic

Iqubal Mustafa Kaki’s Blog on Apache Kafka.

Internal Kafka Multiple Broker Cluster
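
Since the console producer reads from standard input, you can also pipe a file of newline-delimited messages into the same command (messages.txt is a hypothetical file name):

$ cat messages.txt | bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic iqubal.kafka.topic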

Start the Consumer - Consuming Messages

The console consumer client subscribes to the topic and prints the published messages to standard output.

$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic iqubal.kafka.topic

At the end of the output you will see the following messages:

Iqubal Mustafa Kaki’s Blog on Apache Kafka.

Internal Kafka Multiple Broker Cluster
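
If you only want to read a fixed number of messages and then exit, which is handy when scripting a quick smoke test, the console consumer accepts a --max-messages option:

$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic iqubal.kafka.topic --max-messages 2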

Hope you have enjoyed the article.

Author: Iqubal Mustafa Kaki, Technical Specialist.

Want to connect with me?
If you would like to connect, please reach out through my email -
iqubal.kaki@gmail.com
