Apache Kafka & Zookeeper - Multi-Broker Apache Kafka Cluster on a Single Node
Apache Kafka is an open-source, distributed publish-subscribe messaging system. Compared with most messaging systems, Kafka offers better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message processing applications.
Kafka is built as a modern distributed system. Data is replicated and partitioned over a cluster of machines that can grow and shrink transparently to the applications using the cluster. Consumers of data can likewise be scaled out over a pool of machines and automatically adapt to failures in the consuming processes.
A key aspect of Kafka's design is that it handles large volumes of data with ease. A single Kafka broker can store many terabytes of data, making it practical to handle big-data workloads that would be impossible in a traditional database. Kafka thus provides a real-time publish-subscribe solution that overcomes the challenges of consuming real-time data at volumes that may grow by orders of magnitude. Kafka also supports parallel data loading into Hadoop systems.
Apache Kafka is mainly designed with the following characteristics:
Fast - A single Kafka broker can serve thousands of clients, handling megabytes of reads and writes per second.
Scalable - Data is partitioned and spread over a cluster of machines, enabling data sets larger than any single machine can hold.
Durable - Messages are persisted on disk and replicated within the cluster to prevent data loss.
Distributed by Design - It provides fault-tolerance and durability guarantees.
Five Components of Apache Kafka
Topic: A topic is a category or feed name to which messages are published by the message producers. Topics are partitioned, and each partition is an ordered, immutable sequence of messages. Each message in a partition is assigned a unique sequential ID called the offset.
Broker: A Kafka cluster consists of one or more servers, each of which may run one or more server processes called brokers. Topics are created within the context of broker processes.
Zookeeper: Zookeeper serves as the coordinator between
the Kafka broker and consumers.
Producers: Producers publish data to the topics by
choosing the appropriate partition within the topic.
Consumers: Consumers are the applications or processes
that subscribe to topics and process the feed of published messages.
Zookeeper is an open-source, high-performance coordination service for distributed applications, adopted by Kafka. It coordinates and synchronizes configuration information across distributed nodes. It is not possible to bypass Zookeeper and connect directly to the Kafka brokers; if Zookeeper is down, Kafka cannot serve client requests. Zookeeper is essentially used to communicate between the different nodes in a cluster. In Kafka it is used to commit consumer offsets, so that if a node fails, consumption can resume from the previously committed offset. Beyond this it handles leader election, distributed synchronization, configuration management, detecting when a node joins or leaves the cluster, reporting node status in real time, and so on. In short, it manages and coordinates the Kafka brokers. In the Hadoop ecosystem, Zookeeper is also used for Hadoop cluster management. Thus, Zookeeper is mainly solving the problem of reliable distributed coordination.
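As a concrete illustration, Kafka ships a small Zookeeper client that can inspect the znodes Kafka maintains. Once the brokers set up later in this article are running, their registrations can be listed like this (a sketch; exact paths and output vary by Kafka version):
$ bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
[1, 2, 3]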
Apache Kafka Installation
Once we install Kafka, we can use the Zookeeper instance that comes bundled with Kafka.
Download Apache Kafka
Download the Apache Kafka tar file. Then unzip the tar file and move it to the installation location.
$ tar -zxvf kafka_2.10-0.8.2.0.tgz
$ mv kafka_2.10-0.8.2.0 /usr/local/kafka
Start Zookeeper
$ bin/zookeeper-server-start.sh config/zookeeper.properties
By default the Zookeeper server will listen on *:2181/tcp.
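A quick way to confirm Zookeeper is up is its "ruok" four-letter command, which answers "imok" (this check assumes netcat is installed):
$ echo ruok | nc localhost 2181
imok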
Configure & Start Kafka Brokers
Here we will create three Kafka brokers whose configurations are based on the default config/server.properties. Apart from the settings below, the configurations of the brokers are identical.
Create the config file for the first broker
$ cp config/server.properties config/server1.properties
Edit config/server1.properties and replace the existing values with the following:
broker.id=1
port=9092
log.dir=/tmp/kafka-logs-1
Create the config file for the second broker
$ cp config/server.properties config/server2.properties
Edit config/server2.properties and replace the existing values with the following:
broker.id=2
port=9093
log.dir=/tmp/kafka-logs-2
Create the config file for the third broker
$ cp config/server.properties config/server3.properties
Edit config/server3.properties and replace the existing values with the following:
broker.id=3
port=9094
log.dir=/tmp/kafka-logs-3
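If you prefer not to edit the three files by hand, the copies and edits above can be scripted. This is a minimal sketch, assuming GNU sed and a shell running inside the Kafka installation directory; note the default file may name the property log.dirs rather than log.dir, which the last substitution allows for:
# Create server1..3.properties from the default config,
# then set broker.id, port (9092-9094) and log.dir per broker.
for i in 1 2 3; do
  cp config/server.properties config/server$i.properties
  sed -i "s/^broker\.id=.*/broker.id=$i/" config/server$i.properties
  sed -i "s/^port=.*/port=$((9091 + i))/" config/server$i.properties
  sed -i "s|^log\.dirs\?=.*|log.dir=/tmp/kafka-logs-$i|" config/server$i.properties
done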
Now you need to start each Kafka
broker in a separate console.
Start the first broker
$ env JMX_PORT=9999 bin/kafka-server-start.sh config/server1.properties
Start the second broker
$ env JMX_PORT=10000 bin/kafka-server-start.sh config/server2.properties
Start the third broker
$ env JMX_PORT=10001 bin/kafka-server-start.sh config/server3.properties
Summary of the configuration

            Broker 1      Broker 2      Broker 3
------------------------------------------------------
Kafka       *:9092/tcp    *:9093/tcp    *:9094/tcp
JMX         *:9999/tcp    *:10000/tcp   *:10001/tcp
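To confirm that all three brokers and their JMX ports are listening as summarized above, you can check the open sockets; a quick sanity check, assuming netstat is installed:
$ netstat -tln | grep -E ':(9092|9093|9094|9999|10000|10001) '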
Create a Kafka topic
In Kafka 0.8, there are two ways of creating a new topic:
The first is to turn on the auto.create.topics.enable option on the broker. When the broker receives the first message for a new topic, it creates that topic with num.partitions partitions and default.replication.factor replicas (see the settings sketch below).
The second is to use the admin command bin/kafka-topics.sh.
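For the first approach, the relevant broker settings live in config/server.properties; the values below are illustrative, so check the defaults in your own file:
auto.create.topics.enable=true
num.partitions=1
default.replication.factor=1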
Note: In the Kafka 0.8.0 release, use the admin command bin/kafka-create-topic.sh instead; kafka-topics.sh does not exist in 0.8.0, where its functionality is split between kafka-create-topic.sh and kafka-list-topic.sh.
$ bin/kafka-topics.sh --zookeeper localhost:2181 \
    --create --topic iqubal.kafka.topic --partitions 3 --replication-factor 2
Here Kafka will create three logical partitions and two replicas per partition for the
topic. For each partition it will pick two brokers that will host those
replicas. For each partition Kafka will elect a “leader” broker.
Note: For the Kafka 0.8.0 release you must use the command:
$ bin/kafka-create-topic.sh --zookeeper localhost:2181 --partition 3 --replica 2 --topic iqubal.kafka.topic
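To verify the partition assignment and see which broker was elected leader for each partition, the same admin tool can describe the topic (Kafka 0.8.1 and later; the output format varies by version):
$ bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic iqubal.kafka.topic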
To get a list of topics, we can use the --list option:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
Kafkatopic
iqubal.kafka.topic
Note: For the Kafka 0.8.0 release you must use the command:
$ bin/kafka-list-topic.sh --zookeeper localhost:2181 --topic iqubal.kafka.topic
Start Producer - Sending Messages
The producer client accepts input from the command line and publishes it as messages to the Kafka cluster. By default, each new line entered is a new message, as shown below.
$ bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 \
    --topic iqubal.kafka.topic
Iqubal Mustafa Kaki’s Blog on Apache Kafka.
Internal Kafka Multiple Broker Cluster
Start Consumer - Consuming Messages
The consumer client reads the published messages; here we use the console consumer that ships with Kafka.
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic iqubal.kafka.topic
At the end of the output you will see the following messages:
Iqubal Mustafa Kaki’s Blog on Apache Kafka.
Internal Kafka Multiple Broker Cluster
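Since the topic was created with a replication factor of 2, you can sketch a quick fault-tolerance check: stop one broker and re-run the consumer, and the messages should still be readable from the surviving replicas. The <pid> below is a placeholder for the process ID found by the first command:
$ ps ax | grep -i 'kafka\.Kafka' | grep server1.properties
$ kill <pid>
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic iqubal.kafka.topic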
Hope you have enjoyed the article.
Author: Iqubal Mustafa Kaki, Technical Specialist.
Want to connect with me
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com