A Beginner's Introduction to Kafka


Nowadays, Kafka is a word used by most people in the message-oriented middleware industry. I was introduced to Kafka by a colleague, who suggested it may be one of the big players among message brokers in the cloud era.

What is Kafka?

As per Kafka's standard definition, it is open-source stream-processing software by the Apache Software Foundation.

If it is just another message broker, how does it differ from the market leaders?

Kafka is a stream-processing software platform.

OK. What does stream processing mean?

Stream processing means real-time processing of data: continuously, concurrently, and in a record-by-record fashion. In Kafka, real-time processing means reading data from a topic (source), analyzing or transforming it, and publishing the result to another topic (sink). This is achieved using Kafka Streams, a client library for processing and analyzing data stored in Kafka.
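The read-process-write loop above can be simulated in plain Python, with no broker involved. The topic contents and the uppercase transform here are made-up examples; the point is the record-by-record flow that Kafka Streams performs continuously.

```python
# Simulated topics: in real Kafka these live on brokers, not in lists.
source_topic = ["order placed", "order shipped", "order delivered"]
sink_topic = []

for record in source_topic:          # consume from the source topic
    processed = record.upper()       # process/analyze each record
    sink_topic.append(processed)     # publish to the sink topic

print(sink_topic)  # ['ORDER PLACED', 'ORDER SHIPPED', 'ORDER DELIVERED']
```

A real Kafka Streams application does the same thing, except the loop never ends: it keeps consuming new records as they arrive.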

How does Kafka differ from traditional message brokers?

Traditional message brokers handle a message in such a way that it can be consumed only once. Once it is consumed, the message is no longer available.

In Kafka, messages are stored on the file system and remain available for a period of time defined by the retention policy, even after they have been consumed.
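Here is a rough sketch of that retention behavior in Python. The timestamps and the 7-day retention window are invented for illustration; the key point is that pruning depends only on the record's age, never on whether anyone has consumed it.

```python
RETENTION_SECONDS = 7 * 24 * 3600  # a common default-style window (assumption)

# Simulated partition log: each record keeps its offset and a timestamp.
log = [
    {"offset": 0, "ts": 0,      "value": "old message"},
    {"offset": 1, "ts": 600000, "value": "recent message"},
]

def prune(log, now):
    """Drop records older than the retention window.

    Whether a record was consumed is irrelevant; only age matters.
    """
    return [r for r in log if now - r["ts"] <= RETENTION_SECONDS]

now = 605000                 # just past 7 days (604800 s) from ts=0
kept = prune(log, now)       # offset 0 is pruned, offset 1 survives
```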

Then how does message processing work?

Some important terminology before looking at Kafka's workflow:

Kafka cluster: made up of multiple Kafka brokers (servers).

Topic:  A category or feed name to which the records are published.

Partition: a division of a topic. For each topic, the cluster maintains a partition log, and each partition is kept identical across the servers in the cluster that hold a replica of it.

A topic may have multiple partitions, and the partitions are replicated across the brokers. Each partition contains messages/records in an immutable, ordered sequence.


  1. The producer sends a message to a partition in a topic, and the message is replicated across brokers for fault tolerance. The producer can route messages to the same partition by specifying a key.
  2. The broker on which the source writes a message to the partition acts as the leader and replicates the message to the same partition on the other brokers. ZooKeeper is used to monitor the leader and elect a new leader if the current one goes down.
  3. Messages are appended to a partition in the order they arrive, and an offset number is assigned to each message.
  4. When a consumer registers to a topic, the offset number is shared with the consumer, and the consumer can start consuming new messages as they arrive.
  5. With each message the consumer consumes, its offset advances linearly along the partition log, and the offset is stored in ZooKeeper. In case of a connection drop, ZooKeeper notifies the consumer of the last successful offset on reconnection.
  6. Unlike with traditional brokers, the consumer can skip ahead or rewind to a desired offset and consume the messages from there.
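Steps 1 and 3 above can be sketched as follows. Using `hash(key) % partitions` is a simplification (Kafka's default partitioner actually hashes keys with murmur2), but it shows the two guarantees: the same key always lands in the same partition, and offsets grow in append order within a partition.

```python
NUM_PARTITIONS = 2
# Simulated partitions: partition id -> list of (offset, value) records.
partitions = {p: [] for p in range(NUM_PARTITIONS)}

def send(key, value):
    """Route a keyed message to a partition and assign it the next offset."""
    p = hash(key) % NUM_PARTITIONS        # same key -> same partition
    offset = len(partitions[p])           # offsets advance in append order
    partitions[p].append((offset, value))
    return p, offset

p1, o1 = send("user-42", "login")
p2, o2 = send("user-42", "logout")
assert p1 == p2        # both messages for user-42 hit the same partition
assert (o1, o2) == (0, 1)
```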

So how is duplication avoided?

The consumer consumes messages using the offset value. Both the consumer and ZooKeeper know the consumer's current offset. If the connection terminates, ZooKeeper keeps track of the consumer's current offset and shares it with the consumer once the connection is established again.
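That bookkeeping can be sketched like this. The committed offset stands in for what ZooKeeper tracks here (in older Kafka versions; newer ones keep it in an internal topic): after a disconnect, the consumer resumes exactly where it left off, so nothing is consumed twice.

```python
partition_log = ["m0", "m1", "m2", "m3"]   # made-up messages
committed_offset = 0                        # what ZooKeeper would remember
consumed = []

def consume_batch(n):
    """Consume up to n messages, committing the offset after each one."""
    global committed_offset
    for _ in range(n):
        if committed_offset >= len(partition_log):
            break
        consumed.append(partition_log[committed_offset])
        committed_offset += 1               # commit: resume point on reconnect

consume_batch(2)   # consumer reads m0 and m1, then the connection drops
consume_batch(10)  # on reconnect it resumes at offset 2 - no duplicates
```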

How are point-to-point and pub-sub achieved in Kafka?

Consumers register themselves in a group called a consumer group, and the consumer group subscribes to a topic. When a message is published to the topic, it is delivered to exactly one consumer in the group.

At any point in time, the total number of consumers in a consumer group should not exceed the total number of partitions; otherwise some consumers in that group will remain idle until an existing consumer exits.

If a Kafka cluster contains two brokers, each holding two partitions of a topic, and there is one consumer group with two consumers, then each consumer in that group will receive messages from two partitions.

If the consumer group has four consumers instead, each one will receive messages from a single partition. If any new consumer is added beyond that, it has to wait for some other consumer in the same group to exit.
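The assignment described above can be sketched with a simple round-robin assignor (an assumption; Kafka's real assignors are range, round-robin, and sticky, but the each-partition-to-exactly-one-consumer property is the same):

```python
def assign(partitions, consumers):
    """Assign each partition to exactly one consumer, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

four_partitions = ["p0", "p1", "p2", "p3"]

# Two consumers over four partitions: each consumer handles two.
two = assign(four_partitions, ["c1", "c2"])

# Five consumers over four partitions: c5 is assigned nothing and sits idle.
five = assign(four_partitions, ["c1", "c2", "c3", "c4", "c5"])
```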


The consumer group allows you to divide up processing over a collection of processes (the members of the consumer group).


Kafka allows you to broadcast messages to multiple consumer groups.

This means a single architecture supports both models. It is achieved by dividing up processing within a consumer group (point-to-point) while, at the same time, broadcasting the same records to multiple consumer groups (pub-sub).
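Both models together can be sketched as follows. The group and member names are made up; the invariant is that every group receives every record (pub-sub), while within a group each record goes to only one member (point-to-point).

```python
records = ["r0", "r1", "r2", "r3"]
groups = {
    "billing":   ["b1", "b2"],   # two members split the work
    "analytics": ["a1"],         # a single member receives everything
}

# group -> member -> records delivered to that member
delivered = {g: {m: [] for m in members} for g, members in groups.items()}

for i, record in enumerate(records):
    for group, members in groups.items():
        member = members[i % len(members)]   # exactly one member per group
        delivered[group][member].append(record)
```

Every record reaches each group once, but never more than one member of the same group.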

As a beginner, I wanted to share some of the questions I searched for when starting with Kafka, and I hope to write another story that digs deeper into it.
