In the previous post, we looked at an introduction to Apache Kafka. Unlike traditional messaging software, Kafka is used in a variety of use cases. We will look at some of them in this post.
Messaging:
The use of Kafka as a messaging system was discussed in the previous post.
Message brokers often act as a middle layer for several reasons, including decoupling the source from the destination, asynchronous message processing, and routing messages to multiple destinations.
Kafka supports both point-to-point and pub-sub messaging through its consumer group concept: consumers in the same group share the messages of a topic between them, while every group receives its own full copy. The rest of this section compares Kafka with other messaging systems.
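To make the consumer group idea concrete, here is a small pure-Python simulation. It does not talk to a real Kafka cluster. The function name `deliver`, the toy partitioner, and the round-robin partition assignment are all assumptions made for illustration, not Kafka's actual implementation.

```python
from collections import defaultdict


def deliver(messages, groups, num_partitions=3):
    """Simulate consumer-group delivery for one topic.

    messages: list of (key, value) pairs.
    groups:   dict mapping group name -> list of consumer names.
    Returns a dict mapping (group, consumer) -> list of values received.
    """
    received = defaultdict(list)
    for group, consumers in groups.items():
        # Assumption: partitions are handed out round-robin inside a group,
        # so each partition has exactly one owner per group.
        owner = {p: consumers[p % len(consumers)] for p in range(num_partitions)}
        for key, value in messages:
            # Toy deterministic partitioner (hypothetical, not Kafka's hash).
            partition = sum(key.encode("utf-8")) % num_partitions
            received[(group, owner[partition])].append(value)
    return dict(received)


messages = [("user-%d" % i, "m%d" % i) for i in range(6)]

# One group with two consumers: the messages are split between them (queue style).
queue_style = deliver(messages, {"billing": ["c1", "c2"]})

# Two groups with one consumer each: both groups see every message (pub-sub style).
pubsub_style = deliver(messages, {"billing": ["c1"], "audit": ["c1"]})
```

With a single group, each message is handled once by whichever consumer owns its partition. With several groups, every group gets the whole stream, which is how one mechanism covers both messaging styles.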
Protocol:
Kafka - Uses a binary protocol over TCP that defines all APIs as request-response message pairs
RabbitMQ - AMQP (Advanced Message Queuing Protocol)
IBM MQ - Supports JMS, MQI
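As a rough illustration of what "binary request-response pairs" means on the Kafka side, the sketch below frames a request and reads a response header using only the standard library. The field layout (a 4-byte size prefix, then API key, API version, correlation id and client id, all big-endian) is my reading of the Kafka protocol guide for the older, non-flexible header version, so treat it as an assumption and check the guide before relying on it. The helper names are hypothetical.

```python
import struct

API_VERSIONS_KEY = 18  # API key of the ApiVersions request, per the protocol guide


def encode_request(api_key, api_version, correlation_id, client_id, body=b""):
    """Frame a request: int32 size, then header fields, then the body."""
    client = client_id.encode("utf-8")
    # int16 api_key, int16 api_version, int32 correlation_id, int16 client_id length
    header = struct.pack(">hhih", api_key, api_version, correlation_id, len(client))
    payload = header + client + body
    return struct.pack(">i", len(payload)) + payload


def decode_response_header(data):
    """Split a response into (size, correlation_id, body bytes)."""
    size, correlation_id = struct.unpack(">ii", data[:8])
    # The size field counts everything after itself, including the correlation id.
    return size, correlation_id, data[8:4 + size]


frame = encode_request(API_VERSIONS_KEY, 0, correlation_id=7, client_id="demo")
```

The correlation id is what ties the pair together: the broker echoes it back in the response so the client can match each response to the request that caused it.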
Model:
Kafka - dumb broker / smart consumer model
RabbitMQ - smart broker / dumb consumer model
IBM MQ - smart broker / dumb consumer model (based on my understanding of the smart broker / dumb consumer model)
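The "dumb broker / smart consumer" idea can be sketched in a few lines of plain Python. In this toy model the broker side is only an append-only log, and each consumer remembers its own read position (offset). The class and method names are hypothetical and only loosely mirror Kafka's concepts; this is not how a real client is written.

```python
class Log:
    """Broker side: an append-only list that knows nothing about its readers."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Consumer side: tracks its own offset and can rewind at will."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset


log = Log()
for value in ["a", "b", "c"]:
    log.append(value)

fast, slow = Consumer(log), Consumer(log)
first_batch = slow.poll(max_records=2)  # slow reader takes two records
everything = fast.poll()                # fast reader takes all three
slow.seek(0)                            # rewinding needs no help from the broker
replayed = slow.poll()
```

Because reading does not delete anything from the log, two consumers can move at different speeds and either one can replay old records. In a smart-broker model, by contrast, the broker tracks what each consumer has received and removes messages once they are acknowledged.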
License:
Kafka - Open source under the Apache License 2.0
RabbitMQ - Open source under the Mozilla Public License
IBM MQ - Proprietary software
Client Libraries:
IBM MQ - Uses the MQ client for message transfer. Also supports .NET, ActiveX, C++, Java™, JMS, and REST APIs. Using MQ Light, we can also connect from Node.js, Java, Ruby, and Python.
Benchmark:
Kafka - 821,557 records/s (single producer thread, no replication). More detailed results are here.
RabbitMQ - 53,710 messages/s (producer only, no consumer). More detailed results are here.
IBM MQ - Please see the performance results for IBM MQ here.
Web Activity Tracking:
Web activity tracking means recording user activity such as page views and searches, which is then used in web analytics to optimize the user experience. Activity tracking is often very high in volume, as many activity messages are generated for each user page view.
Web activity tracking therefore needs a system that can transfer millions of messages per second in order to deliver results relevant to the user's needs.
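To give a feel for what a single activity message might look like, here is a minimal sketch that turns a page view into the topic, key and value a producer would send. No Kafka client is used. The field names, the `activity.<type>` topic naming and the choice of the user id as the message key are all assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ActivityEvent:
    user_id: str
    activity: str      # e.g. "page_view" or "search"
    url: str
    timestamp_ms: int


def to_record(event):
    """Build the (topic, key, value) triple for one activity event."""
    # Assumption: one topic per activity type, named "activity.<type>".
    topic = "activity." + event.activity
    # Keying by user id keeps one user's events together in a single partition.
    key = event.user_id.encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return topic, key, value


topic, key, value = to_record(
    ActivityEvent("u42", "page_view", "/home", 1700000000000)
)
```

Each page view, search or click becomes one small record like this, which is why the message count grows so quickly on a busy site.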
Below are some statistics published by Kafka users:
LinkedIn - https://engineering.linkedin.com/kafka/running-kafka-scale (blog post by a Senior Staff Engineer at LinkedIn)
"When combined, the Kafka ecosystem at LinkedIn is sent over 800 billion messages per day which amounts to over 175 terabytes of data. Over 650 terabytes of messages are then consumed daily, which is why the ability of Kafka to handle multiple producers and multiple consumers for each topic is important. At the busiest times of day, we are receiving over 13 million messages per second, or 2.75 gigabytes of data per second. To handle all these messages, LinkedIn runs over 1100 Kafka brokers organized into more than 60 clusters."
Yahoo - https://yahooeng.tumblr.com/post/109994930921/kafka-yahoo
"Kafka is used by many teams across Yahoo. The Media Analytics team uses Kafka in our real-time analytics pipeline. Our Kafka cluster handles a peak bandwidth of more than 20Gbps (of compressed data)."
"Tumblr started as a fairly typical large LAMP application. The direction they are moving in now is towards a distributed services model built around Scala, HBase, Redis, Kafka, Finagle, and an intriguing cell based architecture for powering their Dashboard. Effort is now going into fixing short term problems in their PHP application, pulling things out, and doing it right using services."
We will look at the other use cases in the next post.
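As a closing sanity check, the LinkedIn figures quoted above can be turned into per-second and per-message numbers with a little arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal units (1 terabyte = 10^12 bytes, 1 gigabyte = 10^9 bytes) and traffic spread evenly over the day, neither of which the quoted post states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the LinkedIn quote above.
messages_per_day = 800e9
bytes_produced_per_day = 175e12   # 175 TB, assuming decimal terabytes
bytes_consumed_per_day = 650e12   # 650 TB
peak_messages_per_sec = 13e6
peak_bytes_per_sec = 2.75e9       # 2.75 GB/s, assuming decimal gigabytes

avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY
avg_message_bytes = bytes_produced_per_day / messages_per_day
peak_message_bytes = peak_bytes_per_sec / peak_messages_per_sec
consume_ratio = bytes_consumed_per_day / bytes_produced_per_day

print("average messages/s: %.0f" % avg_messages_per_sec)
print("average message size: %.2f bytes" % avg_message_bytes)
print("message size at peak: %.1f bytes" % peak_message_bytes)
print("bytes consumed per byte produced: %.2f" % consume_ratio)
```

Under these assumptions the average works out to roughly 9.3 million messages per second at about 219 bytes each, and every byte written is read close to four times, which fits the point in the quote about multiple consumers per topic.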
Disclaimer:
All the statistics and benchmark results are taken from the respective websites linked above. None of the tests were conducted or verified by me, and I do not own any of the results. The test results and data published on the company websites are the property of the respective companies. Please refer to the linked websites for more details.
The data on those websites may have changed since this post was written, so please refer to the original websites for the latest figures.