<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<p> Here is a description of a few of the popular use cases for Apache Kafka®.
For an overview of a number of these areas in action, see <a href="https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying/">this blog post</a>. </p>

<h4 class="anchor-heading"><a id="uses_messaging" class="anchor-link"></a><a href="#uses_messaging">Messaging</a></h4>

Kafka works well as a replacement for a more traditional message broker.
Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc.).
In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good
solution for large-scale message processing applications.
<p>
In our experience, messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong
durability guarantees Kafka provides.
<p>
In this domain, Kafka is comparable to traditional messaging systems such as <a href="http://activemq.apache.org">ActiveMQ</a> or
<a href="https://www.rabbitmq.com">RabbitMQ</a>.

<h4 class="anchor-heading"><a id="uses_website" class="anchor-link"></a><a href="#uses_website">Website Activity Tracking</a></h4>

The original use case for Kafka was to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds.
This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type.
These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or
offline data warehousing systems for offline processing and reporting.
<p>
Activity tracking is often very high volume, as many activity messages are generated for each user page view.
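<p>
As a minimal sketch of the one-topic-per-activity-type layout described above, the following Python snippet routes events to topic names derived from their type.
It uses an in-memory dictionary in place of a real Kafka producer. The <code>activity.</code> topic prefix, the event fields, and the function name <code>publish</code> are illustrative assumptions, not Kafka conventions.
</p>

```python
from collections import defaultdict

# In-memory stand-in for a Kafka cluster: topic name -> list of messages.
# A real deployment would send these through a Kafka producer client instead.
topics = defaultdict(list)


def publish(event):
    """Append the event to the topic for its activity type (hypothetical naming scheme)."""
    topic = "activity." + event["type"]
    topics[topic].append(event)
    return topic


events = [
    {"type": "page_view", "user": "u1", "url": "/home"},
    {"type": "search", "user": "u1", "query": "kafka"},
    {"type": "page_view", "user": "u2", "url": "/docs"},
]
for e in events:
    publish(e)

# Each subscriber reads only the feed it cares about.
page_views = topics["activity.page_view"]
```

<p>
Keeping one feed per activity type lets a monitoring job subscribe to searches alone, for example, without filtering through every page view.
</p>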

<h4 class="anchor-heading"><a id="uses_metrics" class="anchor-link"></a><a href="#uses_metrics">Metrics</a></h4>

Kafka is often used for operational monitoring data.
This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

<h4 class="anchor-heading"><a id="uses_logs" class="anchor-link"></a><a href="#uses_logs">Log Aggregation</a></h4>

Many people use Kafka as a replacement for a log aggregation solution.
Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing.
Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages.
This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption.

In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication,
and much lower end-to-end latency.

<h4 class="anchor-heading"><a id="uses_streamprocessing" class="anchor-link"></a><a href="#uses_streamprocessing">Stream Processing</a></h4>

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then
aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing.
For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic;
further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic;
a final processing stage might attempt to recommend this content to users.
Such processing pipelines create graphs of real-time data flows based on the individual topics.
Starting in version 0.10.0.0, a lightweight but powerful stream processing library called <a href="/documentation/streams">Kafka Streams</a>
is available in Apache Kafka to perform the kind of data processing described above.
Apart from Kafka Streams, alternative open-source stream processing tools include <a href="https://storm.apache.org/">Apache Storm</a> and
<a href="http://samza.apache.org/">Apache Samza</a>.
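<p>
The news-article pipeline above can be sketched in plain Python, with lists standing in for Kafka topics.
This is only an illustration of the topic-to-topic flow and does not use Kafka Streams or any Kafka client. Only the "articles" topic name comes from the text; the other topic names, the function names, and the keyword-matching "recommender" are hypothetical.
Each stage here runs once over a batch, whereas a real pipeline would process records continuously as they arrive.
</p>

```python
# In-memory stand-ins for Kafka topics: each topic is an append-only list.
# Topic names other than "articles" are hypothetical.
topics = {"articles": [], "articles.cleansed": [], "recommendations": []}


def crawl(raw_items):
    """Stage 1: publish crawled article content to the "articles" topic."""
    topics["articles"].extend(raw_items)


def cleanse():
    """Stage 2: normalize whitespace and case, drop duplicates, publish to a new topic."""
    seen = set()
    for text in topics["articles"]:
        normalized = " ".join(text.split()).lower()
        if normalized not in seen:
            seen.add(normalized)
            topics["articles.cleansed"].append(normalized)


def recommend(interest):
    """Stage 3: a toy recommender that matches cleansed articles by keyword."""
    for text in topics["articles.cleansed"]:
        if interest in text:
            topics["recommendations"].append(text)


crawl(["Kafka  Streams released", "kafka streams released", "Weather update"])
cleanse()
recommend("kafka")
```

<p>
Because every stage reads from one topic and writes to another, a new consumer can attach to any intermediate topic without changing the stages around it.
</p>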

<h4 class="anchor-heading"><a id="uses_eventsourcing" class="anchor-link"></a><a href="#uses_eventsourcing">Event Sourcing</a></h4>

<a href="http://martinfowler.com/eaaDev/EventSourcing.html">Event sourcing</a> is a style of application design where state changes are logged as a
time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.
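<p>
To make the idea concrete, here is a small Python sketch of event sourcing in which the current state is never stored directly and is instead rebuilt by replaying the log.
The account example, the event fields, and the function names are assumptions made for illustration; the list <code>event_log</code> stands in for a Kafka topic.
</p>

```python
# Append-only event log; in the setup described above this would be a Kafka topic.
# The account events and field names are hypothetical.
event_log = []


def record(event_type, amount):
    """Log a state change instead of overwriting the current state."""
    event_log.append({"seq": len(event_log), "type": event_type, "amount": amount})


def replay(events):
    """Rebuild the balance by applying every event in order."""
    balance = 0
    for event in events:
        if event["type"] == "deposit":
            balance += event["amount"]
        elif event["type"] == "withdrawal":
            balance -= event["amount"]
    return balance


record("deposit", 100)
record("withdrawal", 30)
record("deposit", 5)

current = replay(event_log)      # state after all events
earlier = replay(event_log[:2])  # state as of the second event
```

<p>
Replaying a prefix of the log recovers the state at an earlier point in time, which is one reason this style benefits from a store that retains large amounts of log data.
</p>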

<h4 class="anchor-heading"><a id="uses_commitlog" class="anchor-link"></a><a href="#uses_commitlog">Commit Log</a></h4>

Kafka can serve as a kind of external commit log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing
mechanism for failed nodes to restore their data.
The <a href="/documentation.html#compaction">log compaction</a> feature in Kafka helps support this usage.
In this usage, Kafka is similar to the <a href="https://bookkeeper.apache.org/">Apache BookKeeper</a> project.
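<p>
The following Python sketch shows, in a simplified model, why compaction suits this usage: a node that replays a compacted log ends up with the same state as one that replays the full log.
It is not Kafka's implementation. The function names are hypothetical, and using <code>None</code> as a delete marker is an assumption that loosely mirrors tombstone records.
</p>

```python
# A keyed commit log: a list of (key, value) records in write order.
# A value of None marks a delete, loosely mirroring a tombstone record.
log = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]


def compact(records):
    """Keep only the most recent record for each key, preserving log order."""
    last_index = {key: i for i, (key, _) in enumerate(records)}
    return [rec for i, rec in enumerate(records) if last_index[rec[0]] == i]


def restore(records):
    """Rebuild a node's key-value state by replaying the log from the start."""
    state = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


compacted = compact(log)
```

<p>
A failed node can call something like <code>restore(compacted)</code> to re-sync, reading fewer records than the full history while arriving at the same final state.
</p>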