Apache Kafka: Creating Real-Time Distributed Microservices

Apache Kafka is proving its mettle in the burgeoning microservices space

Research values the global microservices industry at $1.8 billion, and microservices deployments continue to rise across the board. With that sort of interest, there is now a great collection of frameworks, libraries and other packages designed to help construct elegant, reactive services.

One such tool is Apache Kafka. Under the Apache project umbrella, Kafka is a unique proposition for constructing microservices that require real-time streaming capabilities.

Described as a “distributed streaming platform,” Kafka meshes the old and new, combining the benefits of distributed processing with enterprise messaging systems, making it a contender for a broad range of use cases. In this article, we’ll take a bird’s eye view of building real-time applications with Kafka Streams.

Apache Kafka

With user demand calling for real-time applications, developers are seeking innovative methods for building them. If streaming is the rage, Kafka is of the moment.

Streaming is beneficial for architecting pipelines that react to events, and the client library Kafka Streams can be utilized for constructing such microservices.

Kafka Streams applications can be deployed to containers, cloud environments, bare metal or virtual machines. Data lives in Kafka clusters, and the client library lets you write standard Java and Scala applications.

Kafka Sample App

To see what Kafka looks like in practice, take this sample application – WordCount – as demonstrated in the Kafka Streams documentation. This happens to be written in Scala:

import java.util.Properties
import java.util.concurrent.TimeUnit
 
import org.apache.kafka.streams.kstream.Materialized
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala._
import org.apache.kafka.streams.scala.kstream._
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
 
object WordCountApplication extends App {
  import Serdes._

  // Streams configuration: an application ID (used to name the consumer
  // group and state stores) and the address of at least one broker.
  val props: Properties = {
    val p = new Properties()
    p.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application")
    p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092")
    p
  }

  // Build the topology: split each line into lowercased words, group by
  // word and maintain a running count in a local state store.
  val builder: StreamsBuilder = new StreamsBuilder
  val textLines: KStream[String, String] = builder.stream[String, String]("TextLinesTopic")
  val wordCounts: KTable[String, Long] = textLines
    .flatMapValues(textLine => textLine.toLowerCase.split("\\W+"))
    .groupBy((_, word) => word)
    .count(Materialized.as("counts-store"))
  // Stream the changelog of counts out to another topic.
  wordCounts.toStream.to("WordsWithCountsTopic")

  val streams: KafkaStreams = new KafkaStreams(builder.build(), props)
  streams.start()

  // Close the Streams client cleanly on JVM shutdown.
  sys.ShutdownHookThread {
     streams.close(10, TimeUnit.SECONDS)
  }
}
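Before wiring this up to a live cluster, it can help to see what the topology actually computes. The plain-Scala sketch below applies the same lowercase/split/group/count steps to an in-memory list; the input lines are invented for illustration and have nothing to do with any Kafka API:

```scala
// Pure-Scala equivalent of the WordCount topology's transformation,
// run on an in-memory list instead of a Kafka topic.
val textLines = List("Hello Kafka Streams", "All streams lead to Kafka")

val wordCounts: Map[String, Long] = textLines
  .flatMap(_.toLowerCase.split("\\W+"))        // split each line into lowercased words
  .groupBy(identity)                           // group identical words together
  .map { case (word, occurrences) => word -> occurrences.size.toLong } // count each group

println(wordCounts)
```

The Kafka Streams version expresses the same computation, but over an unbounded stream of records, with the counts materialized into a fault-tolerant state store rather than a local Map.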

Features Overview

Kafka can be condensed into three main areas: publishing and subscribing to streams of records; durably storing those streams; and processing streams as they come in. Scalable for both small and large deployments, these traits make Kafka relevant for real-time applications that need to transmit data between systems as well as for systems that react to such data.

Kafka comprises four main APIs: Producer, Consumer, Streams and Connector. Let’s further explore the Streams API, which, by Apache’s definition, “allows building applications that do non-trivial processing that compute aggregations off of streams or join streams together.”
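As a sketch of what such a join might look like, the Scala DSL lets a KStream be joined against a KTable that is keyed the same way. The topic names, value types and enrichment logic below are invented for illustration, not taken from the Kafka documentation:

```scala
import org.apache.kafka.streams.scala._
import org.apache.kafka.streams.scala.kstream._
import org.apache.kafka.streams.scala.ImplicitConversions._
import Serdes._ // resolve String serdes implicitly, as in the WordCount example

val builder = new StreamsBuilder

// A stream of page-view events and a changelog table of user profiles,
// both keyed by a (hypothetical) user ID.
val views: KStream[String, String] = builder.stream[String, String]("page-views")
val users: KTable[String, String]  = builder.table[String, String]("user-profiles")

// Enrich each view event with the viewer's profile via an inner join on key.
val enriched: KStream[String, String] =
  views.join(users)((view, profile) => s"$view viewed by $profile")

enriched.to("enriched-views")
```

The join is an inner join: a view event only produces output if a profile for that key already exists in the table, which is often exactly the enrichment semantics a microservice wants.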

Using Kafka Streams

Peruse the Kafka Streams documentation to discover how to interact with Streams in detail. To get started, download and extract a Kafka release; the quickstart uses the 2.0.0 release built for Scala 2.11:

> tar -xzf kafka_2.11-2.0.0.tgz
> cd kafka_2.11-2.0.0

Kafka offers a script to spin up a ZooKeeper server, which Kafka uses for cluster coordination by default:

> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...
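With ZooKeeper up, the quickstart then starts the Kafka broker itself with a second script, a step worth calling out because the Streams demo below needs a running broker:

```shell
> bin/kafka-server-start.sh config/server.properties
...
```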

After you create the input and output topics, an application can be started. This example command runs WordCountDemo, the Java example bundled with Kafka that is equivalent to the Scala snippet above:

> bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
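To see the demo do something, lines can be fed into the input topic with the console producer while the console consumer prints the running counts. The commands below follow the Kafka quickstart, including its topic names, which should be adjusted to match your own setup:

```shell
# Create the demo's input and output topics (single broker, so one replica).
> bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic streams-plaintext-input
> bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic streams-wordcount-output

# Type lines of text into the input topic...
> bin/kafka-console-producer.sh --broker-list localhost:9092 \
    --topic streams-plaintext-input

# ...and watch word counts appear on the output topic.
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic streams-wordcount-output --from-beginning \
    --property print.key=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
```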

The tutorial uses a Kafka Streams Maven Archetype for creating Kafka applications:

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.kafka \
    -DarchetypeArtifactId=streams-quickstart-java \
    -DarchetypeVersion=2.0.0 \
    -DgroupId=streams.examples \
    -DartifactId=streams.examples \
    -Dversion=0.1 \
    -Dpackage=myapps
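Assuming Maven is installed, the generated project can then be compiled from its directory, as the tutorial goes on to show:

```shell
> cd streams.examples
> mvn clean package
```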

With the above commands in mind, the Streams tutorial also shares a possible project structure, with the following example:

> tree streams.examples
streams-quickstart
|-- pom.xml
|-- src
    |-- main
        |-- java
        |   |-- myapps
        |       |-- LineSplit.java
        |       |-- Pipe.java
        |       |-- WordCount.java
        |-- resources
            |-- log4j.properties

Benefits of Apache Kafka

Kafka can act as an intersection between applications to enable publishing, subscription and stream-processing capabilities. With a distributed nature by design, Kafka has a configurable Leader and Follower relationship between servers.

If you require real-time applications to transmit data between systems or need to act upon real-time data in some fashion, Kafka is a quality bet. Proof lies in its wide base of adoption, including the likes of The New York Times, Pinterest, Zalando, LINE, Trivago and others.

Distributed Microservices with Kafka Streams

We’ve briefly highlighted Kafka Streams, a component of open source Apache Kafka, and its use in building out real-time, distributed microservices. By letting you store records for later batch processing as well as react to future messages by subscription, the open source Kafka combines the benefits of distributed file systems and traditional enterprise messaging systems.

Bill Doerrfeld

Bill Doerrfeld is a tech journalist and analyst based in Seattle. His beat is cloud technologies, specifically the web API economy. He began researching APIs as an Associate Editor at ProgrammableWeb, and since 2015 has been the Editor at Nordic APIs, a high impact blog on API strategy for providers. He loves discovering new trends, researching new technology, and writing on topics like DevOps, REST design, GraphQL, SaaS marketing, IoT, AI, and more. He also gets out into the world to speak occasionally.
