Evaluating a next-generation distributed pub-sub system can be a time-consuming affair. The Apache Pulsar community has been working to simplify installing and deploying Pulsar, making it easier to experiment with and evaluate Pulsar, as well as to migrate existing applications from another pub-sub system. With Pulsar version 1.20.0, released in October 2017, the community introduced a Kafka API-compatible Java client library.
The Kafka API-compatible Pulsar client for Java is compatible with Kafka 0.10.2.1. The library is published to Maven Central and can be installed using Maven, Gradle, and other build tools.
If you're using Maven, you can replace the Kafka client dependency with the following Pulsar client dependency:
<dependency>
  <groupId>org.apache.pulsar</groupId>
  <artifactId>pulsar-client-kafka</artifactId>
  <version>1.21.0-incubating</version>
</dependency>
If you're using Gradle, you can replace the Kafka client dependency with the following Pulsar client dependency:
dependencies {
    compile "org.apache.pulsar:pulsar-client-kafka:1.21.0-incubating"
}
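If you have several modules to migrate, the dependency swap can also be scripted. Here's a minimal sketch, assuming a hypothetical pom fragment (a real pom.xml carries the full project definition), that rewrites the Kafka client coordinates to the Pulsar wrapper with sed:

```shell
# Hypothetical minimal pom fragment for illustration only.
cat > pom-fragment.xml <<'EOF'
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>1.0.0</version>
EOF

# Swap the Kafka client coordinates for the pulsar-client-kafka library.
sed -i \
  -e 's|<groupId>org.apache.kafka</groupId>|<groupId>org.apache.pulsar</groupId>|' \
  -e 's|<artifactId>kafka-clients</artifactId>|<artifactId>pulsar-client-kafka</artifactId>|' \
  -e 's|<version>1.0.0</version>|<version>1.21.0-incubating</version>|' \
  pom-fragment.xml

cat pom-fragment.xml
```

Because the library keeps the Kafka API, this coordinate swap is the only build change required; the application code itself stays as it is.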
A very common use case for Apache Kafka is as a log collection pipeline. A log collection pipeline is illustrated below:
In this diagram:
Throughout the remainder of this blog post, I'll show you how to migrate your Kafka applications, such as the log collection pipeline illustrated above, to Pulsar. The demo consists of two parts:
You can follow the steps below or watch this video:
All of the repos used in this tutorial are available on GitHub:
The Kafka logger demo application is available on GitHub:
$ git clone https://github.com/merlimat/kafka-logger-demo
Initially, this Kafka logger demo application is configured to append log data to Kafka topics.
The pom.xml configuration file, for example, is configured to append log data to Kafka topics.
<dependencies>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.24</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>2.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
In addition, in src/main/resources/log4j2.xml, the application is configured to use a Kafka appender:
<Configuration status="INFO">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n" />
    </Console>
    <Kafka name="Kafka" topic="log-test">
      <JsonLayout />
      <Property name="bootstrap.servers">localhost:9092</Property>
    </Kafka>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console" />
    </Root>
    <Logger name="demo" level="info">
      <AppenderRef ref="Kafka" />
    </Logger>
  </Loggers>
</Configuration>
Migrating the Kafka logger application is pretty simple. You can change the pom.xml dependency to the pulsar-client-kafka library and configure log4j2 to point to your Pulsar cluster. Here's the updated pom.xml file:
<dependencies>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.24</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>2.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pulsar</groupId>
    <artifactId>pulsar-client-kafka</artifactId>
    <version>1.21.0-incubating</version>
  </dependency>
</dependencies>
And here's the updated log4j2.xml file:
<Configuration status="INFO">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n" />
    </Console>
    <Kafka name="Kafka" topic="persistent://sample/standalone/ns1/log-test">
      <JsonLayout />
      <Property name="bootstrap.servers">pulsar://localhost:6650</Property>
    </Kafka>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console" />
    </Root>
    <Logger name="demo" level="info">
      <AppenderRef ref="Kafka" />
    </Logger>
  </Loggers>
</Configuration>
The Kafka Connect Elasticsearch connector is available on GitHub:
$ git clone -b 0.10.0.0 https://github.com/merlimat/kafka-connect-elasticsearch
Similar to migrating the Kafka logger application, you'll need to:
Delete the kafka-clients library from your Kafka connect lib directory and add the pulsar-client-kafka library instead.
Update your Kafka Connect configuration to point bootstrap.servers at your Pulsar cluster.
# Comment out Kafka bootstrap servers
# bootstrap.servers=localhost:9092

# Add Pulsar bootstrap servers
bootstrap.servers=pulsar://localhost:6650
Update your kafka-elasticsearch configuration to read from Pulsar topics instead of Kafka topics.
# Comment out Kafka topic
# topics=log-test

# Add Pulsar topics
topics=persistent://sample/standalone/ns1/log-test
topic.index.map=persistent://sample/standalone/ns1/log-test:log-test
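The three steps above can be sketched end to end as a small shell script. The directory and file names below (connect/lib, connect-standalone.properties, elasticsearch-sink.properties) are assumptions standing in for your actual Kafka Connect installation paths, and the files are stubbed out for illustration:

```shell
# Hypothetical Connect installation layout, stubbed for illustration.
mkdir -p connect/lib
touch connect/lib/kafka-clients-1.0.0.jar
echo 'bootstrap.servers=localhost:9092' > connect/connect-standalone.properties

# Step 1: swap the client library in the Connect lib directory.
# In practice, pulsar-client-kafka would be downloaded from Maven Central.
rm connect/lib/kafka-clients-1.0.0.jar
touch connect/lib/pulsar-client-kafka-1.21.0-incubating.jar

# Step 2: point the Connect worker at the Pulsar cluster.
sed -i 's|^bootstrap.servers=.*|bootstrap.servers=pulsar://localhost:6650|' \
  connect/connect-standalone.properties

# Step 3: read from the Pulsar topic and map it back to the log-test index.
cat > connect/elasticsearch-sink.properties <<'EOF'
topics=persistent://sample/standalone/ns1/log-test
topic.index.map=persistent://sample/standalone/ns1/log-test:log-test
EOF
```

After these edits, you would restart the Connect worker as usual; the connector itself is unaware that it is now consuming from Pulsar.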
If you want to learn more about Apache Pulsar, please visit the official website at https://pulsar.apache.org.
If you want to learn more about the differences between Apache Pulsar and Apache Kafka, please check out our series of comparison blog posts:
You can also participate in the Pulsar community via:
The Pulsar Slack channel (you can self-register at https://apache-pulsar.herokuapp.com)
The Pulsar mailing list
To get the latest updates about Pulsar, you can follow the project on Twitter: @apache_pulsar.
Thanks,
Sijie Guo