Evaluating a next-generation distributed pub-sub system can be a time-consuming affair. The Apache Pulsar community has been working to simplify installing and deploying Pulsar, making it easier to experiment with and evaluate Pulsar, as well as to migrate existing applications from another pub-sub system. With Pulsar version 1.20.0, released in October 2017, the community introduced a Kafka API-compatible Java client library.
The Kafka API-compatible Pulsar client for Java is compatible with Kafka 0.10.2.1. The library is published to Maven Central and can be installed using Maven, Gradle, and other build tools.
If you're using Maven, you can replace the Kafka client dependency with the following Pulsar client dependency:
<dependency>
  <groupId>org.apache.pulsar</groupId>
  <artifactId>pulsar-client-kafka</artifactId>
  <version>1.21.0-incubating</version>
</dependency>
If you're using Gradle, you can replace the Kafka client dependency with the following Pulsar client dependency:
dependencies {
    compile "org.apache.pulsar:pulsar-client-kafka:1.21.0-incubating"
}
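If you have several modules to migrate, the dependency swap can also be scripted. Here's a minimal sketch, assuming a hypothetical pom fragment (a real pom.xml carries the full project definition), that rewrites the Kafka client coordinates to the Pulsar wrapper with sed:

```shell
# Hypothetical minimal pom fragment for illustration only.
cat > pom-fragment.xml <<'EOF'
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>1.0.0</version>
EOF

# Swap the Kafka client coordinates for the pulsar-client-kafka library.
sed -i \
  -e 's|<groupId>org.apache.kafka</groupId>|<groupId>org.apache.pulsar</groupId>|' \
  -e 's|<artifactId>kafka-clients</artifactId>|<artifactId>pulsar-client-kafka</artifactId>|' \
  -e 's|<version>1.0.0</version>|<version>1.21.0-incubating</version>|' \
  pom-fragment.xml

cat pom-fragment.xml
```

Because the library keeps the Kafka API, this coordinate swap is the only build change required; the application code itself stays as it is.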
A very common use case for Apache Kafka is as a log collection pipeline. A log collection pipeline is illustrated below:
In this diagram:
Throughout the remainder of this blog post, I'll show you how to migrate your Kafka applications, such as the log collection pipeline illustrated above, to Pulsar. The demo consists of two parts:
You can follow the steps below or watch this video:
All of the repos used in this tutorial are available on GitHub:
The Kafka logger demo application is available on GitHub:
$ git clone https://github.com/merlimat/kafka-logger-demo
Initially, this Kafka logger demo application is configured to append log data to Kafka topics.
The pom.xml configuration file, for example, is configured to append log data to Kafka topics.
<dependencies>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.24</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>2.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
In addition, in src/main/resources/log4j2.xml, the application is configured to use a Kafka appender:
<Configuration status="INFO">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n" />
    </Console>
    <Kafka name="Kafka" topic="log-test">
      <JsonLayout />
      <Property name="bootstrap.servers">localhost:9092</Property>
    </Kafka>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console" />
    </Root>
    <Logger name="demo" level="info">
      <AppenderRef ref="Kafka" />
    </Logger>
  </Loggers>
</Configuration>
Migrating the Kafka logger application is pretty simple. You can change the pom.xml dependency to the pulsar-client-kafka library and configure log4j2 to point to your Pulsar cluster. Here's the updated pom.xml file:
<dependencies>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.24</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>2.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pulsar</groupId>
    <artifactId>pulsar-client-kafka</artifactId>
    <version>1.21.0-incubating</version>
  </dependency>
</dependencies>
And here's the updated log4j2.xml file:
<Configuration status="INFO">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n" />
    </Console>
    <Kafka name="Kafka" topic="persistent://sample/standalone/ns1/log-test">
      <JsonLayout />
      <Property name="bootstrap.servers">pulsar://localhost:6650</Property>
    </Kafka>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console" />
    </Root>
    <Logger name="demo" level="info">
      <AppenderRef ref="Kafka" />
    </Logger>
  </Loggers>
</Configuration>
The Kafka Connect Elasticsearch connector is available on GitHub:
$ git clone -b 0.10.0.0 https://github.com/merlimat/kafka-connect-elasticsearch
Similar to migrating the Kafka logger application, you'll need to:
Delete the kafka-clients library from your Kafka connect lib directory and add the pulsar-client-kafka library instead.
Update your Kafka Connect configuration to point bootstrap.servers at your Pulsar cluster.
# Comment out Kafka bootstrap servers
# bootstrap.servers=localhost:9092

# Add Pulsar bootstrap servers
bootstrap.servers=pulsar://localhost:6650
Update your kafka-elasticsearch configuration to read from Pulsar topics instead of Kafka topics.
# Comment out Kafka topic
# topics=log-test

# Add Pulsar topics
topics=persistent://sample/standalone/ns1/log-test
topic.index.map=persistent://sample/standalone/ns1/log-test:log-test
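The three steps above can be sketched end to end as a small shell script. The directory and file names below (connect/lib, connect-standalone.properties, elasticsearch-sink.properties) are assumptions standing in for your actual Kafka Connect installation paths, and the files are stubbed out for illustration:

```shell
# Hypothetical Connect installation layout, stubbed for illustration.
mkdir -p connect/lib
touch connect/lib/kafka-clients-1.0.0.jar
echo 'bootstrap.servers=localhost:9092' > connect/connect-standalone.properties

# Step 1: swap the client library in the Connect lib directory.
# In practice, pulsar-client-kafka would be downloaded from Maven Central.
rm connect/lib/kafka-clients-1.0.0.jar
touch connect/lib/pulsar-client-kafka-1.21.0-incubating.jar

# Step 2: point the Connect worker at the Pulsar cluster.
sed -i 's|^bootstrap.servers=.*|bootstrap.servers=pulsar://localhost:6650|' \
  connect/connect-standalone.properties

# Step 3: read from the Pulsar topic and map it back to the log-test index.
cat > connect/elasticsearch-sink.properties <<'EOF'
topics=persistent://sample/standalone/ns1/log-test
topic.index.map=persistent://sample/standalone/ns1/log-test:log-test
EOF
```

After these edits, you would restart the Connect worker as usual; the connector itself is unaware that it is now consuming from Pulsar.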
If you want to learn more about Apache Pulsar, please visit the official website at https://pulsar.apache.org.
If you want to learn more about the differences between Apache Pulsar and Apache Kafka, please check out our series of comparison blog posts:
You can also participate in the Pulsar community via:
The Pulsar Slack channel (you can self-register at https://apache-pulsar.herokuapp.com)
The Pulsar mailing list
To get the latest updates about Pulsar, you can follow the project on Twitter: @apache_pulsar.
Thanks,
Sijie Guo