KafkaStreams API "through" operation behavior



In this post we are  taking a deeper look into KafkaStreams API's "through" operation behavior and  potential problems if it is not used carefully.

Some time ago I was writing a streaming application using KafkaStreams API and it supposed to do something like this:

1. Read message from "input topic"
2. Apply first transformation
3. Output transformed message to some "side topic"
4. Apply second transformation
5. Output message to "final topic"

KStream.through() operation seems logical choice for 3. step - outputting message to side topic and continue to process the message.

But if you haven't read this method's javadoc carefully (like I wasn't), you might be surprised with it's behavior.

I expected that "through" method will only produce the message to side-topic and continue with following operations on the message.
But instead it outputs the message to specified topic and returns new KStream built from that topic.

The "problem" with this implementation is that if "someone else" (another streaming app, or another producer) writes messages to this "side topic", you will start getting output from "through" without ever "calling" it from your streaming application.

Bottom line, when using "through" operation you must be completely aware of how is target topic used in other places in your application and even in some external applications if they have access to this topic.

I wrote small app that demonstrates this behavior. You can find it at our github

For our project this behavior was "show stopper", and we switched to using KafkaProducer for writing  message to side topic.

Comments

  1. You can still accomplish a "side output topic"

    KStream stream = builder.stream("input-topic"):
    stream.to("side-output-topic"); // former "through()"
    stream.filter(); // filter() just an example

    Re-using `stream` twice will "broadcast" (without actual copying) each record into both downstream operators: in the example to() and filter().

    ReplyDelete
    Replies
    1. Thanks Matthias, this is definitely more elegant then using KafkaProducer.

      Delete

Post a Comment

Popular posts from this blog

Making of Message Gateway with Kafka - Part 3

Making of Message Gateway with Kafka - Part 2

Making of Message Gateway with Kafka - Part 1