KafkaStreams API "through" operation behavior
In this post we are taking a deeper look into KafkaStreams API's "through" operation behavior and potential problems if it is not used carefully.
Some time ago I was writing a streaming application using KafkaStreams API and it supposed to do something like this:
1. Read message from "input topic"
2. Apply first transformation
3. Output transformed message to some "side topic"
4. Apply second transformation
5. Output message to "final topic"
KStream.through() operation seems logical choice for 3. step - outputting message to side topic and continue to process the message.
But if you haven't read this method's javadoc carefully (like I wasn't), you might be surprised with it's behavior.
I expected that "through" method will only produce the message to side-topic and continue with following operations on the message.
But instead it outputs the message to specified topic and returns new KStream built from that topic.
The "problem" with this implementation is that if "someone else" (another streaming app, or another producer) writes messages to this "side topic", you will start getting output from "through" without ever "calling" it from your streaming application.
Bottom line, when using "through" operation you must be completely aware of how is target topic used in other places in your application and even in some external applications if they have access to this topic.
I wrote small app that demonstrates this behavior. You can find it at our github
For our project this behavior was "show stopper", and we switched to using KafkaProducer for writing message to side topic.
You can still accomplish a "side output topic"
ReplyDeleteKStream stream = builder.stream("input-topic"):
stream.to("side-output-topic"); // former "through()"
stream.filter(); // filter() just an example
Re-using `stream` twice will "broadcast" (without actual copying) each record into both downstream operators: in the example to() and filter().
Thanks Matthias, this is definitely more elegant then using KafkaProducer.
DeleteI have read your blog its very attractive and impressive. I like it your blog.
ReplyDeletebusinessexceltemplates
Article submission sites