FeatherCast

The voice of The Apache Software Foundation

Apache Big Data Seville 2016 – Real Time Aggregation with Kafka, Spark Streaming and ElasticSearch, Scalable Beyond Million RPS – Dibyendu Bhattacharya

December 5, 2016
asfinfra


While building a massively scalable real-time pipeline to collect transaction logs from network traffic, one of our major challenges was aggregating streaming data on the fly. We needed this to compute multiple metrics across various dimensions, which help our customers see near-real-time views of application delivery and performance. In this talk, learn how we designed our real-time pipeline for multi-stage aggregation, powered by Kafka, Spark Streaming and ElasticSearch. At InstartLogic we use a custom Spark receiver for Kafka, which performs the first stage of aggregation. The second stage is Spark Streaming driven aggregation within a given batch window. The final stage uses custom ElasticSearch plugins to aggregate across batches. I will cover this multi-stage aggregation, including the optimisations across all stages that make it scalable beyond a million RPS.
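
To make the three stages concrete, here is a minimal sketch in plain Python. It is not the InstartLogic implementation and uses no Kafka, Spark or ElasticSearch APIs. The record layout `(customer, status, bytes)`, the metrics (request count and byte sum) and all function names are assumptions chosen for illustration. The point it shows is that each stage only merges partial aggregates, so the same merge step can run in the receiver, within a batch window, and across batches.

```python
from collections import defaultdict

# Hypothetical record layout (an assumption): (customer, status_code, bytes_sent).
# Each aggregate maps (customer, status_code) -> (request_count, total_bytes).


def stage1_preaggregate(records):
    """Stage 1 (receiver side): collapse raw records into partial sums per key."""
    partial = defaultdict(lambda: [0, 0])
    for customer, status, nbytes in records:
        entry = partial[(customer, status)]
        entry[0] += 1
        entry[1] += nbytes
    return {key: (count, total) for key, (count, total) in partial.items()}


def merge(left, right):
    """Merge two aggregates; counts and byte sums are simply added per key."""
    merged = dict(left)
    for key, (count, total) in right.items():
        prev_count, prev_total = merged.get(key, (0, 0))
        merged[key] = (prev_count + count, prev_total + total)
    return merged


def stage2_batch_aggregate(partials):
    """Stage 2 (batch window): combine the partials from all receivers."""
    result = {}
    for partial in partials:
        result = merge(result, partial)
    return result


def stage3_merge_batches(store, batch_result):
    """Stage 3 (across batches): fold one batch result into the running store."""
    return merge(store, batch_result)


# Two hypothetical receivers feed the first batch; one feeds the second.
receiver_a = [("acme", 200, 100), ("acme", 200, 50), ("acme", 500, 10)]
receiver_b = [("acme", 200, 25)]
batch1 = stage2_batch_aggregate(
    [stage1_preaggregate(receiver_a), stage1_preaggregate(receiver_b)]
)
batch2 = stage2_batch_aggregate([stage1_preaggregate([("acme", 200, 5)])])

store = stage3_merge_batches({}, batch1)
store = stage3_merge_batches(store, batch2)
print(store[("acme", 200)])  # (4, 180)
```

Counts and sums merge cleanly because addition is associative and commutative. A metric such as an average would have to be carried as a (sum, count) pair and divided only at read time.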

More information about this talk
