Apache Big Data Seville 2016 – Real Time Aggregation with Kafka, Spark Streaming and ElasticSearch, Scalable Beyond Million RPS – Dibyendu Bhattacharya

While building a massively scalable real-time pipeline to collect transaction logs from network traffic, one of our major challenges was aggregating streaming data on the fly. This aggregation was needed to compute multiple metrics across various dimensions, which help our customers see near-real-time views of application delivery and performance. In this talk, learn how we designed our real-time pipeline for multi-stage aggregation, powered by Kafka, Spark Streaming and ElasticSearch. At InstartLogic we used a custom Spark Receiver for Kafka, which is used in the first-stage aggregation. The second stage is Spark Streaming-driven aggregation within a given batch window. The final stage uses custom ElasticSearch plugins to aggregate across batches. I will cover this multi-stage aggregation, including the optimisations across all stages that make it scalable beyond a million RPS.
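
To make the three stages concrete, here is a minimal sketch in plain Python. It is not the Spark, Kafka or ElasticSearch code from the talk: the record shape, the function names (`aggregate_block`, `merge_partials`) and the sample data are all assumptions made for illustration. It only shows why the stages compose: each stage emits partial sums keyed by dimension, and partial sums can be merged again at the next stage.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (dimension key, metric value).
Record = Tuple[str, int]
Partial = Dict[str, int]


def aggregate_block(records: Iterable[Record]) -> Partial:
    """Stage 1 stand-in: pre-aggregate raw records as they are consumed."""
    totals: Counter = Counter()
    for key, value in records:
        totals[key] += value
    return dict(totals)


def merge_partials(partials: Iterable[Partial]) -> Partial:
    """Stage 2 and 3 stand-in: merge partial sums per key.

    Addition is associative, so merging partials gives the same
    totals as summing the raw records directly.
    """
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return dict(merged)


# Made-up sample data: two batch windows, each split into consumed blocks.
batch1_blocks = [[("app1", 10), ("app2", 5)], [("app1", 7)]]
batch2_blocks = [[("app2", 3), ("app1", 1)]]

# Stage 1 per block, then stage 2 within each batch window.
batch1 = merge_partials(aggregate_block(b) for b in batch1_blocks)
batch2 = merge_partials(aggregate_block(b) for b in batch2_blocks)

# Stage 3: merge across batches (the role the abstract gives to ElasticSearch plugins).
final = merge_partials([batch1, batch2])
print(final)
```

The sketch uses sums only; metrics that are not simple sums (for example averages) would need to carry mergeable state such as a sum and a count through each stage.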

More information about this talk
