FeatherCast

The voice of The Apache Software Foundation

Inside Apache Druid: Built for High-Performance Real-Time Analytics Surekha Saharan

September 12, 2019
timothyarthur

Interactive applications are replacing traditional reporting interfaces as the preferred means for organizations to derive value from their datasets. An interactive user experience requires latency on the order of milliseconds. Cluster computing frameworks such as Apache Hadoop or Apache Spark are tremendously beneficial in processing and deriving insights from data. However, high query latency makes these frameworks sub-optimal for interactive applications. Alternatively, the use of relational databases and key/value stores as dedicated query layers can reduce latency, but these approaches suffer many drawbacks for analytic use cases. Apache Druid (incubating) is well suited to power analytic applications working with real-time data and requiring low latency. This talk will cover the current state of analytics world, drawbacks of current popular databases, and what Druid nbrings to the table. The talk will cover technical details behind Druid’s design, including how its query processing layer works and how each component contributes to achieving top performance for analytical queries. We will discuss Druid modern architecture that implements a memory-mappable storage format, indexes, compression, late tuple materialization, and a query engine that can operate directly on compressed data. At the time of ApacheCon, Druid will very likely have been granted status as a top-level ASF project.

Leave a Reply

Powered by WordPress.com.
%d bloggers like this: