FeatherCast

The voice of The Apache Software Foundation

Apache Big Data Seville 2016 – Apache Kudu: A Distributed, Columnar Data Store for Fast Analytics – Mike Percy

January 23, 2017
rbowen

Apache Kudu: A Distributed, Columnar Data Store for Fast Analytics – Mike Percy

The Hadoop ecosystem has recently made great strides in its real-time access capabilities, narrowing the gap with traditional database technologies. With systems like Apache Spark, analysts can now run complex queries or jobs over large datasets within seconds. With systems like Apache HBase, applications can achieve millisecond-scale random access to arbitrarily sized datasets. However, gaps remain when a workload requires both scans and random access.

This talk will investigate the trade-offs between real-time random access and fast analytic performance from the perspective of storage engine internals. It will also describe Apache Kudu, a new addition to the open source Hadoop ecosystem with out-of-the-box Apache Spark integration. Kudu fills the gap described above by offering fast scans and fast random access from a single API.

More information about this talk
