Apache Big Data Seville 2016 – Hadoop, Hive, Spark and Object Stores – Steve Loughran

Cloud deployments of Apache Hadoop are becoming more commonplace. Yet Hadoop and its applications don’t integrate that well, something which starts right down at the file IO operations.

This talk looks at how to make use of cloud object stores in Hadoop applications, including Hive and Spark. It will go from the foundational “what’s an object store?” to the practical “what should I avoid?” and the timely “what’s new in Hadoop?”, the latter covering the improved S3 support in Hadoop 2.8+.

I’ll explore the details of benchmarking and improving object store IO in Hive and Spark, showing what developers can do to gain performance improvements in their own code, and equally, what they must avoid.

Finally, I’ll look at ongoing work, especially “S3Guard” and what its fast and consistent file metadata operations promise.
