FeatherCast

The voice of The Apache Software Foundation

(Apache) Drill-ing into Collections of HDF5 Files Hyo-Kyung Joe Lee

September 12, 2019
timothyarthur

This talk is about building bridges between two ecosystems, Apache and HDF5.nHDF5 is a widely used storage container for complex data from embedded devicesnto supercomputers, and is developed and maintained as FOSS by The HDF Group.nWhile it is easy to imagine the potential benefits of making HDF5 containersnaccessible from the various Apache frameworks, there are several technicalnchallenges to overcome, and what makes a ‘good’ integration is by no meansnobvious. It requires the input and collaboration of experts from both sides. The core of this presentation will be an overview of a joint project between The HDF Group and Apache Drill contributor Charles Givre. We will show how tonuse Apache Drill to explore collections of HDF5 containers and discuss thenunderlying design decisions and a few technicalities. We will use thisnopportunity to give the Apache community an update on the support forncollections of HDF5 files on HDFS, in cloud storage (such as S3), our Sparkndata source and our new ‘HDF5 in the cloud’ platform, HDF KITA. The intended audience are ‘bridge-builders’ and curious members of the Apachencommunity, and anyone interested in ‘HDF5 demystified.’

Leave a Reply

Powered by WordPress.com.
%d bloggers like this: