The voice of The Apache Software Foundation

Apache Big Data EU 2016: Processing Planetary Sized Datasets – Tim Park

November 23, 2016

Processing Planetary Sized Datasets – Tim Park

In my group at Microsoft, we have worked with the United Nations, Guide Dogs for the Blind in the UK, several automotive companies, and Strí_er on a number of projects involving high-scale geospatial data.

In this talk, I’ll share some of the best practices and patterns that have come out of those experiences: storing and indexing geospatial data at scale, ingesting and processing the data incrementally in slices, and efficiently building and presenting progressive levels of detail.

The audience will walk away with an understanding of how to efficiently summarize data over a geographic area, general methods for ingesting data with Apache Kafka (or other event ingestion systems), how to apply incremental updates to large-scale datasets with Apache Spark, and best practices for visualizing this data on the frontend.
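To make "summarizing data over a geographic area" at progressive levels of detail a little more concrete, here is a minimal sketch in plain Python. It assumes the common Web Mercator tile scheme (zoom/x/y tiles) as the spatial index, which this abstract does not specify, and the function names `tile_for` and `summarize` are hypothetical. It illustrates the general idea only, not the approach presented in the talk.

```python
import math
from collections import Counter


def tile_for(lat, lon, zoom):
    """Return the (x, y) Web Mercator tile containing a point at a zoom level.

    Assumption: standard "slippy map" tiling; the talk may use another index.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points at the edges or poles stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def summarize(points, max_zoom):
    """Count points per tile at max_zoom, then roll counts up to coarser zooms.

    Each coarser level is built from the level below it, so the raw points
    are only scanned once.
    """
    levels = {max_zoom: Counter(tile_for(lat, lon, max_zoom) for lat, lon in points)}
    for zoom in range(max_zoom - 1, -1, -1):
        parent = Counter()
        for (x, y), count in levels[zoom + 1].items():
            # A tile's parent at the next coarser zoom is (x // 2, y // 2).
            parent[(x // 2, y // 2)] += count
        levels[zoom] = parent
    return levels


# Hypothetical sample points as (latitude, longitude) pairs.
points = [(51.5, -0.12), (51.6, -0.10), (47.6, -122.3)]
levels = summarize(points, max_zoom=10)
```

Because every level is derived from the one beneath it, a new slice of data only needs to be counted at the finest zoom and then added into its parent tiles, which is one way incremental updates and levels of detail can fit together.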

Tim Park

More information about this talk
