Processing Planetary Sized Datasets – Tim Park
In my group at Microsoft, we have worked with the United Nations, Guide Dogs for the Blind in the UK, several automotive companies, and Ströer on a number of projects involving high-scale geospatial data.
In this talk, I’ll share some of the best practices and patterns that have come out of those experiences: storing and indexing geospatial data at scale, ingesting the data incrementally and processing it in slices, and efficiently building and presenting progressive levels of detail.
The audience will walk away with an understanding of how to efficiently summarize data over a geographic area, general methods for ingesting data with Apache Kafka (or other event ingestion systems), techniques for incrementally updating large-scale datasets with Apache Spark, and best practices for visualizing this data on the frontend.
Speakers
Tim Park