In my group at Microsoft, we have worked with the United Nations, Guide Dogs for the Blind in the UK, several automotive companies, and Strí_er on a number of projects involving high-scale geospatial data.
In this talk, I’ll share some of the best practices and patterns that have come out of those experiences: storing and indexing geospatial data at scale, incremental ingestion and slice processing of the data, and efficiently building and presenting progressive levels of detail.
The audience will walk away with an understanding of how to efficiently summarize data over a geographic area, general methods for ingestion with Apache Kafka (or other event ingestion systems), techniques for incremental updates to large-scale datasets with Apache Spark, and best practices for visualizing this data on the frontend.
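The talk itself will cover these techniques in depth, but as a rough illustration of what "summarizing data over a geographic area" often looks like in practice, one common approach is to bucket points into map tiles identified by quadkeys (the tiling scheme used by Bing Maps and similar systems). This is a minimal sketch, not the speaker's actual method; the coordinates and zoom level are arbitrary examples:

```python
import math
from collections import Counter

def quadkey(lat, lon, zoom):
    """Map a WGS84 point to its quadkey tile ID at the given zoom level."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Web Mercator latitude limits
    x = int((lon + 180.0) / 360.0 * (1 << zoom))
    s = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)) * (1 << zoom))
    # Interleave the bits of the tile x and y into a base-4 string,
    # most significant bit first; nearby tiles share key prefixes.
    key = ""
    for i in range(zoom, 0, -1):
        digit = 0
        mask = 1 << (i - 1)
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        key += str(digit)
    return key

# Counting points per tile gives a density summary over a geographic area;
# truncating keys to a shorter prefix summarizes at a coarser zoom level.
points = [(47.62, -122.35), (47.61, -122.33), (40.71, -74.01)]
counts = Counter(quadkey(lat, lon, 10) for lat, lon in points)
```

Because quadkeys are prefix-hierarchical, the same keyed data supports aggregation at progressively coarser levels of detail simply by grouping on shorter prefixes.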
Geospatial Track: Crowd Learning for Indoor Navigation – Thomas Burgess
indoo.rs enables location-based services for indoor applications. With indoo.rs, developers can add new features to their products, including triggering events by location, tracking assets, and showing the closest routes to other places. For this, we use WiFi/beacon radio infrastructure, mobile devices, and our cloud, which together produce lots of geospatial time series data. The real-time indoor navigation fuses independent movement from custom 9D sensor fusion with position estimates obtained by comparing current signal readings to a reference map. This talk will discuss how we create and maintain these maps in our big data machine learning system, which leverages crowd data through Kafka and Spark to run SLAM and context-aware algorithms that produce high-quality trajectories. In addition to their use in reference maps, these trajectories provide an additional input for our interactive analytics.
Geospatial Track: Geospatial Big Data: Software Architectures and the Role of APIs in Standardized Environments – Ingo Simonis, Open Geospatial Consortium (OGC)
A number of technologies have evolved around big data, in particular products from the Apache community such as Hadoop, Storm, Spark, Hive, and Cassandra. The geospatial community, meanwhile, has developed a range of standards to handle geospatial data efficiently. Most of these standards are produced by the Open Geospatial Consortium (OGC) and implemented in the form of domain-agnostic data models and Web services. With the emerging demand for streamlined APIs, new questions arise about how access to Big Data in the geospatial community can be handled most efficiently, and how existing standards serve these new demands and the implementation realities of distributed Big Data repositories operated, for example, by the various space agencies. This presentation aims to stimulate discussion of geospatial Big Data handling in standardized environments and to explore the role of products from the Apache community.
With help from a number of other people, George Percivall facilitated the creation of the Geospatial track at ApacheCon this year. Here’s George talking about what that entails, and what kind of topics will be covered.
(If the player above doesn’t work for you, you can listen HERE.)