FeatherCast

The voice of The Apache Software Foundation

Apache Big Data Seville 2016 – Large Scale Open Source Data Processing Pipelines at Trivago – Clemens Valiente

January 12, 2017
rbowen

Large Scale Open Source Data Processing Pipelines at Trivago – Clemens Valiente

trivago processes roughly 7 billion events per day with an architecture that is entirely open source, from data production to visualization in dashboards and reports. This talk will explain the idea behind the pipeline, highlight a particular business use case, and share the experience and engineering challenges from two years in production. Clemens Valiente will also present the tools, frameworks, and systems involved, focusing mainly on Kafka for data ingestion, Hadoop and Hive for processing, and Impala for querying. The successful implementation of this large-scale data processing pipeline fundamentally transformed how trivago approaches its business.

More information about this talk

