Python is a widely used programming language that is characterized by a low barrier to entry. As many other companies in the industry, Yelp has used Python as the main programming language to implement back-end services. Unfortunately, when it comes to stream processing Python presents several challenges, including performance limitations, lack of proper multi-threading support and limited framework options.
When the use cases for more advanced stream processing started to arise, at Yelp we decided to leverage Flink and introduce a Scala/Java stack for our Data Pipeline. Over the course of two years we built our connector ecosystem in Flink and exposed to developers a very lightweight stream processing API based on Flink SQL.
While Flink SQL was quickly adopted by our developers, we soon realized that more complex use cases cannot easily be solved in Flink SQL. At the same time, the alternative engineering cost to develop and maintain a complex production application in an unfamiliar JVM based language was not convenient for our product teams.
At the beginning of the year, we started looking at Apache Beam with the goal of filling the gap between the high level Flink SQL api and the low level advanced Java Flink streaming API. In this talk I’ll cover the motivations and use cases that brought us to adopt Apache Beam, the challenges we faced integrating Beam with an existing Flink infrastructure and the strategy we followed to deploy and productionize at scale.
Stream processing for the masses with Beam, Python and Flink Enrico Canzonieri
September 13, 2019