FeatherCast

The voice of The Apache Software Foundation

Simple, Portable data pipelines with Apache Beam SQL Andrew Pilloud

September 12, 2019
timothyarthur

Apache Beam is a unified data processing framework, allowing you to write batch and streaming pipelines that run anywhere, including Apache Flink, Apache Spark, and Google Cloud Dataflow. With the SQL extension you can now write a pipeline in pure SQL. If you need more, you can write user defined functions in Java or even embed SQL into your existing Java pipeline. This talk will start with a demo pipeline written in pure SQL. We will review how streaming SQL came from collaboration between the Apache Beam, Apache Calcite and Apache Flink communities. Finally, we will deep-dive into the architecture of Beam’s implementation and the work we are doing to make Apache Beam SQL the default choice for writing new streaming pipelines.

Leave a Reply

Powered by WordPress.com.
%d