FeatherCast

The voice of The Apache Software Foundation

Simple, Portable data pipelines with Apache Beam SQL - Andrew Pilloud

September 12, 2019
timothyarthur

Apache Beam is a unified data processing framework that lets you write batch and streaming pipelines that run on multiple execution engines, including Apache Flink, Apache Spark, and Google Cloud Dataflow. With the SQL extension, you can now write a pipeline in pure SQL. If you need more, you can write user-defined functions in Java or even embed SQL in your existing Java pipeline. This talk will start with a demo pipeline written in pure SQL. We will then review how streaming SQL grew out of collaboration between the Apache Beam, Apache Calcite, and Apache Flink communities. Finally, we will take a deep dive into the architecture of Beam’s implementation and the work we are doing to make Apache Beam SQL the default choice for writing new streaming pipelines.
