The voice of The Apache Software Foundation

The journey of building a Beam runner based on Spark structured streaming framework

Etienne Chauchot

September 12, 2019

Apache Beam provides a unified programming model for executing batch and streaming pipelines on the popular big data engines. The layer that translates a Beam pipeline for the chosen engine is called a runner. The current runner for Apache Spark is based on the RDD/DStream framework. However, work is ongoing to move it to Spark's next-generation framework, Structured Streaming. This talk will explain why Structured Streaming is a good fit for Apache Beam and why the migration is worth the effort. It will also share feedback on how Apache Beam addressed the challenge, including the pain points and the sweet spots.
