Apache Beam provides a unified programming model to execute batch and streaming pipelines on all the popular big data engines. The translation layer from Beam to the chosen big data engine is called a runner. The current runner for Apache Spark is based on the RDD/DStream framework. However, there is an ongoing work to move it to Spark next generation framework a.k.a structured streaming. This talk will present why structured streaming is a good fit for Apache Beam, why it is worth the effort, and will give some feedback on how Apache Beam has solved the challenge, what the tough points and the sweet points were.
The journey of building a Beam runner based on Spark structured streaming framework Etienne Chauchot
September 12, 2019