The voice of The Apache Software Foundation

Apache Beam for production machine learning: TensorFlow Extended (TFX) / Beaming Deep Learning with Ludwig
Suneel Marthi

September 13, 2019

Part I
Developing ML and deep learning applications for production deployment involves much more than training a model. Google has distilled years of experience building production ML pipelines into TensorFlow Extended (TFX), an open source version of the ML platform that Google uses internally. Pipeline processing is a core requirement of any production ML platform, and TFX has chosen Apache Beam to implement its pipelines.
Learn from Google’s experience in applying Beam for ML pipelines, including how TFX uses Beam and why Beam was chosen.

Part II
Ludwig is a code-free deep learning toolbox built on TensorFlow and open-sourced by Uber AI Labs. It aims to make deep learning easier to understand for non-experts and to shorten model improvement iteration cycles for experienced machine learning developers and researchers. With Ludwig, experts and researchers can simplify prototyping and streamline data processing, so they can focus on developing deep learning architectures rather than on data wrangling.
Ludwig introduces the notion of data type-specific encoders and decoders, which results in a highly modularized and extensible architecture: each type of data supported (text, images, categories, and so on) has a specific preprocessing function.
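To make the idea of type-specific preprocessing concrete, here is a minimal sketch of a registry that maps a data type to its own preprocessing function. This is not Ludwig's actual code: the names `PREPROCESSORS`, `preprocess_text`, `preprocess_category`, and `preprocess` are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List


def preprocess_text(value: str) -> List[str]:
    # Hypothetical text preprocessing: lowercase and split on whitespace.
    return value.lower().split()


def preprocess_category(value: str) -> str:
    # Hypothetical category preprocessing: normalize the label string.
    return value.strip().lower()


# One preprocessing function per supported data type (illustrative only).
PREPROCESSORS: Dict[str, Callable[[str], Any]] = {
    "text": preprocess_text,
    "category": preprocess_category,
}


def preprocess(feature_type: str, value: str) -> Any:
    # Dispatch on the declared data type; unknown types are rejected.
    if feature_type not in PREPROCESSORS:
        raise ValueError(f"unsupported feature type: {feature_type}")
    return PREPROCESSORS[feature_type](value)
```

Supporting a new data type in this sketch means adding one function and one registry entry, which is the kind of modularity the paragraph above describes.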
In this talk, we'll look at building Beam pipelines with the Beam Python SDK that programmatically create deep learning models with Ludwig for different input data types, covering both model training and inference. We will show two examples: training deep learning classifiers on text and on images from an unbounded source, and then running inference with them.
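As a rough illustration of what "programmatically creating" a model could look like, the sketch below builds a Ludwig-style model definition as a plain Python dictionary for a text classifier and an image classifier. It assumes the `input_features` / `output_features` layout of Ludwig's model definition; the helper `build_model_definition` and the column names (`review`, `sentiment`, `image_path`, `label`) are hypothetical and do not come from the talk. Passing such a definition to Ludwig from inside a Beam transform is not shown here.

```python
from typing import Dict, List

ModelDefinition = Dict[str, List[Dict[str, str]]]


def build_model_definition(
    input_name: str, input_type: str, label_name: str
) -> ModelDefinition:
    # Hypothetical helper: one input feature of the given type and one
    # categorical output feature, i.e. a simple classifier.
    return {
        "input_features": [{"name": input_name, "type": input_type}],
        "output_features": [{"name": label_name, "type": "category"}],
    }


# Column names below are assumptions made for this example only.
text_classifier = build_model_definition("review", "text", "sentiment")
image_classifier = build_model_definition("image_path", "image", "label")
```

Only the declared input type differs between the two definitions; the rest of the structure is shared.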
