Native and Distributed Machine Learning with Apache Mahout – Suneel Marthi
Data scientists love tools like R and scikit-learn because they are declarative and offer convenient, intuitive syntax for analysis tasks, but these tools are limited by local memory. Mahout offers similar features with near-seamless distributed execution.
In this talk, we will look at Mahout-Samsara’s distributed linear algebra capabilities and demonstrate them by building a classification algorithm for the popular ‘Eigenfaces’ problem using the Samsara DSL from an Apache Zeppelin notebook. We will show how a simple classification algorithm can be prototyped and executed, and how it performs using the Samsara DSL with GPU acceleration. The demonstration will show how ML algorithms built with the Samsara DSL are automatically parallelized and optimized to execute on Apache Flink and Apache Spark, without the developer having to deal with the underlying semantics of the execution engine.
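For readers unfamiliar with the Eigenfaces approach mentioned above, the following is a minimal single-machine sketch in plain Python. It is not Samsara DSL code and not the implementation shown in the talk. It assumes toy 4-pixel "images", and every function name here (`mean_center`, `gram`, `top_components`, `project`, `classify`) is hypothetical. The pipeline is the standard one: subtract the mean face, find the leading principal directions, project, and label a probe image by its nearest training neighbor. Power iteration with deflation stands in for the decomposition step purely to keep the sketch dependency-free.

```python
import math

def mean_center(rows):
    """Subtract the per-pixel mean face from every image."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    return mean, [[x - m for x, m in zip(r, mean)] for r in rows]

def gram(rows):
    """A^T A, accumulated as a sum of per-row outer products."""
    d = len(rows[0])
    g = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                g[i][j] += r[i] * r[j]
    return g

def top_components(g, k, iters=200):
    """Up to k leading eigenvectors of g via power iteration with deflation."""
    d = len(g)
    g = [row[:] for row in g]  # work on a copy
    comps = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # fixed start vector (an assumption)
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to capture
                return comps
            v = [x / norm for x in w]
        lam = sum(v[i] * g[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: remove the component just found before looking for the next.
        for i in range(d):
            for j in range(d):
                g[i][j] -= lam * v[i] * v[j]
    return comps

def project(vec, mean, comps):
    """Coordinates of one image in the space spanned by the components."""
    centered = [x - m for x, m in zip(vec, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in comps]

def classify(probe, train, labels, k=2):
    """Label a probe image by its nearest training image in component space."""
    mean, centered = mean_center(train)
    comps = top_components(gram(centered), k)
    p = project(probe, mean, comps)
    best_label, best_dist = None, float("inf")
    for image, label in zip(train, labels):
        c = project(image, mean, comps)
        dist = sum((a - b) ** 2 for a, b in zip(p, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy data: two "people", two 4-pixel images each (illustrative values only).
TRAIN = [[9, 9, 1, 1], [8, 9, 1, 2], [1, 1, 9, 9], [2, 1, 9, 8]]
LABELS = ["a", "a", "b", "b"]

if __name__ == "__main__":
    print(classify([9, 8, 2, 1], TRAIN, LABELS))
```

Note that `gram` builds the matrix as a sum over rows. Sums of that shape can be computed per partition and then combined, which is why this style of linear algebra lends itself to distributed execution.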