Apache Big Data Seville 2016 – Distributed In-Database Machine Learning with Apache MADlib (incubating) – Roman Shaposhnik

Data science is moving with gusto to the enterprise, where data often resides in relational databases with SQL as the main workload. So how can an enterprise add a data science dimension to its business without a major IT re-architecture?

Apache MADlib (incubating) is an innovative SQL-based open source library for scalable in-database analytics. It provides parallel implementations of mathematical, statistical and machine learning methods. Bringing machine learning computations to the data makes for excellent scale-out performance on massively parallel processing (MPP) platforms like Greenplum Database and Apache HAWQ (incubating).
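To make the idea of "bringing the computation to the data" concrete, here is a minimal sketch of a simple linear regression fitted using nothing but SQL aggregates. It is only an illustration: it uses SQLite through Python's standard library rather than MADlib, Greenplum or HAWQ, and the table name `points` and the function `fit_line_in_database` are hypothetical names chosen for this example. The point is that the database scans the rows and returns just five numbers, so the raw data never leaves the database.

```python
import sqlite3


def fit_line_in_database(conn):
    """Fit y = slope * x + intercept from SQL aggregates only.

    Hypothetical helper for illustration; this is not MADlib's API.
    Assumes a table named `points` with numeric columns x and y.
    """
    # The database does the full scan; only five aggregates are returned.
    n, sx, sy, sxx, sxy = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM points"
    ).fetchone()

    # Closed-form least-squares solution from the sufficient statistics.
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Small in-memory table whose points lie exactly on y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
)

slope, intercept = fit_line_in_database(conn)
print(slope, intercept)
```

Because sums like these can be computed independently on each partition of a table and then added together, this style of computation is a natural fit for a parallel database, which is presumably part of why an aggregate-based approach scales out well on MPP platforms.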

In this talk, we will describe the origin of MADlib, review the architecture and common usage patterns, and look ahead to some interesting plans around performance acceleration.

More information about this talk
