FeatherCast

The voice of The Apache Software Foundation

Apache Big Data Seville 2016 – Hands On! Deploying Apache Hadoop Spark Cluster with HA, Monitoring, and Logging in AWS – Andrew Mcleod & Peter Vander Giessen

January 23, 2017
rbowen

Hands On! Deploying Apache Hadoop Spark Cluster with HA, Monitoring, and Logging in AWS – Andrew Mcleod & Peter Vander Giessen

This is a hands-on, workshop-style session where attendees will learn how to deploy complex workloads such as a 10-node Hadoop Spark cluster complete with HA, logging, and monitoring. We can then scale the cluster from there, depending on needs. Attendees will also learn how to deploy other workloads, such as connecting Apache Kafka or Apache Zeppelin into the solution, or trying the latest cloud-native Kubernetes. We will then run a sample TeraSort, Spark job, and PageRank benchmark to get familiar with the cluster. An AWS controller will be provided for folks who don’t have cloud access.
No prior knowledge is needed, but if you want to get a head start, install the Juju client by following the docs at http://jujucharms.com/get-started

More information about this talk

Apache Big Data Seville 2016 – Attacking a Big Data Developer – Olaf Flebbe

January 23, 2017
rbowen

Attacking a Big Data Developer – Olaf Flebbe

Developers are a possible attack vector for targeted attacks to infiltrate malicious code into enterprises.

The speaker analyzed network traffic with the Bro Network Security Monitor (bro.org), backed by an ELK Stack, while compiling Apache Bigtop, a Big Data distribution containing Apache Hadoop, Spark, HBase, Hive, Flink et al.

While there are no obvious traces of malicious code within the traffic, there are many findings of possible attack vectors, such as insecurely configured critical software infrastructure servers, usage of private repositories, or insecure protocols.

The analysis showed that many compile jobs download and run executables from untrusted sources. The speaker will briefly explain how these weaknesses can be exploited and will give recommendations on how to resolve these issues.

More information about this talk

Apache Big Data Seville 2016 – The Myth of the Big Data Silver Bullet – Why Requirements Still Matter – Nick Burch

January 19, 2017
rbowen

The Myth of the Big Data Silver Bullet – Why Requirements Still Matter – Nick Burch

We’ve all heard the hype – Big Data will solve all your storage, processing and analytic problems effortlessly! As Big Data moves along the adoption cycle, there’s a wider range of possible technologies and platforms you could use, but sadly picking the right one still remains crucial to success. Some who move beyond the buzzwords to deploy Big Data find things really do work well, but others rapidly run into issues. The difference usually isn’t the technologies or the vendors per se, but their appropriateness to the requirements, which aren’t always clear up-front…

This session won’t tell you what Big Data solution you need. Instead, we’ll cover some of the pitfalls and help you with the questions to ask when working out your requirements, in time for your Big Data system to be a success!

More information about this talk

Apache Big Data Seville 2016 – User Defined Functions and Materialized Views in Cassandra 3.0 – DuyHai Doan

January 19, 2017
rbowen

User Defined Functions and Materialized Views in Cassandra 3.0 – DuyHai Doan

Cassandra is evolving at a very fast pace and keeps introducing new features that close the gap with the traditional SQL world, but they are always designed with a distributed approach in mind.

First we’ll take a look at the recent user-defined functions and show how they can improve your application performance and enrich your analytics use cases.

Next comes a tour of materialized views, a major improvement that drastically changes the way people model data in Cassandra and makes developers’ lives easier!

More information about this talk

Apache Big Data Seville 2016 – SASI, Cassandra on the Full Text Search Ride! – DuyHai Doan

January 19, 2017
rbowen

SASI, Cassandra on the Full Text Search Ride! – DuyHai Doan

Apache Cassandra is a scalable database with high-availability features, but these come with severe limitations in terms of querying capabilities.

Since the introduction of SASI in Cassandra 3.4, those limitations belong to the past. Now you can create indices on your columns and benefit from full-text search capabilities through the new `LIKE '%term%'` syntax.
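
As a rough illustration of the syntax mentioned above, here is a minimal sketch in Python that only builds the CQL strings, so it runs without a Cassandra cluster or driver. The keyspace, table, column, and index names (`music.albums`, `title`, `albums_title_idx`) are hypothetical and not taken from the talk; the index class name, the `CONTAINS` mode option, and the `LIKE` form are the parts that come from SASI itself.

```python
# Hypothetical schema names: music.albums, title, albums_title_idx.
# A SASI index is created as a custom index; CONTAINS mode is what
# allows patterns with a leading wildcard such as '%term%'.
CREATE_SASI_INDEX = (
    "CREATE CUSTOM INDEX IF NOT EXISTS albums_title_idx "
    "ON music.albums (title) "
    "USING 'org.apache.cassandra.index.sasi.SASIIndex' "
    "WITH OPTIONS = {'mode': 'CONTAINS'}"
)


def like_contains_query(term: str) -> str:
    """Build a CQL query that matches titles containing `term`."""
    # CQL escapes a single quote inside a string literal by doubling it.
    escaped = term.replace("'", "''")
    return f"SELECT title FROM music.albums WHERE title LIKE '%{escaped}%'"


if __name__ == "__main__":
    print(CREATE_SASI_INDEX)
    print(like_contains_query("love"))
```

In a real application these statements would be sent through a Cassandra driver session rather than printed, and the search term would normally be passed as a bound parameter instead of being interpolated into the string.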

To illustrate how SASI works, we’ll use a database of 100 000 albums and artists. We’ll also show how SASI can help to accelerate analytics scenarios with Apache Spark using SparkSQL predicate push-down.

We also highlight some use cases where SASI is not a good fit and should be avoided (there is no magic, sorry).

Apache Big Data Seville 2016 – Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Other NoSQL Data Systems – Christian Tzolov

January 19, 2017
rbowen

Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Other NoSQL Data Systems – Christian Tzolov

When working with Big Data and IoT systems, we often feel the need for a common query language. System-specific languages usually take longer to adopt and are harder to integrate within existing stacks.

To fill this gap, some NoSQL vendors are building SQL access to their systems. Building a SQL engine from scratch is a daunting job, and frameworks like Apache Calcite can help you with the heavy lifting. Calcite allows you to integrate a SQL parser, a cost-based optimizer, and JDBC with your NoSQL system. We will walk through the process of building a SQL access layer for Apache Geode (an in-memory data grid). I will share my experience, pitfalls, and technical considerations, such as balancing SQL/RDBMS semantics against the design choices and limitations of the data system.

Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system.

More information about this talk

Apache Big Data Seville 2016 – Introducing Apache CouchDB 2.0 – Jan Lehnardt

January 19, 2017
rbowen

Introducing Apache CouchDB 2.0 – Jan Lehnardt

A thorough introduction to CouchDB 2.0, the five-years-in-the-making final delivery of the larger CouchDB vision.
Apache CouchDB 2.0 finally puts the C back in C.O.U.C.H.D.B: Cluster of unreliable commodity hardware. With a production-proven implementation of the Amazon Dynamo paper, CouchDB now has high-availability, multi-machine clustering as well as scaling options built in, making it ready for Big Data solutions that benefit from CouchDB’s unique multi-master replication.

Apache Big Data Seville 2016 – Multi-Tenant Machine Learning with Apache Aurora and Apache Mesos – Stephan Erb

January 13, 2017
rbowen

Multi-Tenant Machine Learning with Apache Aurora and Apache Mesos – Stephan Erb

Data scientists care about statistics and fast iteration cycles for their experiments. They should not be concerned with technicalities like hardware failures, tenant isolation, or low cluster utilization. In order to shield its data scientists from these matters, Blue Yonder is using Apache Aurora.

When adopting Aurora, our goal was to run multiple machine learning projects on the same physical cluster. This talk will go into the details of this adoption process and highlight key engineering decisions we have made. Particular focus will be on the multi-tenancy and oversubscription features of Apache Aurora and of Apache Mesos, its underlying resource manager.

Audience members will learn about the fundamentals of both Apache projects and how those can be assembled into a capable machine learning platform.

More information about this talk

Women in Big Data Luncheon & Program – ApacheCon Seville

January 13, 2017
rbowen

The Women’s Luncheon from ApacheCon Seville:

Luncheon Agenda

1:50pm – WiBD Overview – Anna Marchon

2:00pm – Keynote: Tina Rosario, Global VP, Enterprise Data Management at SAP

2:30pm – Keynote: Marina Alekseeva, GM of the Intel Software and Service Group in Russia

3:00pm – Networking

More information about this talk

Apache Big Data Seville 2016 – Native and Distributed Machine Learning with Apache Mahout – Suneel Marthi

January 13, 2017
rbowen

Native and Distributed Machine Learning with Apache Mahout – Suneel Marthi

Data scientists love tools like R and scikit-learn because they are declarative and offer convenient, intuitive syntax for analysis tasks, but these tools are limited by local memory. Mahout offers similar features with near-seamless distributed execution.

In this talk, we will look at Mahout-Samsara’s distributed linear algebra capabilities and demonstrate them by building a classification algorithm for the popular ‘Eigenfaces’ problem using the Samsara DSL from an Apache Zeppelin notebook. We will demonstrate how a simple classification algorithm may be prototyped and executed, and show its performance using the Samsara DSL with GPU acceleration. This will demonstrate how ML algorithms built with the Samsara DSL are automatically parallelized and optimized to execute on Apache Flink and Apache Spark, without the developer having to deal with the underlying semantics of the execution engine.

More information about this talk
