The voice of The Apache Software Foundation

Apache Big Data Seville 2016 – Power Pig with Spark – Liyun Zhang

January 5, 2017

Apache Pig is a popular scripting platform for processing and analyzing large data sets in the Hadoop ecosystem. Thanks to Pig's open architecture and backend neutrality, Pig scripts can currently run on MapReduce and Tez. Apache Spark is an open-source cluster computing framework for data analytics that has gained significant momentum recently. Besides offering performance advantages, Spark is also a more natural fit for the query plan produced by Pig. Pig on Spark improves ETL performance and also supports users who intend to standardize on Spark as their execution engine.

More information about this talk