Apache Big Data Seville 2016 – Create a Hadoop Cluster and Migrate 39PB Data Plus 150000 Jobs/Day – Stuart Pook

Create a Hadoop Cluster and Migrate 39PB Data Plus 150000 Jobs/Day – Stuart Pook

Criteo had an Hadoop cluster with 39 PB raw stockage, 13404 CPUs, 105 TB RAM, 40 TB data imported per day and >100000 jobs per day. This cluster was critical in both stockage and compute but without backups. This talk describes: 0/ the different options considered when deciding how to protect our data and compute capacity 1/ the criteria established for the 800 new computers and comparison tests between suppliers’ hardware 2/ the non-blocking network infrastructure with 10 Gb/s endpoints scalable to 5000 machines 3/ the installation and configuration, using Chef, of a cluster on new hardware 4/ the problems encountered in moving our jobs and data from the old CDH4 cluster to the new CDH5 cluster 600 km distant 5/ running and feeding with data the two clusters in parallel 6/ fail over plans 7/ operational issues 8/ the performance of the 16800 core, 200 TB RAM and 60 PB disk CDH5 cluster.

More information about this talk

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s