Create a Hadoop Cluster and Migrate 39PB Data Plus 150000 Jobs/Day – Stuart Pook
Criteo had an Hadoop cluster with 39 PB raw stockage, 13404 CPUs, 105 TB RAM, 40 TB data imported per day and >100000 jobs per day. This cluster was critical in both stockage and compute but without backups. This talk describes: 0/ the different options considered when deciding how to protect our data and compute capacity 1/ the criteria established for the 800 new computers and comparison tests between suppliers’ hardware 2/ the non-blocking network infrastructure with 10 Gb/s endpoints scalable to 5000 machines 3/ the installation and configuration, using Chef, of a cluster on new hardware 4/ the problems encountered in moving our jobs and data from the old CDH4 cluster to the new CDH5 cluster 600 km distant 5/ running and feeding with data the two clusters in parallel 6/ fail over plans 7/ operational issues 8/ the performance of the 16800 core, 200 TB RAM and 60 PB disk CDH5 cluster.