The voice of The Apache Software Foundation

Apache Big Data Seville 2016: Why is My Hadoop Cluster Slow? – Steve Loughran

December 9, 2016


Apache Hadoop is used to run jobs that execute tasks over multiple machines, with complex dependencies between tasks. At scale, tens to thousands of tasks may run over hundreds to thousands of machines, which makes it hard to make sense of their performance. Pipelines of such jobs, which logically run a business workflow, add another level of complexity. No wonder the question of why Hadoop jobs run slower than expected remains a perennial source of grief for developers. In this talk, we draw on our experience in debugging and analyzing Hadoop jobs to describe some methodical approaches to this problem, and present current and new tracing and tooling ideas that can help semi-automate parts of this difficult task.
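One methodical approach along the lines the abstract describes is to look for "straggler" tasks: a job is often only as fast as its slowest task, so comparing each task's duration against its peers is a useful first diagnostic. A minimal sketch, assuming you have already collected task durations (for example, from the JobHistory server's web UI or REST API; the function name and threshold here are illustrative, not part of any Hadoop API):

```python
from statistics import median

def find_stragglers(durations_s, factor=3.0):
    """Return indices of tasks whose duration exceeds factor * median.

    durations_s: list of task durations in seconds (hypothetical input,
    e.g. scraped from a JobHistory server). factor is an arbitrary
    outlier threshold, not a Hadoop setting.
    """
    if not durations_s:
        return []
    med = median(durations_s)
    return [i for i, d in enumerate(durations_s) if d > factor * med]

# Example: most tasks finish in about a minute, but one takes ten.
durations = [58, 61, 59, 62, 600, 60]
print(find_stragglers(durations))  # -> [4]
```

A flagged straggler then points you at a specific machine and task attempt to investigate: data skew, a slow disk, or contention on that node are common culprits.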

More information about this talk

