Apache Big Data Seville 2016: Why is My Hadoop Cluster Slow? – Steve Loughran

Apache Hadoop is used to run jobs that execute tasks over multiple machines, with complex dependencies between those tasks. At scale there can be tens to thousands of tasks running over hundreds to thousands of machines, which makes it hard to make sense of their performance. Pipelines of such jobs, which together implement a business workflow, add another level of complexity. No wonder the question of why Hadoop jobs run slower than expected remains a perennial source of grief for developers. In this talk, we draw on our experience in debugging and analyzing Hadoop jobs to describe some methodical approaches to the problem, and present current and new tracing and tooling ideas that can help semi-automate parts of this difficult task.
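
One concrete, semi-automatable starting point for this kind of analysis is pulling per-task timings out of the MapReduce JobHistory Server REST API and looking for straggler tasks. The sketch below is only an illustration of that idea, not material from the talk itself; the history-server address and job id are placeholder values.

```python
# Illustrative sketch: fetch per-task timings from the MapReduce JobHistory
# Server REST API and flag tasks that look like stragglers.
import statistics
import requests

HISTORY_SERVER = "http://historyserver.example.com:19888"   # placeholder host:port
JOB_ID = "job_1480000000000_0001"                           # placeholder job id

url = f"{HISTORY_SERVER}/ws/v1/history/mapreduce/jobs/{JOB_ID}/tasks"
tasks = requests.get(url, timeout=30).json()["tasks"]["task"]

# elapsedTime is reported by the history server in milliseconds.
durations = [t["elapsedTime"] for t in tasks]
median = statistics.median(durations)

# Crude heuristic: any task taking more than twice the median is suspect.
stragglers = [t for t in tasks if t["elapsedTime"] > 2 * median]
for t in sorted(stragglers, key=lambda t: t["elapsedTime"], reverse=True):
    print(f"{t['id']} ({t['type']}): {t['elapsedTime'] / 1000:.1f}s "
          f"vs median {median / 1000:.1f}s")
```

From there, the usual next step is to ask why a flagged task was slow: skewed input split, a slow node, GC pauses, or contention on shared storage.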

More information about this talk
