Apache Big Data Seville 2016 – Low Latency Web Crawling on Apache Storm – Julien Nioche

StormCrawler is an open source collection of resources, mostly implemented in Java, for building low-latency, scalable web crawlers on Apache Storm. After a short introduction to Apache Storm and an overview of what StormCrawler provides, we will compare it with similar projects like Apache Nutch and present several real-life use cases. In particular, we will see how StormCrawler can be used with ElasticSearch and Kibana to crawl and index web pages and to monitor the crawl itself.

More information about this talk
