FeatherCast

The voice of The Apache Software Foundation

Apache Big Data Seville 2016 – Low Latency Web Crawling on Apache Storm – Julien Nioche

January 2, 2017
asfinfra

StormCrawler is an open-source collection of resources, mostly implemented in Java, for building low-latency, scalable web crawlers on Apache Storm. After a short introduction to Apache Storm and an overview of what StormCrawler provides, we will compare it with similar projects such as Apache Nutch and present several real-life use cases. In particular, we will see how StormCrawler can be used with Elasticsearch and Kibana to crawl and index web pages and to monitor the crawl itself.

More information about this talk
