The Original Vision of Nutch, 14 Years Later: Building an Open Source Search Engine – Sylvain Zimmer
Few people remember that before spinning off Hadoop and focusing on crawling, Nutch was meant to be an alternative to commercial search engines. What if we tried to do it again today?
In this presentation, Sylvain Zimmer will explain how he used projects from the Nutch diaspora, such as Spark and Elasticsearch, to build Common Search, an open source search engine with transparent rankings.
We will go over the architecture of large-scale search engines and how it has evolved since the late 1990s. Then we will review the tools from the Apache and open source ecosystems that are best suited to the many challenges involved. Finally, we will discuss what Common Search still needs before it can be useful to the general public.