A Data Lake is a system in which data is stored, processed, and managed in its natural, raw format. Users can store data as-is, without first having to structure it, and run different types of analytics on it. As big data adoption grows, spanning dashboards and visualizations, big data processing, real-time analytics, and machine learning to guide better decisions, the data warehouse alone can no longer keep up with changing workload patterns. The Data Lake, as a generalization of the data warehouse, has become a buzzword in recent years.
Public cloud providers have announced their own Data Lake services one after another, trying to close the gaps in traditional data warehouse solutions. Tencent, one of the top three ISPs in China, has the same requirement. Unlike other companies that built in-house solutions, Tencent fully embraces the open source community and uses open source software to build a production-ready Data Lake solution.
In this talk, we will explain what a Data Lake is and how to build a Data Lake solution. In particular, we will present the Tencent solution, which uses open source technologies to build a large-scale, production-ready Data Lake. Last but not least, we will show our own contributions back to the community. Through this talk, the audience will gain a full understanding of Data Lakes and of Tencent's way of building one.