One of the challenges at Meetup is building a scalable, reliable, and efficient data platform that helps our ML team build models to recommend events that fit your interests. With the emergence of sophisticated batch and streaming frameworks and cloud solutions, our data platform has gone through massive changes in the past two years. In this talk, I’ll discuss how Meetup's data platform has evolved in its use of Apache-based data systems, including Sqoop, Hive, Flume, Spark, Flink, Beam, and Airflow. I’ll talk about the architecture changes to our batch and stream pipeline solutions and the pros and cons of moving the data platform 100% to the cloud. I’ll also share some lessons we learned and best practices for building distributed systems for a data platform, and how the data platform team collaborates with the machine learning and data science teams.