The voice of The Apache Software Foundation

Production-ready stream data pipeline in Merpay, Inc syucream

September 13, 2019

We’ve started to provide our stream based data pipeline by using Google Cloud Dataflow and Apache Beam since 2018 fall. It collects event logs from microservices running on GKE, then transforms and forwards the logs to GCS and BigQuery to use for analytics, etc. As you know, implementing and operating streaming jobs are challenging. We’re encountered various issues during that time.
I’d like to share our knowledge on development and operation perspective. There are 3 topics in the development part, 1) Implementing stream jobs with using spotify/scio, 2) How to debug the jobs, especially having DynamicDestination, 3) How to load testing, to ensure our jobs stable. And topics in the next part, 1) How to deploy new jobs safely(with avoiding data loss), 2) How to monitor the jobs and surrounding systems, and misc.

Leave a Reply

Required fields are marked *.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.
%d bloggers like this: