Data replication at PayPal drives a variety of business use cases, from fraud detection, user behavioral analysis, and credit checks to many other offline business decisions. In this talk, presented in partnership with LinkedIn, we show how Apache Gobblin powers data movement and integration at PayPal, and we cover recent features as well as the planned roadmap for the platform. Apache Gobblin is a distributed data integration framework that simplifies common aspects of big data integration, such as data ingestion, replication, organization, and lifecycle management, for both streaming and batch data ecosystems. In the second half of the presentation, we present recent additions to Gobblin, including: 1. a new declarative approach for defining data pipelines using Gobblin-as-a-Service, and 2. real-world experiences running hybrid batch and streaming pipelines using Gobblin.
Data Movement & Integration at PayPal & LinkedIn using Apache Gobblin
Jay Sen, Sudarshan Vasudevan
September 13, 2019