#37 DoK Community: Running Data Replication Pipelines on Kubernetes with Argo // Stephen Bailey
Abstract of the talk…
Hundreds of data teams have migrated to the ELT pattern in recent years, using SaaS tools such as Stitch or Fivetran to load data reliably into their infrastructure. These SaaS offerings are outstanding and can significantly shorten your time to production. However, many teams prefer to roll their own tools. One option in these cases is to deploy singer.io taps and targets: Python scripts that replicate data between arbitrary sources and destinations. The Singer specification is the foundation of the popular Stitch SaaS, and a number of independent consultants and data projects also build on it. Singer pipelines are highly modular: you can pipe any tap to any target to build a data pipeline that fits your needs, which makes them a good fit for containerized workflows. This article walks through the workflow at a high level and provides example code and shared templates to get you up and running. I also drill into the reasons for choosing Argo over other orchestration tools such as Airflow or Dagster, and the implications from a team perspective.
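To make the tap-to-target idea concrete, here is a minimal sketch of the message flow the Singer specification describes: a tap emits newline-delimited JSON messages of type SCHEMA, RECORD, and STATE, and a target consumes them. The names `toy_tap` and `toy_target`, the `users` stream, and the `last_id` bookmark are hypothetical and chosen only for illustration; this is not code from the talk, and real taps and targets are separate executables rather than in-process functions.

```python
import json


def toy_tap(rows, stream="users"):
    """Yield Singer-style messages, one JSON document per line, for a list of dicts."""
    # A SCHEMA message describes the stream before any records are sent.
    yield json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
        "key_properties": ["id"],
    })
    # Each row becomes one RECORD message.
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": stream, "record": row})
    # A STATE message carries a bookmark so a later run can resume (hypothetical key).
    last_id = rows[-1]["id"] if rows else None
    yield json.dumps({"type": "STATE", "value": {"last_id": last_id}})


def toy_target(lines):
    """Consume Singer-style lines; return (records grouped by stream, last state seen)."""
    loaded = {}
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "SCHEMA":
            # Register the stream so that empty streams still appear in the result.
            loaded.setdefault(message["stream"], [])
        elif message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return loaded, state


if __name__ == "__main__":
    rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    loaded, state = toy_target(toy_tap(rows))
    print(loaded)
    print(state)
```

In a real deployment the two halves run as separate processes joined by a pipe, with the tap writing to standard output and the target reading from standard input, which is why any tap can be paired with any target and why the pair fits naturally into a container step.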
Bio…
Stephen Bailey is Director of Growth Analytics at Immuta, where he strives to implement privacy best practices while delivering business value from data. He loves to teach and learn, on just about any subject. He holds a PhD in educational cognitive neuroscience from Vanderbilt and enjoys reading philosophy.