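Content provided by The Data Flowcast. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Data Flowcast or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal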
Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille

24:17
 

Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.

In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.

Key Takeaways:

00:00 Introduction.

02:13 Overview of the company’s operations and global presence.

04:00 The tech stack and structure of the data engineering team.

04:24 Running nearly 2,000 DAGs in production using Airflow.

05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.

07:05 Details on the Kubernetes-based Airflow setup using Helm charts.

09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.

14:11 Making every team member Airflow-literate through local installation.

17:56 Using custom libraries and plugins to extend Airflow functionality.

Resources Mentioned:

Sébastien Crocquevieille

https://www.linkedin.com/in/scroc/

Numberly | LinkedIn

https://www.linkedin.com/company/numberly/

Numberly | Website

https://numberly.com/

Apache Airflow

https://airflow.apache.org/

Grafana

https://grafana.com/

Apache Kafka

https://kafka.apache.org/

Helm Chart for Apache Airflow

https://airflow.apache.org/docs/helm-chart/stable/index.html

Kubernetes

https://kubernetes.io/

GitLab

https://about.gitlab.com/

KubernetesPodOperator – Airflow

https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html

Beyond Analytics Conference

https://astronomer.io/beyond/dataflowcast

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning

81 episodes
