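Content provided by Kostas Pardalis, Nitay Joffe. All podcast content (including episodes, graphics, and podcast descriptions) is uploaded and provided directly by Kostas Pardalis, Nitay Joffe, or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal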

From pandas to Arrow: Wes McKinney on the Future of Data Infrastructure

1:22:05

Summary

In this episode of Tech on the Rocks, Kostas and Nitay sit down with Wes McKinney, the creator of pandas, co-creator of Apache Arrow and Ibis, and a long-time leader in the Python data ecosystem. Wes walks us through his journey from building pandas in 2008 to rethinking how we represent and move columnar data with Arrow, and why Arrow is fundamentally different from file formats like Parquet and ORC.

We get into the future of data file formats, DataFusion and the new generation of query engines, the rise of open data lakes (Iceberg, Delta, Hudi), and why “big metadata” is becoming just as important as big data. Wes also shares candid thoughts on open source sustainability, how companies and infrastructure projects really survive, and how AI coding agents like Claude Code are changing the day-to-day work of software engineers, especially for complex systems work.

If you care about the foundations of modern data infrastructure, or you’ve ever called import pandas as pd, this is an episode you won’t want to miss.

Chapters

00:00 Intro — Wes McKinney & his journey in the Python data ecosystem

02:15 How pandas evolved & why UX first mattered for data science

06:14 Open source sustainability, funding & the Posit model

07:31 From pandas to Datapad, Cloudera & the origins of Apache Arrow and Ibis

13:38 What is Apache Arrow? In‑memory columnar data, batches & schemas

22:23 Inside Arrow IPC — zero‑copy, Flatbuffers & cross‑language interop

24:34 Arrow vs Parquet — columnar memory format vs columnar storage format

29:28 The next generation of columnar file formats & GPU‑friendly encodings

36:03 Big metadata, table formats & the rise of Iceberg/Delta/Hudi

43:05 Rethinking data systems: from big data to DuckDB, Rust & “no JVM” stacks

54:11 DataFusion as a modular Rust query engine for modern startups

57:58 Open source, the composable data stack & why infra is “AI‑resistant”

01:00:07 Vibe‑coding with AI agents — using Claude Code in real projects

01:09:49 AI, open source maintainers & the risks of AI‑generated contributions

01:18:57 Bridging LLMs and data: ADBC, data context & the future of infra + AI
