#83 Who’s Minding the Metadata? Why Data Quality Matters in GenAI (Quality Time With Paolo)
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.
Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!
In this episode, host Murilo is joined by returning guest Paolo, Data Management Team Lead at dataroots, for a deep dive into the often-overlooked but rapidly evolving domain of unstructured data quality. Tune in for a field guide to navigating documents, images, and embeddings without losing your sanity.
What we unpack:
- Data management basics: Metadata, ownership, and why Excel isn’t everything.
- Structured vs unstructured data: How the wild west of PDFs, images, and audio is redefining quality.
- Data quality challenges for LLMs: From apples and pears to rogue chatbots with “legally binding” hallucinations.
- Practical checks for document hygiene: Versioning, ownership, embedding similarity, and tagging strategies.
- Retrieval-Augmented Generation (RAG): When ChatGPT meets your HR policies and things get weird.
- Monitoring and governance: Building systems that flag rot before your chatbot gives out 2017 vacation rules.
- Tooling and gaps: Where open source is doing well—and where we’re still duct-taping workflows.
- Real-world inspirations: A look at how QuantumBlack (McKinsey) is tackling similar issues with their AI for DQ framework.
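As a rough illustration of the document-hygiene checks listed above (ownership, staleness, embedding similarity), here is a minimal Python sketch. It is not taken from the episode: the `Doc` fields, the `hygiene_report` helper, the two-year staleness window, the 0.95 similarity threshold, and the toy embedding vectors are all assumptions chosen for the example. In practice the embeddings would come from whatever embedding model your pipeline uses.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt
from typing import Optional


@dataclass
class Doc:
    """Hypothetical document record; field names are illustrative only."""
    doc_id: str
    owner: Optional[str]
    last_updated: date
    embedding: list[float]  # toy vector standing in for a real embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hygiene_report(docs, today, max_age_days=730, dup_threshold=0.95):
    """Return {doc_id: [issues]} for documents that fail at least one check."""
    issues = {d.doc_id: [] for d in docs}
    for d in docs:
        # Ownership check: every document should have someone accountable.
        if not d.owner:
            issues[d.doc_id].append("missing owner")
        # Staleness check: flag documents not updated within the allowed window.
        if (today - d.last_updated).days > max_age_days:
            issues[d.doc_id].append("stale")
    # Near-duplicate check: compare every pair of embeddings once.
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            if cosine(a.embedding, b.embedding) >= dup_threshold:
                issues[a.doc_id].append(f"near-duplicate of {b.doc_id}")
                issues[b.doc_id].append(f"near-duplicate of {a.doc_id}")
    return {doc_id: found for doc_id, found in issues.items() if found}


# Toy corpus: an old, unowned vacation policy next to its newer version.
docs = [
    Doc("vacation_2017", None, date(2017, 1, 15), [1.0, 0.0, 0.1]),
    Doc("vacation_2024", "hr-team", date(2024, 3, 1), [0.98, 0.05, 0.12]),
    Doc("expenses", "finance", date(2024, 6, 1), [0.0, 1.0, 0.0]),
]
for doc_id, found in hygiene_report(docs, today=date(2025, 1, 1)).items():
    print(doc_id, found)
```

The pairwise comparison is quadratic in the number of documents, which is fine for a sketch but would likely need an approximate nearest-neighbour index on a real corpus.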
Chapters
1. #83 Who’s Minding the Metadata? Why Data Quality Matters in GenAI (Quality Time With Paolo) (00:00:00)
2. Welcome to Data Topics (00:00:46)
3. Introducing Data Management (00:01:30)
4. Unstructured vs Structured Data (00:06:30)
5. RAG and Chatbot Applications (00:09:38)
6. Data Quality Issues in Documents (00:17:15)
7. Metadata Checks and Content Analysis (00:25:05)
8. Testing Outputs and Monitoring (00:34:18)
9. Governance and Available Tools (00:42:52)
10. Summary and Additional Resources (00:48:09)