Artwork

内容由Arize AI提供。所有播客内容(包括剧集、图形和播客描述)均由 Arize AI 或其播客平台合作伙伴直接上传和提供。如果您认为有人在未经您许可的情况下使用您的受版权保护的作品,您可以按照此处概述的流程进行操作https://zh.player.fm/legal
Player FM -播客应用
使用Player FM应用程序离线!

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection

27:19
 
分享
 

Manage episode 477771441 series 3448051
内容由Arize AI提供。所有播客内容(包括剧集、图形和播客描述)均由 Arize AI 或其播客平台合作伙伴直接上传和提供。如果您认为有人在未经您许可的情况下使用您的受版权保护的作品,您可以按照此处概述的流程进行操作https://zh.player.fm/legal

For this week's paper read, we dive into our own research.
We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data your model has never seen before. We also saw the prohibitively high cost of running LLM evals at scale, and have used our data to fine-tune a series of SLMs that perform just as well as their base LLM counterparts, but at 1/10 the cost.
So, over the past few weeks, the Arize team generated the largest public dataset of hallucinations, as well as a series of fine-tuned evaluation models.
We talk about what we built, the process we took, and the bottom line results. You can read the recap of LibreEval here. Dive into the research, or sign up to join us next time.

Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

  continue reading

59集单集

Artwork
icon分享
 
Manage episode 477771441 series 3448051
内容由Arize AI提供。所有播客内容(包括剧集、图形和播客描述)均由 Arize AI 或其播客平台合作伙伴直接上传和提供。如果您认为有人在未经您许可的情况下使用您的受版权保护的作品,您可以按照此处概述的流程进行操作https://zh.player.fm/legal

For this week's paper read, we dive into our own research.
We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data your model has never seen before. We also saw the prohibitively high cost of running LLM evals at scale, and have used our data to fine-tune a series of SLMs that perform just as well as their base LLM counterparts, but at 1/10 the cost.
So, over the past few weeks, the Arize team generated the largest public dataset of hallucinations, as well as a series of fine-tuned evaluation models.
We talk about what we built, the process we took, and the bottom line results. You can read the recap of LibreEval here. Dive into the research, or sign up to join us next time.

Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

  continue reading

59集单集

सभी एपिसोड

×
 
Loading …

欢迎使用Player FM

Player FM正在网上搜索高质量的播客,以便您现在享受。它是最好的播客应用程序,适用于安卓、iPhone和网络。注册以跨设备同步订阅。

 

快速参考指南

版权2025 | 隐私政策 | 服务条款 | | 版权
边探索边听这个节目
播放