Breaking Down EvalGen: Who Validates The Validators? Deep Papers podcast

Science Tech Math Business Arize AI

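Content provided by Arize AI. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Arize AI or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal.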

Deep Papers
Breaking Down EvalGen: Who Validates the Validators?

1M ago 44:47

Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators often inherit the problems of the LLMs they evaluate, requiring further human validation.

This week’s paper explores EvalGen, a mixed-initiative approach to aligning LLM-generated evaluation functions with human preferences. EvalGen helps users develop both criteria for acceptable LLM outputs and functions that check those criteria, so that evaluations reflect the users’ own grading standards.
Read it on the blog: https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/

Paper: https://arxiv.org/abs/2404.12272

To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.
