Artwork

内容由LessWrong提供。所有播客内容(包括剧集、图形和播客描述)均由 LessWrong 或其播客平台合作伙伴直接上传和提供。如果您认为有人在未经您许可的情况下使用您的受版权保护的作品,您可以按照此处概述的流程进行操作https://zh.player.fm/legal
Player FM -播客应用
使用Player FM应用程序离线!

“METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman

11:09
 
分享
 

Manage episode 475697702 series 3364758
内容由LessWrong提供。所有播客内容(包括剧集、图形和播客描述)均由 LessWrong 或其播客平台合作伙伴直接上传和提供。如果您认为有人在未经您许可的情况下使用您的受版权保护的作品,您可以按照此处概述的流程进行操作https://zh.player.fm/legal
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under five years, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.
The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years. The shaded region represents 95% CI calculated by hierarchical bootstrap over task families, tasks, and task attempts.
Full paper | Github repo
We think that forecasting the capabilities of future AI systems is important for understanding and preparing for the impact of [...]
---
Outline:
(08:58) Conclusion
(09:59) Want to contribute?
---
First published:
March 19th, 2025
Source:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Graph showing AI task complexity doubling every 7 months through 2026.
Graph showing AI task completion lengths doubling every 7 months.
Graph showing AI model task lengths doubling every 7 months from 2020-2024.
Graph showing
  continue reading

708集单集

Artwork
icon分享
 
Manage episode 475697702 series 3364758
内容由LessWrong提供。所有播客内容(包括剧集、图形和播客描述)均由 LessWrong 或其播客平台合作伙伴直接上传和提供。如果您认为有人在未经您许可的情况下使用您的受版权保护的作品,您可以按照此处概述的流程进行操作https://zh.player.fm/legal
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under five years, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.
The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years. The shaded region represents 95% CI calculated by hierarchical bootstrap over task families, tasks, and task attempts.
Full paper | Github repo
We think that forecasting the capabilities of future AI systems is important for understanding and preparing for the impact of [...]
---
Outline:
(08:58) Conclusion
(09:59) Want to contribute?
---
First published:
March 19th, 2025
Source:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Graph showing AI task complexity doubling every 7 months through 2026.
Graph showing AI task completion lengths doubling every 7 months.
Graph showing AI model task lengths doubling every 7 months from 2020-2024.
Graph showing
  continue reading

708集单集

所有剧集

×
 
Loading …

欢迎使用Player FM

Player FM正在网上搜索高质量的播客,以便您现在享受。它是最好的播客应用程序,适用于安卓、iPhone和网络。注册以跨设备同步订阅。

 

快速参考指南

版权2025 | 隐私政策 | 服务条款 | | 版权
边探索边听这个节目
播放