AF - The Checklist: What Succeeding at AI Safety Will Involve by Sam Bowman

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Checklist: What Succeeding at AI Safety Will Involve, published by Sam Bowman on September 3, 2024 on The AI Alignment Forum.
Crossposted by habryka with Sam's permission. Sam is less likely to respond to comments here than if he had posted it himself.
Preface
This piece reflects my current best guess at the major goals that Anthropic (or another similarly positioned AI developer) will need to accomplish to have things go well with the development of broadly superhuman AI. Given my role and background, it's disproportionately focused on technical research and on averting emerging catastrophic risks.
For context, I lead a technical AI safety research group at Anthropic, and that group has a pretty broad and long-term mandate, so I spend a lot of time thinking about what kind of safety work we'll need over the coming years.
This piece is my own opinionated take on that question, though it draws very heavily on discussions with colleagues across the organization: Medium- and long-term AI safety strategy is the subject of countless leadership discussions and Google docs and lunch-table discussions within the organization, and this piece is a snapshot (shared with permission) of where those conversations sometimes go.
To be abundantly clear: Nothing here is a firm commitment on behalf of Anthropic, and most people at Anthropic would disagree with at least a few major points here, but this can hopefully still shed some light on the kind of thinking that motivates our work.
Here are some of the assumptions that the piece relies on. I don't think any one of these is a certainty, but all of them are plausible enough to be worth taking seriously when making plans:
Broadly human-level AI is possible. I'll often refer to this as transformative AI (or TAI), roughly defined as AI that could serve as a drop-in replacement for humans in all remote-work-friendly jobs, including AI R&D.[1]
Broadly human-level AI (or TAI) isn't an upper bound on most AI capabilities that matter, and substantially superhuman systems could have an even greater impact on the world along many dimensions.
If TAI is possible, it will probably be developed this decade, in a business and policy and cultural context that's not wildly different from today.
If TAI is possible, it could be used to dramatically accelerate AI R&D, potentially leading to the development of substantially superhuman systems within just a few months or years after TAI.
Powerful AI systems could be extraordinarily destructive if deployed carelessly, both because of new emerging risks and because of existing issues that become much more acute. This could be through misuse of weapons-related capabilities, by disrupting important balances of power in domains like cybersecurity or surveillance, or by any of a number of other means.
Many systems at TAI and beyond, at least under the right circumstances, will be capable of operating more-or-less autonomously for long stretches in pursuit of big-picture, real-world goals. This magnifies these safety challenges.
Alignment - in the narrow sense of making sure AI developers can confidently steer the behavior of the AI systems they deploy - requires some non-trivial effort to get right, and it gets harder as systems get more powerful.
Most of the ideas here ultimately come from outside Anthropic, and while I cite a few sources below, I've been influenced by far more writings and people than I can credit here or even keep track of.
Introducing the Checklist
This lays out what I think we need to do, divided into three chapters, based on the capabilities of our strongest models:
Chapter 1: Preparation
You are here. In this period, our best models aren't yet TAI. In the language of Anthropic's RSP, they're at AI Safety Level 2 (ASL-2), ASL-3, or maybe the early stages of ASL-4. Most of the wor...