
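Content provided by LessWrong. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by LessWrong or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal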

“0. CAST: Corrigibility as Singular Target” by Max Harms

19:40

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
What the heck is up with “corrigibility”? For most of my career, I had a sense that it was a grab-bag of properties that seemed nice in theory but hard to get in practice, perhaps due to being incompatible with agency.
Then, last year, I spent some time revisiting my perspective, and I concluded that I had been deeply confused by what corrigibility even was. I now think that corrigibility is a single, intuitive property, which people can learn to emulate without too much work and which is deeply compatible with agency. Furthermore, I expect that even with prosaic training methods, there's some chance of winding up with an AI agent that's inclined to become more corrigible over time, rather than less (as long as the people who built it understand corrigibility and want that agent [...]
---
Outline:
(07:30) Overview
(07:33) 1. The CAST Strategy
(08:15) 2. Corrigibility Intuition (Coming Saturday)
(08:49) 3a. Towards Formal Corrigibility (Coming Sunday)
(09:27) 3. Formal (Faux) Corrigibility ← the mathy one (Also Sunday)
(10:12) 4. Existing Writing on Corrigibility (Coming Monday)
(10:33) 5. Open Corrigibility Questions (Also Monday)
(10:58) Bibliography and Miscellany
---
First published:
June 7th, 2024
Source:
https://www.lesswrong.com/posts/NQK8KHSrZRF5erTba/0-cast-corrigibility-as-singular-target-1
---
Narrated by TYPE III AUDIO.