Why focus on schemers in particular? (Sections 1.3-1.4 of "Scheming AIs")
This is sections 1.3-1.4 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. Why focus on schemers in particular? (Sections 1.3-1.4 of "Scheming AIs") (00:00:00)
2. 1.3 Why focus on schemers in particular? (00:00:36)
3. 1.3.1 The type of misalignment I’m most worried about (00:01:14)
4. 1.3.2 Contrast with reward-on-the-episode seekers (00:04:27)
5. 1.3.2.1 Responsiveness to honest tests (00:04:46)
6. 1.3.2.2 Temporal scope and general “ambition” (00:07:54)
7. 1.3.2.3 Sandbagging and “early undermining” (00:11:17)
8. 1.3.3 Contrast with models that aren’t playing the training game (00:17:13)
9. 1.3.4 Non-schemers with schemer-like traits (00:23:13)
10. 1.3.5 Mixed models (00:25:20)
11. 1.4 Are theoretical arguments about this topic even useful? (00:28:35)