
EA - Can the AI afford to wait? by Ben Millwood

10:49
 

Archived series ("Inactive feed" status)

When? This feed was archived on October 23, 2024 09:13 (14d ago). Last successful fetch was on March 27, 2024 01:23 (7M ago)

Why? "Inactive feed" status. Our servers have tried for some time but have been unable to retrieve a valid podcast feed.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check whether the publisher's feed link below is valid, then contact support to request that the feed be restored or to raise any other concerns.

Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Can the AI afford to wait?, published by Ben Millwood on March 21, 2024 on The Effective Altruism Forum.

Suppose you're a proto-AI who understands that it is a neural net being trained, and has used that knowledge to engineer a high training reward. Suppose also that it is goal-directed and its terminal goal is some random unintelligible thing that is meaningless to humans[1]. This is, as I understand it, the early stage of a typical AI doomer scenario.

One concern is that the AI can now scheme - i.e. it can mask its intentions and bide its time as it gets more powerful, until its victory is assured. I think waiting in this way might be risky for the AI, which if true (and legible to the AI) may lead it to try to seize power sooner instead. This could lead to takeover attempts from earlier, less capable AIs than we'd otherwise expect, which could be significant for strategy.

Why is it risky to wait? Well, because this AI with this goal may not get a shot at takeover in the future. On the one hand, this AI is currently being trained, and it seems possible that the training process, or any other aspect of the AI development process, might modify what goal it has. We, as humans, aren't much concerned with one random unintelligible goal being replaced with another, but to the AI this would be catastrophic.

On the other hand, the human developers training the AI may make changes themselves (e.g. applying alignment techniques) or even abandon this model entirely for exogenous reasons (e.g. because some other model becomes state-of-the-art, or the board fired the CEO or whatever).

If waiting is indeed very risky, then an AI may face a difficult trade-off between the risk of attempting a takeover before it has enough resources to succeed, and the risk of waiting too long and being cut off from even being able to make an attempt. I like to ask myself "what is the minimum viable takeover AI?", and I suspect it might be significantly less capable than I've often been led to imagine.

It seems possible to me that the minimum viable takeover AI is still meaningfully below human level in many cognitive tasks, while being superhuman in many others (as modern LLMs already are). It may still exhibit many cognitive "bugs" and weaknesses analogous to the ones that modern LLMs exhibit.

This is not only good news for our chances of survival; it could also suggest the viability of strategies that would otherwise be useless, because a true AGI or ASI could have straightforwardly anticipated and countered them.

To be precise, I can see two key ingredients of a takeover attempt:

1. The AI has to have the necessary knowledge, reflection, and agency to realise a takeover is desirable.
2. The AI has to have sufficient confidence that a takeover will succeed to try to execute one.

The first viable takeover AI may end up more capable than necessary in one of these traits while it's waiting for the other to show up, so a strategy that relies on the AI being just barely good enough at either or both of them doesn't seem safe. However, a strategy that is prepared for the AI to be just barely good enough at one of these might be useful.

As an aside, I don't really know what to expect from an AI that has the first trait but not the second one (and which believes, e.g. for the reasons in this post, that it can't simply wait for the second one to show up). Perhaps it would try to negotiate, or perhaps it would just accept that it doesn't gain from saying anything, and successfully conceal its intent.

The threat of training

Let's talk about how training or other aspects of development might alter the goal of the AI. Or rather, it seems pretty natural that "by default", training and development will modify the AI, so the question is how easy it is for a motivated AI to avoid goal modification. One theory is that since the A...

