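Content provided by Yannic Kilcher. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Yannic Kilcher or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal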

Author Interview - ACCEL: Evolving Curricula with Regret-Based Environment Design

57:45
 

#ai #accel #evolution

This is an interview with the authors Jack Parker-Holder and Minqi Jiang.

Original Paper Review Video: https://www.youtube.com/watch?v=povBD...

Automatic curriculum generation is one of the most promising avenues for Reinforcement Learning today. Multiple approaches have been proposed, each with its own set of advantages and drawbacks. This paper presents ACCEL, which takes the next step in the direction of constructing curricula for multi-capable agents. ACCEL combines the adversarial adaptiveness of regret-based sampling methods with the capabilities of level-editing, usually found in evolutionary methods.

OUTLINE:

0:00 - Intro

1:00 - Start of interview

4:45 - How did you get into this field?

8:10 - What is minimax regret?

11:45 - What levels does the regret objective select?

14:20 - Positive value loss (correcting my mistakes)

21:05 - Why is the teacher not learned?

24:45 - How much domain-specific knowledge is needed?

29:30 - What problems is this applicable to?

33:15 - Single agent vs population of agents

37:25 - Measuring and balancing level difficulty

40:35 - How does generalization emerge?

42:50 - Diving deeper into the experimental results

47:00 - What are the unsolved challenges in the field?

50:00 - Where do we go from here?

Website: https://accelagent.github.io

Paper: https://arxiv.org/abs/2203.01302

ICLR Workshop: https://sites.google.com/view/aloe2022

Book on topic: https://www.oreilly.com/radar/open-en...

Abstract:

It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from their generality, with theoretical guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces. By contrast, evolutionary approaches seek to incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. In this paper we propose to harness the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of the paper is available at https://accelagent.github.io.
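
To make the idea in the abstract concrete, here is a minimal toy sketch of a regret-scored, edit-based curriculum loop. It is not the authors' implementation. Everything in it is an assumption made for illustration: a level is a single integer difficulty, the student is a single `skill` number, and all function names are hypothetical. The score `p * (1 - p)` is a stand-in for a regret estimate: for a binary return R and a value estimate V equal to the success probability p, it equals E[max(R - V, 0)], so it is highest for levels the student solves about half the time.

```python
import math
import random


def success_prob(skill: float, difficulty: int) -> float:
    """Toy student model (assumption): chance of solving a level."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))


def regret_score(skill: float, difficulty: int) -> float:
    """Toy regret proxy (assumption): E[max(R - V, 0)] with R in {0, 1}, V = p."""
    p = success_prob(skill, difficulty)
    return p * (1.0 - p)


def edit_level(difficulty: int, rng: random.Random) -> int:
    """Small random edit: make the level slightly easier or harder."""
    return max(0, difficulty + rng.choice([-1, 1]))


def run_curriculum(steps: int = 1000, buffer_size: int = 8, seed: int = 0):
    rng = random.Random(seed)
    skill = 0.0
    buffer = [0]   # start from the simplest level
    history = []   # difficulty of the level replayed at each step
    for _ in range(steps):
        # Replay the buffered level with the highest current score.
        level = max(buffer, key=lambda d: regret_score(skill, d))
        history.append(level)
        # Toy training update: progress is proportional to the score.
        skill += 0.05 * regret_score(skill, level)
        # Edit the replayed level and add the child as a new candidate.
        child = edit_level(level, rng)
        if child not in buffer:
            buffer.append(child)
        # Keep only the highest-scoring levels under the updated student.
        buffer.sort(key=lambda d: regret_score(skill, d), reverse=True)
        del buffer[buffer_size:]
    return skill, history


if __name__ == "__main__":
    final_skill, replayed = run_curriculum()
    print(f"final skill: {final_skill:.2f}")
    print("first replayed levels:", replayed[:5])
    print("last replayed levels:", replayed[-5:])
```

In this toy setting the replayed difficulty starts at 0 and drifts upward as the student improves, which mirrors the "start simple but become increasingly complex" behaviour described above. A real system would replace the scalar student with an RL agent and the score with an estimate computed from rollouts.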

Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yann...

LinkedIn: https://www.linkedin.com/in/ykilcher

BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)


