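Content provided by Kabir. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Kabir or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal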
🤖 DeepSeek-R1: Reasoning via Reinforcement Learning

13:22
 

This episode details the development of DeepSeek-R1, a large language model enhanced for reasoning capabilities through reinforcement learning (RL). Two versions are described: DeepSeek-R1-Zero, trained solely with RL, and DeepSeek-R1, which incorporates a multi-stage training process including cold-start data and supervised fine-tuning to improve readability and performance. DeepSeek-R1 achieves results comparable to OpenAI's o1-1217 model on various reasoning benchmarks. Furthermore, the research explores distilling DeepSeek-R1's reasoning abilities into smaller, more efficient models, achieving strong performance despite the absence of RL in the smaller models. The authors open-source their models and findings to benefit the research community.

Podcast:
https://kabir.buzzsprout.com
YouTube:
https://www.youtube.com/@kabirtechdives
Please subscribe and share.

225 episodes
