使用Player FM应用程序离线!
A recipe for frontier model post-training
Fetch error
Hmmm there seems to be a problem fetching this series right now. Last successful fetch was on October 02, 2024 13:35 ()
What now? This series will be checked again in the next hour. If you believe it should be working, please verify the publisher's feed link below is valid and includes actual episode links. You can contact support to request the feed be immediately fetched.
Manage episode 432955640 series 3590272
Apple, Meta, and Nvidia all agree -- synthetic data, iterative training, human preference labels, and lots of filtering.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/frontier-model-post-training
00:00 Llama 3.1 post-training and the new normal for RLHF
01:18 A new standard pipeline
01:45 Human preference data
02:59 Scaling RLHF
05:03 Synthetic data
06:10 The new normal
06:51 Data quality is king
07:18 Apple confirms the new normal
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_018.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_020.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_031.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_033.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_035.png
58集单集
Fetch error
Hmmm there seems to be a problem fetching this series right now. Last successful fetch was on October 02, 2024 13:35 ()
What now? This series will be checked again in the next hour. If you believe it should be working, please verify the publisher's feed link below is valid and includes actual episode links. You can contact support to request the feed be immediately fetched.
Manage episode 432955640 series 3590272
Apple, Meta, and Nvidia all agree -- synthetic data, iterative training, human preference labels, and lots of filtering.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/frontier-model-post-training
00:00 Llama 3.1 post-training and the new normal for RLHF
01:18 A new standard pipeline
01:45 Human preference data
02:59 Scaling RLHF
05:03 Synthetic data
06:10 The new normal
06:51 Data quality is king
07:18 Apple confirms the new normal
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_018.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_020.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_031.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_033.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_035.png
58集单集
Tutti gli episodi
×欢迎使用Player FM
Player FM正在网上搜索高质量的播客,以便您现在享受。它是最好的播客应用程序,适用于安卓、iPhone和网络。注册以跨设备同步订阅。