Content provided by Zeta Alpha. All podcast content (including episodes, graphics, and podcast descriptions) is uploaded and provided directly by Zeta Alpha or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal
Generating Training Data with Large Language Models w/ Special Guest Marzieh Fadaee

1:16:14
Manage episode 355037183 series 3446693

Marzieh Fadaee — NLP Research Lead at Zeta Alpha — joins Andrew Yates and Sergi Castella to chat about her work on using large language models like GPT-3 to generate domain-specific training data for retrieval models with little to no human input. The two papers discussed are "InPars: Data Augmentation for Information Retrieval using Large Language Models" and "Promptagator: Few-shot Dense Retrieval From 8 Examples".

InPars: https://arxiv.org/abs/2202.05144

Promptagator: https://arxiv.org/abs/2209.11755
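At a high level, both papers follow the same recipe: show an LLM a handful of (document, query) examples, ask it to write a plausible query for a new document, and then filter the synthetic pairs before training a retriever. A minimal sketch of that recipe is below; `generate_query` is a hypothetical placeholder for an actual LLM call, and the score-based filter only approximates the papers' filtering steps (InPars keeps the pairs with the highest generation log-probability):

```python
# Sketch of an InPars/Promptagator-style synthetic-data pipeline.
# The few-shot examples and scores here are illustrative, not from the papers.

FEW_SHOT_EXAMPLES = [
    ("The Manhattan Project was a WWII research effort that produced "
     "the first nuclear weapons.",
     "what was the manhattan project"),
]

def build_prompt(document: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Format few-shot (document, query) pairs, then the new document,
    leaving the final query for the LLM to complete."""
    parts = []
    for doc, query in examples:
        parts.append(f"Document: {doc}\nRelevant query: {query}\n")
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n".join(parts)

def filter_top_k(pairs: list[dict], k: int) -> list[dict]:
    """Keep the k synthetic pairs with the highest generation score,
    loosely mimicking InPars' log-probability filtering step."""
    return sorted(pairs, key=lambda p: p["score"], reverse=True)[:k]

# Usage sketch: build a prompt, send it to your LLM of choice (stubbed
# out here), then filter the resulting (query, document, score) pairs.
prompt = build_prompt("Dense retrieval encodes queries and documents "
                      "with neural networks.")
# synthetic_query = generate_query(prompt)  # hypothetical LLM call
```

Promptagator differs mainly in using per-dataset prompts with as few as 8 examples and a round-trip "consistency" filter (keep a query only if a retriever trained on the synthetic data retrieves its source document), rather than raw generation scores.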

Timestamps:

00:00 Introduction

02:00 Background and journey of Marzieh Fadaee

03:10 Challenges of leveraging Large LMs in Information Retrieval

05:20 InPars, motivation and method

14:30 Vanilla vs GBQ prompting

24:40 Evaluation and Benchmark

26:30 Baselines

27:40 Main results and takeaways (Table 1, InPars)

35:40 Ablations: prompting, in-domain vs. MS MARCO input documents

40:40 Promptagator overview and main differences from InPars

48:40 Retriever training and filtering in Promptagator

54:37 Main Results (Table 2, Promptagator)

1:02:30 Ablations on consistency filtering (Figure 2, Promptagator)

1:07:39 Is this the magic black-box pipeline for neural retrieval on any documents?

1:11:14 Limitations of using LMs for synthetic data

1:13:00 Future directions for this line of research

