
Content provided by Jeremie Harris. All podcast content (including episodes, graphics, and podcast descriptions) is uploaded and provided directly by Jeremie Harris or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal

#2: Large Language Models Can Self-Improve

33:38
 

Google recently announced a significant breakthrough: a new Language Model Self-Improvement (LMSI) system that makes it possible for large language models to improve their own performance on many tasks without using any additional labeled data. In this post, and its accompanying podcast, we’ll take a look at LMSI to understand why it’s such a big deal.

When applying LMSI to a 540B parameter PaLM model, the Google researchers achieved state-of-the-art results across a variety of arithmetic reasoning, commonsense reasoning, and natural language inference tasks.

The LMSI system allows a language model to self-improve in three steps:

  1. First, you give the system some questions like “Stefan goes to a restaurant with his family. They order an appetizer that costs $10 and 4 entrees that are $20 each. If they tip 20% of the total, what is the total amount of money that they spend?”
  2. Then, you ask the language model to explain the answer to the question in 32 different ways. For example, one explanation could be “The appetizer costs $10. The entrees cost 4 * $20 = $80. The tip is 20% of the total, so it is 20% of the $90 they have spent. The tip is 0.2 * 90 = $18. The total they spent is $90 + $18 = $108. The answer is 108.”
  3. Finally, the system selects the explanations whose final answer is the most common one and fine-tunes the language model on them. For example, if 16 of the 32 explanations give $108 as the answer and the rest give a mix of other answers, the system keeps those 16 explanations as training data.
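The filtering in step 3 can be sketched in a few lines of Python. This is a minimal illustration of the majority-vote idea, not Google's implementation: the function names and the answer-extraction logic are assumptions made for the example, and in practice the explanations would be sampled from the model itself at nonzero temperature.

```python
from collections import Counter

def select_self_consistent_explanations(explanations, extract_answer):
    """Keep only the sampled explanations whose final answer matches
    the majority answer; these become self-generated fine-tuning data."""
    answers = [extract_answer(e) for e in explanations]
    majority_answer, _ = Counter(answers).most_common(1)[0]
    keep = [e for e, a in zip(explanations, answers) if a == majority_answer]
    return majority_answer, keep

# Hypothetical sampled explanations (32 in the paper, 3 here for brevity),
# each ending in the "The answer is X." format used in the prompt above.
samples = [
    "The entrees cost 4 * $20 = $80 ... The answer is 108.",
    "The total before tip is $90 ... The answer is 108.",
    "They spend $10 + $80 = $90 ... The answer is 90.",
]
answer, training_set = select_self_consistent_explanations(
    samples, lambda e: e.rsplit("The answer is ", 1)[-1].rstrip(".")
)
# Two of the three samples agree on 108, so those two are kept.
```

The key design point is that no ground-truth label is ever consulted: agreement among the model's own samples stands in for correctness.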

This approach lets an LMSI-augmented language model significantly improve its own performance and achieve state-of-the-art results on reasoning problems.

The authors found that the LMSI system makes language models substantially more capable. When they fine-tuned a small language model with LMSI, it answered questions better than language models nine times its size that didn’t use LMSI.

Industry Context

With only unlabeled text-based questions, large language models like PaLM fine-tuned with the LMSI system were able to outperform existing state-of-the-art methods that rely on more complex reasoning strategies and/or ground-truth labels. Small language models fine-tuned using LMSI were also able to outperform models nine times their size that did not use LMSI.

This result shows that we are still discovering ways to improve large language models without increasing model or dataset size, and that language models can be improved without any labeled data. Because LMSI lets small models outperform larger models that don’t use it, it also lowers the cost barrier for malicious uses of these capabilities.
