使用Player FM应用程序离线!
Sleep-time Compute: Beyond Inference Scaling at Test-time
Manage episode 480276618 series 3448051
What if your LLM could think ahead—preparing answers before questions are even asked?
In this week's paper read, we dive into a groundbreaking new paper from researchers at Letta, introducing sleep-time compute: a novel technique that lets models do their heavy lifting offline, well before the user query arrives. By predicting likely questions and precomputing key reasoning steps, sleep-time compute dramatically reduces test-time latency and cost—without sacrificing performance.
We explore new benchmarks—Stateful GSM-Symbolic, Stateful AIME, and the multi-query extension of GSM—that show up to 5x lower compute at inference, 2.5x lower cost per query, and up to 18% higher accuracy when scaled.
You’ll also see how this method applies to realistic agent use cases and what makes it most effective.If you care about LLM efficiency, scalability, or cutting-edge research.
Explore more AI research, or sign up to hear the next session live.
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.
59集单集
Manage episode 480276618 series 3448051
What if your LLM could think ahead—preparing answers before questions are even asked?
In this week's paper read, we dive into a groundbreaking new paper from researchers at Letta, introducing sleep-time compute: a novel technique that lets models do their heavy lifting offline, well before the user query arrives. By predicting likely questions and precomputing key reasoning steps, sleep-time compute dramatically reduces test-time latency and cost—without sacrificing performance.
We explore new benchmarks—Stateful GSM-Symbolic, Stateful AIME, and the multi-query extension of GSM—that show up to 5x lower compute at inference, 2.5x lower cost per query, and up to 18% higher accuracy when scaled.
You’ll also see how this method applies to realistic agent use cases and what makes it most effective.If you care about LLM efficiency, scalability, or cutting-edge research.
Explore more AI research, or sign up to hear the next session live.
Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.
59集单集
所有剧集
×欢迎使用Player FM
Player FM正在网上搜索高质量的播客,以便您现在享受。它是最好的播客应用程序,适用于安卓、iPhone和网络。注册以跨设备同步订阅。