
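Content provided by Arize AI. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Arize AI or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://zh.player.fm/legal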

Accurate KV Cache Quantization with Outlier Tokens Tracing

25:11
 

We discuss Accurate KV Cache Quantization with Outlier Tokens Tracing, a deep dive into improving the efficiency of LLM inference. The authors enhance KV Cache quantization, a technique for reducing memory and compute costs during inference, by introducing a method to identify and exclude outlier tokens that hurt quantization accuracy, striking a better balance between efficiency and performance.
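The idea in the description can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 2-bit setting, the per-channel grouping, and especially the outlier criterion (distance of a token's L2 norm from the median norm) are assumptions chosen for illustration, and the paper's actual tracing rule may differ. The sketch only shows why keeping a few outlier tokens in full precision can tighten the quantization range for the remaining tokens.

```python
import math

def quantize_channel(values, bits=2):
    """Uniform asymmetric quantization of one channel; returns dequantized values."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]

def quantize_per_channel(tokens, bits=2):
    """Quantize token vectors channel by channel, sharing one range across tokens."""
    if not tokens:
        return []
    channels = zip(*tokens)
    dequantized = [quantize_channel(list(c), bits) for c in channels]
    return [list(row) for row in zip(*dequantized)]

def find_outlier_tokens(tokens, num_outliers=1):
    """Hypothetical criterion: tokens whose L2 norm is farthest from the median norm."""
    norms = [math.sqrt(sum(x * x for x in t)) for t in tokens]
    median = sorted(norms)[len(norms) // 2]
    ranked = sorted(range(len(tokens)),
                    key=lambda i: abs(norms[i] - median), reverse=True)
    return set(ranked[:num_outliers])

def quantize_with_outliers(tokens, bits=2, num_outliers=1):
    """Keep outlier tokens in full precision and quantize only the rest."""
    outliers = find_outlier_tokens(tokens, num_outliers)
    inliers = [t for i, t in enumerate(tokens) if i not in outliers]
    deq_inliers = iter(quantize_per_channel(inliers, bits))
    return [list(t) if i in outliers else next(deq_inliers)
            for i, t in enumerate(tokens)]

def mean_squared_error(a, b):
    """Mean squared error over all elements of two equally shaped token lists."""
    count = sum(len(row) for row in a)
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count

# Toy key cache: four similar tokens and one token with much larger values.
keys = [[1.0, -1.0], [1.2, -0.8], [0.9, -1.1], [1.1, -0.9], [8.0, 6.0]]
plain = quantize_per_channel(keys, bits=2)
traced = quantize_with_outliers(keys, bits=2, num_outliers=1)
print("error without outlier handling:", mean_squared_error(keys, plain))
print("error with outlier handling:   ", mean_squared_error(keys, traced))
```

In this toy example the single large token stretches the shared quantization range, so the four ordinary tokens collapse onto one level. Excluding it lets the same 2 bits cover a much narrower range, at the cost of storing one token unquantized.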

Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

59 episodes
