Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration

  • Yuhang Han
  • Xuyang Liu
  • Zihan Zhang
  • Pengxiang Ding
  • Junjie Chen
  • Honggang Chen
  • Donglin Wang
  • Qingsen Yan
  • Siteng Huang
  • Westlake University
  • Sichuan University
  • Johns Hopkins University
  • Zhejiang University

Research output: Contribution to journal, Conference article, Peer-reviewed

Abstract

The quadratic complexity of Multimodal Large Language Models (MLLMs) with respect to context length poses significant computational and memory challenges, hindering their real-world deployment. In this paper, we devise a “filter-correlate-compress” framework to accelerate MLLMs by systematically optimizing multimodal context length during prefilling. The framework first implements FiCoCo-V, a training-free method operating within the vision encoder. It employs a redundancy-based token discard mechanism that uses a novel integrated metric to accurately filter out redundant visual tokens. To mitigate information loss, the framework introduces a correlation-based information recycling mechanism that allows preserved tokens to selectively recycle information from correlated discarded tokens through self-preserving compression, thereby preventing the dilution of their own core content. The framework’s FiCoCo-L variant further leverages task-aware textual priors to perform token reduction directly within the LLM decoder. Extensive experiments demonstrate that the FiCoCo series effectively accelerates a range of MLLMs, achieving up to 14.7× FLOPs reduction with 93.6% performance retention. Our methods consistently outperform state-of-the-art training-free approaches, showcasing effectiveness and generalizability across model architectures, sizes, and tasks without requiring retraining.
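
The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not define the integrated redundancy metric or the compression weights, so mean cosine similarity stands in for the metric, and the function name `filter_correlate_compress` and the parameter `self_weight` are hypothetical. The task-aware textual priors used by FiCoCo-L are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_correlate_compress(tokens, num_discard, self_weight=0.5):
    """Toy filter-correlate-compress pass over a list of token vectors.

    Returns (compressed_tokens, kept_indices). All scoring choices here are
    assumptions made for illustration, not the paper's definitions.
    """
    n = len(tokens)
    if not 0 <= num_discard < n:
        raise ValueError("num_discard must satisfy 0 <= num_discard < len(tokens)")

    sim = [[cosine(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]

    # Filter: score redundancy as mean similarity to the other tokens
    # (assumed stand-in for the paper's integrated metric).
    redundancy = [(sum(sim[i]) - sim[i][i]) / max(n - 1, 1) for i in range(n)]
    order = sorted(range(n), key=lambda i: redundancy[i], reverse=True)
    discarded = sorted(order[:num_discard])
    kept = sorted(order[num_discard:])

    # Correlate: route each discarded token to its most similar kept token.
    received = {k: [] for k in kept}
    for d in discarded:
        target = max(kept, key=lambda k: sim[d][k])
        received[target].append(d)

    # Compress: blend recycled information into the kept token while
    # reserving a fixed share (self_weight) for its own content.
    compressed = []
    for k in kept:
        sources = received[k]
        if not sources:
            compressed.append(list(tokens[k]))
            continue
        dim = len(tokens[k])
        recycled = [sum(tokens[d][c] for d in sources) / len(sources) for c in range(dim)]
        compressed.append(
            [self_weight * tokens[k][c] + (1 - self_weight) * recycled[c] for c in range(dim)]
        )
    return compressed, kept
```

Calling `filter_correlate_compress(tokens, num_discard=1)` on a list of three 2-D vectors returns two vectors plus the indices of the tokens that were preserved. Reserving a fixed `self_weight` for the preserved token is one simple way to keep recycled information from diluting its own content, which is the concern the abstract raises.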

Original language: English
Pages (from-to): 4601
Number of pages: 1
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 40
Issue number: 6
DOI
Publication status: Published - 2026
Event: 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore
Duration: 20 Jan 2026 to 27 Jan 2026
