
Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration

  • Yuhang Han
  • Xuyang Liu
  • Zihan Zhang
  • Pengxiang Ding
  • Junjie Chen
  • Honggang Chen
  • Donglin Wang
  • Qingsen Yan
  • Siteng Huang

  • Westlake University
  • Sichuan University
  • Johns Hopkins University
  • Zhejiang University

Research output: Contribution to journal › Conference article › peer-review

Abstract

The quadratic complexity of Multimodal Large Language Models (MLLMs) with respect to context length poses significant computational and memory challenges, hindering their real-world deployment. In this paper, we devise a “filter-correlate-compress” framework to accelerate MLLMs by systematically optimizing multimodal context length during prefilling. The framework first implements FiCoCo-V, a training-free method operating within the vision encoder. It employs a redundancy-based token discard mechanism that uses a novel integrated metric to accurately filter out redundant visual tokens. To mitigate information loss, the framework introduces a correlation-based information recycling mechanism that allows preserved tokens to selectively recycle information from correlated discarded tokens with a self-preserving compression, thereby preventing the dilution of their own core content. The framework’s FiCoCo-L variant further leverages task-aware textual priors to perform token reduction directly within the LLM decoder. Extensive experiments demonstrate that the FiCoCo series effectively accelerates a range of MLLMs, achieving up to 14.7× FLOPs reduction with 93.6% performance retention. Our methods consistently outperform state-of-the-art training-free approaches, showcasing effectiveness and generalizability across model architectures, sizes, and tasks without requiring retraining.
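
The three stages named in the abstract can be illustrated with a toy sketch. This is not the FiCoCo implementation: the abstract does not define the integrated redundancy metric or the compression rule, so everything below is an assumption made for illustration. Redundancy is taken to be a token's mean cosine similarity to the other tokens, each discarded token is correlated with its single most similar kept token, and compression is a weighted average in which the hypothetical parameter `self_weight` keeps a fixed share of the kept token's own content. The function name `filter_correlate_compress` is likewise hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def filter_correlate_compress(tokens, num_discard, self_weight=0.5):
    """Toy token reduction; returns (kept_indices, merged_tokens).

    Assumed stand-ins, not the paper's definitions:
      filter    -> discard the tokens with the highest mean similarity to others
      correlate -> route each discarded token to its most similar kept token
      compress  -> kept = self_weight * kept + (1 - self_weight) * mean(routed)
    """
    n = len(tokens)
    if not 0 <= num_discard < n:
        raise ValueError("num_discard must satisfy 0 <= num_discard < len(tokens)")

    sim = [[cosine(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]

    # Filter: score each token by its average similarity to all other tokens.
    redundancy = [(sum(sim[i]) - sim[i][i]) / max(n - 1, 1) for i in range(n)]
    order = sorted(range(n), key=lambda i: redundancy[i], reverse=True)
    discarded = set(order[:num_discard])
    kept = [i for i in range(n) if i not in discarded]

    # Correlate: assign every discarded token to its most similar kept token.
    routed = {k: [] for k in kept}
    for d in sorted(discarded):
        target = max(kept, key=lambda k: sim[d][k])
        routed[target].append(d)

    # Compress: blend routed information in while preserving the token's own share.
    merged = []
    for k in kept:
        if not routed[k]:
            merged.append(list(tokens[k]))
            continue
        dim = len(tokens[k])
        mean = [sum(tokens[d][c] for d in routed[k]) / len(routed[k]) for c in range(dim)]
        merged.append(
            [self_weight * tokens[k][c] + (1 - self_weight) * mean[c] for c in range(dim)]
        )
    return kept, merged


# Three 2-D "visual tokens"; tokens 0 and 1 point in nearly the same direction.
example_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
kept, merged = filter_correlate_compress(example_tokens, num_discard=1)
# kept == [0, 2]: token 1 scores as most redundant and is folded into token 0.
```

Fixing `self_weight` is one simple way to mimic the "self-preserving" idea, since a kept token never contributes less than that share to its merged representation regardless of how many discarded tokens are routed to it.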

Original language: English
Pages (from-to): 4601
Number of pages: 1
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 40
Issue number: 6
State: Published - 2026
Event: 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore
Duration: 20 Jan 2026 → 27 Jan 2026
