FiMLink: Enhancing sparse feature matching via FFT-in-Mamba and dynamic learnable Fourier encoding

  • Guancheng Jia
  • Yao Chen
  • Ding Ma
  • Boxiong Sun
  • Yufei Zha
  • Peng Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

Modeling long-sequence contextual relationships for sparse feature matching faces a critical trade-off between efficiency and accuracy, as Transformer-based methods inherently suffer from quadratic complexity. To address this challenge, we propose FFT-in-Mamba (FiM), the first approach to integrate the Fast Fourier Transform (FFT) with Mamba within a dual-branch architecture. This integration synergizes the local spectral precision of the FFT with Mamba's capability to capture global context, enabling cross-domain feature learning with linear complexity O(L log L). Furthermore, we introduce the Dynamic Learnable Fourier Rotation (DLFR) encoding to enhance geometric awareness in sparse sequences. Building on these components, the FiMLink framework innovatively interleaves FiM with shallow Transformer layers to facilitate joint cross-image modeling. Evaluations on the MegaDepth, HPatches, and Aachen datasets demonstrate that FiMLink achieves SOTA-level accuracy (MegaDepth AUC@20° = 80.3%) with an inference speed of 22.4 pairs/sec, while using 42% fewer parameters than MambaGlue.
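The dual-branch idea described above can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the paper's implementation: the spectral branch uses NumPy's real FFT to gate token features in the frequency domain at O(L log L) cost, while `sequence_branch` is only a linear-time causal placeholder standing in for a real Mamba (selective state-space) layer; the weights `w` and the fusion coefficient `alpha` are hypothetical.

```python
import numpy as np

def spectral_branch(x, w):
    """FFT branch: filter token features in the frequency domain.

    x: (L, D) real token features; w: (L//2+1, D) complex frequency-domain
    weights (random here, learnable in a real model). Cost per feature
    dimension is O(L log L) via the FFT.
    """
    X = np.fft.rfft(x, axis=0)                        # spectrum over tokens
    return np.fft.irfft(X * w, n=x.shape[0], axis=0)  # spectral gating + inverse

def sequence_branch(x):
    """Placeholder for the Mamba branch (illustration only).

    A real Mamba layer is a learned linear-time selective recurrence; a
    causal cumulative average is used here just to demonstrate the
    linear-complexity dual-branch wiring end to end.
    """
    counts = np.arange(1, x.shape[0] + 1)[:, None]
    return np.cumsum(x, axis=0) / counts

def fim_block(x, w, alpha=0.5):
    """Hypothetical fusion of spectral and sequential context."""
    return alpha * spectral_branch(x, w) + (1 - alpha) * sequence_branch(x)

L, D = 512, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((L, D))
w = rng.standard_normal((L // 2 + 1, D)) + 1j * rng.standard_normal((L // 2 + 1, D))
y = fim_block(x, w)
print(y.shape)  # (512, 64)
```

Both branches run in (quasi-)linear time in the sequence length L, which is the key contrast with the quadratic attention the abstract criticizes.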

Original language: English
Article number: 115459
Journal: Knowledge-Based Systems
Volume: 337
DOIs
State: Published - 25 Mar 2026

Keywords

  • Fast Fourier transform
  • Image matching
  • Mamba
  • Vision transformer
