Abstract
Modeling long-sequence contextual relationships for sparse feature matching faces a critical trade-off between efficiency and accuracy, as Transformer-based methods inherently suffer from quadratic complexity. To address this challenge, we propose FFT-in-Mamba (FiM), the first approach to integrate the Fast Fourier Transform (FFT) with Mamba within a dual-branch architecture. This integration synergizes the local spectral precision of the FFT with Mamba's capability to capture global context, enabling cross-domain feature learning with linear complexity O(L log L). Furthermore, we introduce the Dynamic Learnable Fourier Rotation (DLFR) encoding to enhance geometric awareness in sparse sequences. Building on these components, the FiMLink framework interleaves FiM with shallow Transformer layers to facilitate joint cross-image modeling. Evaluations on the MegaDepth, HPatches, and Aachen datasets demonstrate that FiMLink achieves SOTA-level accuracy (MegaDepth AUC@20° = 80.3%) at an inference speed of 22.4 pairs/sec, while using 42% fewer parameters than MambaGlue.
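The abstract's dual-branch idea can be illustrated with a minimal toy sketch: one branch filters features in the frequency domain via the FFT (O(L log L)), while a second Mamba-like branch accumulates global context through a simple linear recurrence, and the two outputs are fused. All function names (`fim_block`, `spectral_branch`, `ssm_branch`) and the sum-fusion choice are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def spectral_branch(x, freq_gain):
    # FFT branch: per-frequency gating in the spectral domain, O(L log L)
    X = np.fft.rfft(x, axis=0)
    return np.fft.irfft(X * freq_gain[:, None], n=x.shape[0], axis=0)

def ssm_branch(x, decay):
    # Mamba-like branch: linear recurrence h_t = decay * h_{t-1} + x_t
    # (a stand-in for a selective state-space scan; O(L) in sequence length)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out

def fim_block(x, freq_gain, decay=0.9):
    # Dual-branch fusion: local spectral precision + global recurrent context
    return spectral_branch(x, freq_gain) + ssm_branch(x, decay)

L, D = 64, 8                          # sequence length, feature dim
rng = np.random.default_rng(0)
x = rng.standard_normal((L, D))
gain = np.ones(L // 2 + 1)            # identity spectral filter for the demo
y = fim_block(x, gain)
print(y.shape)                        # (64, 8): same shape as the input
```

In the real model the spectral gain and the recurrence parameters would be learned (and input-dependent, in Mamba's case); here they are fixed constants purely to show the data flow.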
| Original language | English |
|---|---|
| Article number | 115459 |
| Journal | Knowledge-Based Systems |
| Volume | 337 |
| DOIs | |
| State | Published - 25 Mar 2026 |
Keywords
- Fast Fourier transform
- Image matching
- Mamba
- Vision transformer
Title: FiMLink: Enhancing sparse feature matching via FFT-in-Mamba and dynamic learnable Fourier encoding