DFF-Mono: A lightweight self-supervised monocular depth estimation method based on dual-branch feature fusion

  • Han Zhang
  • Xiaojun Yu
  • Hengrong Guo
  • Liang Shen
  • Zeming Fan

Research output: Contribution to journal › Article › peer-review

Abstract

Monocular depth estimation is a fundamental challenge in 3D scene understanding, particularly under unsupervised learning paradigms. Although existing self-supervised methods avoid dependence on annotated depth labels, their high computational complexity hinders deployment on resource-constrained mobile platforms. To address this issue, we propose DFF-Mono, a parameter-efficient framework that jointly optimizes depth estimation accuracy and computational efficiency. DFF-Mono has three main components. First, a lightweight encoder integrates Dual-Kernel Dilated Convolution (DKDC) modules with a Dual-branch Feature Fusion (DFF) architecture for multi-scale feature encoding. Second, a novel Attention-guided Large Kernel Inception (ALKI) module with multi-branch large-kernel convolution uses local–global attention guidance for efficient local feature extraction. Third, a frequency-domain optimization strategy, implemented via adaptive Gaussian low-pass filtering, improves training efficiency without introducing any additional network parameters. Extensive experiments show that DFF-Mono outperforms existing approaches on standard benchmarks. Notably, it reduces model parameters by 23% compared with current state-of-the-art solutions while consistently achieving superior depth accuracy.
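
As a rough illustration of the frequency-domain step mentioned in the abstract, the sketch below applies a plain Gaussian low-pass filter to a tiny image. It is not the authors' implementation: the abstract does not say how the filter is made adaptive or where in training it is applied, so the cutoff `sigma` is simply a fixed argument here, and the function names (`dft2`, `gaussian_lowpass_mask`, `lowpass`) are hypothetical. A naive DFT is used only to keep the example dependency-free; a real pipeline would presumably use an FFT library.

```python
import cmath
import math


def dft2(img, inverse=False):
    """Naive 2D DFT of a list-of-lists; O(N^4), so only for tiny inputs."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = sign * 2 * math.pi * (u * y / h + v * x / w)
                    total += img[y][x] * cmath.exp(1j * angle)
            # The inverse transform carries the 1/(h*w) normalisation.
            out[u][v] = total / (h * w) if inverse else total
    return out


def gaussian_lowpass_mask(h, w, sigma):
    """H(u, v) = exp(-D^2 / (2 sigma^2)), with D in cycles per sample.

    sigma is a fixed cutoff here (an assumption); the adaptive rule used
    by DFF-Mono is not described in the abstract.
    """
    mask = []
    for u in range(h):
        fu = min(u, h - u) / h  # wrap index to a signed-frequency magnitude
        row = []
        for v in range(w):
            fv = min(v, w - v) / w
            row.append(math.exp(-(fu * fu + fv * fv) / (2 * sigma * sigma)))
        mask.append(row)
    return mask


def lowpass(img, sigma):
    """Filter img by multiplying its spectrum with the Gaussian mask."""
    h, w = len(img), len(img[0])
    spectrum = dft2(img)
    mask = gaussian_lowpass_mask(h, w, sigma)
    filtered = [[spectrum[u][v] * mask[u][v] for v in range(w)] for u in range(h)]
    # The mask is symmetric, so a real input yields a (numerically) real output.
    return [[value.real for value in row] for row in dft2(filtered, inverse=True)]


# Usage: a checkerboard is the highest-frequency pattern on this grid,
# so a narrow Gaussian should flatten it towards its mean value.
checker = [[1.0 + 0.5 * (-1) ** (x + y) for x in range(8)] for y in range(8)]
smooth = lowpass(checker, sigma=0.1)
```

The filter is a fixed elementwise multiplication in the frequency domain, so it adds no learnable weights, which is consistent with the abstract's claim that the strategy introduces no additional network parameters.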

Original language: English
Article number: 103167
Journal: Displays
Volume: 90
State: Published - Dec 2025

Keywords

  • 3D scene understanding
  • Convolution
  • Monocular depth estimation
  • Transformer
