Abstract
Forward-looking sonar (FLS) image segmentation is essential for underwater exploration, but low contrast, ambient noise, and complex backgrounds remain challenges that neither traditional nor deep learning-based methods address effectively. This paper presents TriEncoderNet, a novel model that simultaneously extracts local, global, and edge-related features through three parallel encoders. Specifically, the model integrates a convolutional neural network (CNN) for local feature extraction, a transformer for global context modeling, and a histogram of oriented gradients (HOG) encoder for edge and shape detection. The key innovations of TriEncoderNet include the CrossFusionTransformer (CFT) module, which integrates local and global features to capture both fine details and comprehensive context, and the HOG attention gate (HAG) module, which enhances edge detection and preserves semantic consistency across diverse feature types. Additionally, TriEncoderNet introduces the hierarchical efficient transformer (HETransformer), whose lightweight multi-head self-attention mechanism reduces computational overhead while maintaining global context modeling capability. Experimental results on the marine debris dataset and the UATD dataset demonstrate the superior performance of TriEncoderNet. Specifically, it achieves an mIoU of 0.793 and an mAP of 0.916 on the marine debris dataset, and an mIoU of 0.582 and an mAP of 0.687 on the UATD dataset, outperforming state-of-the-art methods in both segmentation accuracy and robustness in challenging underwater environments.
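For readers unfamiliar with the mIoU figures quoted in the abstract, the metric can be sketched as follows. This is a minimal illustration of the common definition (per-class intersection over union, averaged over classes), not the authors' evaluation code. The function name `mean_iou`, the flattened-mask input format, and the choice to skip classes absent from both masks are assumptions made for this example; the paper's exact averaging protocol is not stated in the abstract.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes (illustrative sketch).

    `pred` and `target` are flattened label masks of equal length.
    Classes that appear in neither mask are skipped (an assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    # Count per-class pixels and per-class agreements in one pass.
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both masks
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 masks flattened row by row (0 = background, 1 = object).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case each class has two correctly labeled pixels out of a union of four, so both per-class IoU values and their mean are 0.5.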
| Original language | English |
|---|---|
| Article number | 2295 |
| Journal | Journal of Marine Science and Engineering |
| Volume | 13 |
| Issue number | 12 |
| DOIs | |
| State | Published - Dec 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 14 Life Below Water
Keywords
- feature fusion
- forward-looking sonar (FLS)
- sonar image segmentation
- transformer
Title
TriEncoderNet: Multi-Stage Fusion of CNN, Transformer, and HOG Features for Forward-Looking Sonar Image Segmentation