TY - JOUR
T1 - Constructing a Multi-Modal Based Underwater Acoustic Target Recognition Method with a Pre-Trained Language-Audio Model
AU - Fu, Bowen
AU - Nie, Jiangtao
AU - Wei, Wei
AU - Zhang, Lei
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Underwater acoustic target recognition (UATR) aims to accurately identify radiated acoustic signals from ships in complex maritime environments. The challenge of this task lies in extracting discriminative representations from complex and limited acoustic samples. Recently, various deep learning-based UATR methods have been proposed. However, their performance on real sonar-collected signals remains restricted. On the one hand, most current methods adopt different representation extraction strategies to extract features from acoustic signals, such as time-frequency (T-F) representations, waveform representations, and joint representations. However, the limited representational capability of these features and the simple fusion strategies employed often restrict improvements in recognition performance. On the other hand, these methods often overlook the knowledge gains offered by pre-trained models and the extraction of semantic correlation knowledge across multiple features, which leads to unsatisfactory performance and even overfitting. To mitigate these issues, this article proposes a multifeature UATR (MF-UATR) method. It introduces a highly generalizable multi-modal pre-trained language-audio model and a contrastive learning-based feature-level fusion strategy to semantically guide and fuse multiple features. This strategy helps the model learn prior knowledge and the semantic correlations between features, thereby improving recognition performance. In addition, we also consider few-shot scenarios with extremely limited data, for which a multi-modal few-shot UATR (MMFS-UATR) scheme is proposed. It efficiently completes the few-shot UATR (FS-UATR) task by combining parameter-efficient fine-tuning (PEFT) techniques, a semantic supervision strategy, and the pre-trained MF-UATR. Extensive experiments on two public datasets, DeepShip and ShipsEar, demonstrate that the proposed frameworks achieve optimal target recognition performance under both regular and few-shot settings.
AB - Underwater acoustic target recognition (UATR) aims to accurately identify radiated acoustic signals from ships in complex maritime environments. The challenge of this task lies in extracting discriminative representations from complex and limited acoustic samples. Recently, various deep learning-based UATR methods have been proposed. However, their performance on real sonar-collected signals remains restricted. On the one hand, most current methods adopt different representation extraction strategies to extract features from acoustic signals, such as time-frequency (T-F) representations, waveform representations, and joint representations. However, the limited representational capability of these features and the simple fusion strategies employed often restrict improvements in recognition performance. On the other hand, these methods often overlook the knowledge gains offered by pre-trained models and the extraction of semantic correlation knowledge across multiple features, which leads to unsatisfactory performance and even overfitting. To mitigate these issues, this article proposes a multifeature UATR (MF-UATR) method. It introduces a highly generalizable multi-modal pre-trained language-audio model and a contrastive learning-based feature-level fusion strategy to semantically guide and fuse multiple features. This strategy helps the model learn prior knowledge and the semantic correlations between features, thereby improving recognition performance. In addition, we also consider few-shot scenarios with extremely limited data, for which a multi-modal few-shot UATR (MMFS-UATR) scheme is proposed. It efficiently completes the few-shot UATR (FS-UATR) task by combining parameter-efficient fine-tuning (PEFT) techniques, a semantic supervision strategy, and the pre-trained MF-UATR. Extensive experiments on two public datasets, DeepShip and ShipsEar, demonstrate that the proposed frameworks achieve optimal target recognition performance under both regular and few-shot settings.
KW - Few-shot learning
KW - language-audio models
KW - multifeature fusion
KW - underwater acoustic target recognition (UATR)
UR - http://www.scopus.com/inward/record.url?scp=86000385544&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3515171
DO - 10.1109/TGRS.2024.3515171
M3 - Article
AN - SCOPUS:86000385544
SN - 0196-2892
VL - 63
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 4200414
ER -