A Novel Multi-Feature Fusion Model Based on Pre-Trained Wav2vec 2.0 for Underwater Acoustic Target Recognition

Zijun Pu, Qunfei Zhang, Yangtao Xue, Peican Zhu, Xiaodong Cui

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Although recent data-driven Underwater Acoustic Target Recognition (UATR) methods have come to dominate marine acoustics, they are hampered by complex ocean environments and rather small datasets. To tackle these challenges, researchers have turned to transfer learning to fulfill UATR tasks. However, existing pre-trained models are trained on speech audio and are not directly suitable for underwater acoustic data; further optimization of these models is therefore necessary to adapt them to the UATR task. Here, we propose a novel UATR framework called Attention Layer Supplement Integration (ALSI), which integrates large pre-trained neural networks with attention modules customized for underwater acoustics. Specifically, the ALSI model consists of two key modules, Scale ResNet and Residual Hybrid Attention Fusion (RHAF). First, the Scale ResNet module takes the Constant-Q transform feature as input to extract the relatively important frequency information. Next, RHAF takes the temporal feature extracted by wav2vec 2.0 and the frequency feature extracted by Scale ResNet as input, and uses an attention mechanism to better integrate the time–frequency features with the temporal feature. The RHAF module helps wav2vec 2.0, which is trained on speech data, adapt to underwater acoustic data. Finally, experiments on the ShipsEar dataset demonstrate that our model achieves a recognition accuracy of (Formula presented.). In conclusion, extensive experiments confirm the effectiveness of our model on the UATR task.
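The residual attention fusion described in the abstract can be sketched as a single-head cross-attention step in which the wav2vec 2.0 temporal features query the Scale ResNet frequency features, followed by a residual connection. This is a minimal NumPy toy under assumed shapes and a single-head form, not the authors' RHAF implementation; the function name and dimensions are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention_fusion(temporal, frequency):
    """Toy sketch of residual attention fusion (assumed form, not RHAF itself).

    temporal:  (T_t, d) features, e.g. from wav2vec 2.0
    frequency: (T_f, d) features, e.g. from a CQT-based CNN branch
    Returns:   (T_t, d) fused features with a residual back to the temporal stream.
    """
    d = temporal.shape[-1]
    scores = temporal @ frequency.T / np.sqrt(d)  # (T_t, T_f) attention scores
    attn = softmax(scores, axis=-1)               # each temporal frame attends over frequency frames
    fused = attn @ frequency                      # (T_t, d) attended frequency context
    return temporal + fused                       # residual connection

# toy example: 8 temporal frames, 6 frequency frames, 16-dim embeddings
rng = np.random.default_rng(0)
t = rng.standard_normal((8, 16))
f = rng.standard_normal((6, 16))
out = residual_attention_fusion(t, f)
```

The residual term keeps the pre-trained temporal representation intact while the attended frequency context supplements it, which is one common way to adapt a speech-trained backbone to a new acoustic domain without overwriting its features.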

Original language: English
Article number: 2442
Journal: Remote Sensing
Volume: 16
Issue: 13
DOI
Publication status: Published - Jul 2024
