TY - GEN
T1 - Fine-grained Artificial Neurons in Audio-transformers for Disentangling Neural Auditory Encoding
AU - Zhou, Mengyue
AU - Liu, Xu
AU - Liu, David
AU - Wu, Zihao
AU - Liu, Zhengliang
AU - Zhao, Lin
AU - Zhu, Dajiang
AU - Guo, Lei
AU - Han, Junwei
AU - Liu, Tianming
AU - Hu, Xintao
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Wav2Vec and its variants have achieved unprecedented success in computational auditory and speech processing. Meanwhile, neural encoding studies that link representations of Wav2Vec to brain activities have provided novel insights into how auditory and speech processing unfold in the human brain. Most existing neural encoding studies treat each transformer encoding layer in Wav2Vec as a single artificial neuron (AN); that is, the layer-level embeddings are used to predict neural responses. Because the layer-level embedding aggregates multiple types of contextual attention captured by multi-head self-attention (MSA), layer-level ANs lack the fine granularity needed for neural encoding. To address this limitation, we define the elementary units, i.e., each hidden dimension, as neuron-level ANs in Wav2Vec2.0, quantify their temporal responses, and couple those ANs with their biological-neuron (BN) counterparts in the human brain. Our experimental results demonstrate that: 1) the proposed neuron-level ANs carry meaningful neurolinguistic information; 2) those ANs anchor to their BN signatures; and 3) the AN-BN anchoring patterns are interpretable from a neurolinguistic perspective. More importantly, our results suggest an intermediate stage in both the computational representation in Wav2Vec2.0 and the cortical representation in the brain. Our study validates the fine-grained ANs in Wav2Vec2.0, which may serve as a novel and general strategy for linking transformer-based deep learning models to neural responses to probe sensory processing in the brain.
UR - http://www.scopus.com/inward/record.url?scp=85165953081&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.findings-acl.503
DO - 10.18653/v1/2023.findings-acl.503
M3 - Conference contribution
AN - SCOPUS:85165953081
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 7943
EP - 7956
BT - Findings of the Association for Computational Linguistics, ACL 2023
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -