TY - JOUR
T1 - Gestalt Principles Emerge When Learning Universal Sound Source Separation
AU - Li, Han
AU - Chen, Kean
AU - Seeber, Bernhard U.
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Sound source separation is an essential aspect of auditory scene analysis and remains a pressing challenge for machine hearing. In this paper, a fully convolutional time-domain audio separation network (ConvTasNet) is trained for universal two-source separation of speech, environmental sounds, and music. Beyond the separation performance of the network, our main concern is the underlying separation mechanisms. Through a series of classic auditory segregation experiments, we systematically explore the principles the network learns for simultaneous and sequential organization. The results show that, without any prior knowledge of auditory scene analysis imparted to the network, it spontaneously learns separation mechanisms from raw waveforms that resemble those developed over many years in humans. The Gestalt principles of separation in the human auditory system prove effective in our network: harmonicity, onset synchrony and common fate (coherent amplitude and frequency modulation), proximity, continuity, and similarity. A universal sound source separation network following Gestalt principles is not limited to specific sources and can be applied to varied acoustic situations, much like human hearing, offering new directions for solving the problem of auditory scene analysis.
AB - Sound source separation is an essential aspect of auditory scene analysis and remains a pressing challenge for machine hearing. In this paper, a fully convolutional time-domain audio separation network (ConvTasNet) is trained for universal two-source separation of speech, environmental sounds, and music. Beyond the separation performance of the network, our main concern is the underlying separation mechanisms. Through a series of classic auditory segregation experiments, we systematically explore the principles the network learns for simultaneous and sequential organization. The results show that, without any prior knowledge of auditory scene analysis imparted to the network, it spontaneously learns separation mechanisms from raw waveforms that resemble those developed over many years in humans. The Gestalt principles of separation in the human auditory system prove effective in our network: harmonicity, onset synchrony and common fate (coherent amplitude and frequency modulation), proximity, continuity, and similarity. A universal sound source separation network following Gestalt principles is not limited to specific sources and can be applied to varied acoustic situations, much like human hearing, offering new directions for solving the problem of auditory scene analysis.
KW - Gestalt principles
KW - separation mechanisms
KW - universal source separation
UR - http://www.scopus.com/inward/record.url?scp=85132074400&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2022.3178233
DO - 10.1109/TASLP.2022.3178233
M3 - Article
AN - SCOPUS:85132074400
SN - 2329-9290
VL - 30
SP - 1877
EP - 1891
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -