TY - JOUR
T1 - Gestalt Principles Emerge When Learning Universal Sound Source Separation
AU - Li, Han
AU - Chen, Kean
AU - Seeber, Bernhard U.
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Sound source separation is an essential aspect of auditory scene analysis and remains a pressing challenge for machine hearing. In this paper, a fully convolutional time-domain audio separation network (ConvTasNet) is trained for universal two-source separation of speech, environmental sounds, and music. Beyond the separation performance of the network, our main concern is the underlying separation mechanisms. Through a series of classic auditory segregation experiments, we systematically explore the principles the network learns for simultaneous and sequential organization. The results show that, without any prior knowledge of auditory scene analysis imparted to the network, it spontaneously learns separation mechanisms from raw waveforms that resemble those developed over many years in humans. The Gestalt principles of separation in the human auditory system prove effective in our network: harmonicity, onset synchrony and common fate (coherent amplitude and frequency modulation), proximity, continuity, and similarity. A universal sound source separation network following Gestalt principles is not limited to specific sources and can be applied to varied acoustic situations, much like human hearing, offering new directions for solving the problem of auditory scene analysis.
AB - Sound source separation is an essential aspect of auditory scene analysis and remains a pressing challenge for machine hearing. In this paper, a fully convolutional time-domain audio separation network (ConvTasNet) is trained for universal two-source separation of speech, environmental sounds, and music. Beyond the separation performance of the network, our main concern is the underlying separation mechanisms. Through a series of classic auditory segregation experiments, we systematically explore the principles the network learns for simultaneous and sequential organization. The results show that, without any prior knowledge of auditory scene analysis imparted to the network, it spontaneously learns separation mechanisms from raw waveforms that resemble those developed over many years in humans. The Gestalt principles of separation in the human auditory system prove effective in our network: harmonicity, onset synchrony and common fate (coherent amplitude and frequency modulation), proximity, continuity, and similarity. A universal sound source separation network following Gestalt principles is not limited to specific sources and can be applied to varied acoustic situations, much like human hearing, offering new directions for solving the problem of auditory scene analysis.
KW - Gestalt principles
KW - separation mechanisms
KW - universal source separation
UR - http://www.scopus.com/inward/record.url?scp=85132074400&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2022.3178233
DO - 10.1109/TASLP.2022.3178233
M3 - Article
AN - SCOPUS:85132074400
SN - 2329-9290
VL - 30
SP - 1877
EP - 1891
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -