Listen to the image

Di Hu; Dong Wang; Xuelong Li; Feiping Nie; Qi Wang

doi:10.1109/CVPR.2019.00816

Listen to the image

Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang

School of Artificial Intelligence, OPtics and Electronics

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating the visual information into a sound pattern. To improve the translation quality, the task performances of the blind are usually employed to evaluate different encoding schemes. In contrast to the toilsome human-based assessment, we argue that machine model can be also developed for evaluation, and more efficient. To this end, we firstly propose two distinct cross-modal perception model w.r.t. the late-blind and congenitally-blind cases, which aim to generate concrete visual contents based on the translated sound. To validate the functionality of proposed models, two novel optimization strategies w.r.t. the primary encoding scheme are presented. Further, we conduct sets of human-based experiments to evaluate and compare them with the conducted machine-based assessments in the cross-modal generation task. Their highly consistent results w.r.t. different encoding schemes indicate that using machine model to accelerate optimization evaluation and reduce experimental cost is feasible to some extent, which could dramatically promote the upgrading of encoding scheme then help the blind to improve their visual perception ability.

Original language	English
Title of host publication	Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Publisher	IEEE Computer Society
Pages	7964-7973
Number of pages	10
ISBN (Electronic)	9781728132938
DOIs	https://doi.org/10.1109/CVPR.2019.00816
State	Published - Jun 2019
Event	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States Duration: 16 Jun 2019 → 20 Jun 2019

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume	2019-June
ISSN (Print)	1063-6919

Conference

Conference	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Country/Territory	United States
City	Long Beach
Period	16/06/19 → 20/06/19

Keywords

Biological a
Computational Photography
Datasets and Evaluation
Image and Video Synthesis
Image and Video Synthesis
Medical

Access to Document

10.1109/CVPR.2019.00816

Cite this

Hu, D., Wang, D., Li, X., Nie, F., & Wang, Q. (2019). Listen to the image. In Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 (pp. 7964-7973). Article 8953471 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2019-June). IEEE Computer Society. https://doi.org/10.1109/CVPR.2019.00816

@inproceedings{43f9f9023aad45aab908b307b688e96b,

title = "Listen to the image",

abstract = "Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating the visual information into a sound pattern. To improve the translation quality, the task performances of the blind are usually employed to evaluate different encoding schemes. In contrast to the toilsome human-based assessment, we argue that machine model can be also developed for evaluation, and more efficient. To this end, we firstly propose two distinct cross-modal perception model w.r.t. the late-blind and congenitally-blind cases, which aim to generate concrete visual contents based on the translated sound. To validate the functionality of proposed models, two novel optimization strategies w.r.t. the primary encoding scheme are presented. Further, we conduct sets of human-based experiments to evaluate and compare them with the conducted machine-based assessments in the cross-modal generation task. Their highly consistent results w.r.t. different encoding schemes indicate that using machine model to accelerate optimization evaluation and reduce experimental cost is feasible to some extent, which could dramatically promote the upgrading of encoding scheme then help the blind to improve their visual perception ability.",

keywords = "Biological a, Computational Photography, Datasets and Evaluation, Image and Video Synthesis, Image and Video Synthesis, Medical",

author = "Di Hu and Dong Wang and Xuelong Li and Feiping Nie and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 ; Conference date: 16-06-2019 Through 20-06-2019",

year = "2019",

month = jun,

doi = "10.1109/CVPR.2019.00816",

language = "英语",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "7964--7973",

booktitle = "Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019",

}

Hu, D, Wang, D, Li, X, Nie, F & Wang, Q 2019, Listen to the image. in Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019., 8953471, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019-June, IEEE Computer Society, pp. 7964-7973, 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, United States, 16/06/19. https://doi.org/10.1109/CVPR.2019.00816

Listen to the image. / Hu, Di; Wang, Dong; Li, Xuelong et al.
Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019. IEEE Computer Society, 2019. p. 7964-7973 8953471 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2019-June).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - Listen to the image

AU - Hu, Di

AU - Wang, Dong

AU - Li, Xuelong

AU - Nie, Feiping

AU - Wang, Qi

PY - 2019/6

Y1 - 2019/6

N2 - Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating the visual information into a sound pattern. To improve the translation quality, the task performances of the blind are usually employed to evaluate different encoding schemes. In contrast to the toilsome human-based assessment, we argue that machine model can be also developed for evaluation, and more efficient. To this end, we firstly propose two distinct cross-modal perception model w.r.t. the late-blind and congenitally-blind cases, which aim to generate concrete visual contents based on the translated sound. To validate the functionality of proposed models, two novel optimization strategies w.r.t. the primary encoding scheme are presented. Further, we conduct sets of human-based experiments to evaluate and compare them with the conducted machine-based assessments in the cross-modal generation task. Their highly consistent results w.r.t. different encoding schemes indicate that using machine model to accelerate optimization evaluation and reduce experimental cost is feasible to some extent, which could dramatically promote the upgrading of encoding scheme then help the blind to improve their visual perception ability.

AB - Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating the visual information into a sound pattern. To improve the translation quality, the task performances of the blind are usually employed to evaluate different encoding schemes. In contrast to the toilsome human-based assessment, we argue that machine model can be also developed for evaluation, and more efficient. To this end, we firstly propose two distinct cross-modal perception model w.r.t. the late-blind and congenitally-blind cases, which aim to generate concrete visual contents based on the translated sound. To validate the functionality of proposed models, two novel optimization strategies w.r.t. the primary encoding scheme are presented. Further, we conduct sets of human-based experiments to evaluate and compare them with the conducted machine-based assessments in the cross-modal generation task. Their highly consistent results w.r.t. different encoding schemes indicate that using machine model to accelerate optimization evaluation and reduce experimental cost is feasible to some extent, which could dramatically promote the upgrading of encoding scheme then help the blind to improve their visual perception ability.

KW - Biological a

KW - Computational Photography

KW - Datasets and Evaluation

KW - Image and Video Synthesis

KW - Medical

UR - http://www.scopus.com/inward/record.url?scp=85078750085&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2019.00816

DO - 10.1109/CVPR.2019.00816

M3 - 会议稿件

AN - SCOPUS:85078750085

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 7964

EP - 7973

BT - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019

PB - IEEE Computer Society

T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019

Y2 - 16 June 2019 through 20 June 2019

ER -

Listen to the image

Abstract

Publication series

Conference

Keywords

Access to Document

Other files and links

Fingerprint

Cite this