TY - GEN
T1 - Listen to the image
AU - Hu, Di
AU - Wang, Dong
AU - Li, Xuelong
AU - Nie, Feiping
AU - Wang, Qi
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
AB - Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating visual information into a sound pattern. To improve the translation quality, the task performance of the blind is usually employed to evaluate different encoding schemes. In contrast to such toilsome human-based assessment, we argue that a machine model can also be developed for evaluation, and can be more efficient. To this end, we first propose two distinct cross-modal perception models for the late-blind and congenitally-blind cases, which aim to generate concrete visual contents from the translated sound. To validate the functionality of the proposed models, two novel optimization strategies for the primary encoding scheme are presented. Further, we conduct sets of human-based experiments and compare their results with the machine-based assessments on the cross-modal generation task. The highly consistent results across different encoding schemes indicate that using a machine model to accelerate the evaluation of encoding schemes and reduce experimental cost is feasible to some extent, which could dramatically speed up the upgrading of encoding schemes and thereby help the blind improve their visual perception ability.
KW - Biological a
KW - Computational Photography
KW - Datasets and Evaluation
KW - Image and Video Synthesis
KW - Medical
UR - http://www.scopus.com/inward/record.url?scp=85078750085&partnerID=8YFLogxK
DO - 10.1109/CVPR.2019.00816
M3 - Conference contribution
AN - SCOPUS:85078750085
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 7964
EP - 7973
BT - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
PB - IEEE Computer Society
T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Y2 - 16 June 2019 through 20 June 2019
ER -