Listen to the image

Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Citations (Scopus)

Abstract

Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating visual information into a sound pattern. To improve translation quality, the task performance of blind users is usually employed to evaluate different encoding schemes. In contrast to this toilsome human-based assessment, we argue that a machine model can also be developed for evaluation, and more efficiently. To this end, we first propose two distinct cross-modal perception models for the late-blind and congenitally blind cases, which aim to generate concrete visual contents based on the translated sound. To validate the functionality of the proposed models, two novel optimization strategies w.r.t. the primary encoding scheme are presented. Further, we conduct sets of human-based experiments to evaluate the encoding schemes and compare the results with the machine-based assessments on the cross-modal generation task. Their highly consistent results across different encoding schemes indicate that using a machine model to accelerate optimization evaluation and reduce experimental cost is feasible to some extent, which could dramatically accelerate the upgrading of encoding schemes and thus help the blind improve their visual perception ability.

Original language: English
Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Publisher: IEEE Computer Society
Pages: 7964-7973
Number of pages: 10
ISBN (electronic): 9781728132938
DOI
Publication status: Published - Jun 2019
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States
Duration: 16 Jun 2019 → 20 Jun 2019

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2019-June
ISSN (print): 1063-6919

Conference

Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Country/Territory: United States
City: Long Beach
Period: 16/06/19 → 20/06/19
