TY - GEN
T1 - Listen to the image
AU - Hu, Di
AU - Wang, Dong
AU - Li, Xuelong
AU - Nie, Feiping
AU - Wang, Qi
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
AB - Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating visual information into a sound pattern. To improve the translation quality, the task performance of the blind is usually employed to evaluate different encoding schemes. In contrast to such toilsome human-based assessment, we argue that a machine model can also be developed for evaluation, and can be more efficient. To this end, we first propose two distinct cross-modal perception models for the late-blind and congenitally-blind cases, which aim to generate concrete visual contents from the translated sound. To validate the functionality of the proposed models, two novel optimization strategies for the primary encoding scheme are presented. Further, we conduct sets of human-based experiments and compare their results with the machine-based assessments on the cross-modal generation task. The highly consistent results across different encoding schemes indicate that using a machine model to accelerate the evaluation of encoding schemes and reduce experimental cost is feasible to some extent, which could dramatically speed up the upgrading of encoding schemes and thereby help the blind improve their visual perception ability.
KW - Biological a
KW - Computational Photography
KW - Datasets and Evaluation
KW - Image and Video Synthesis
KW - Medical
UR - http://www.scopus.com/inward/record.url?scp=85078750085&partnerID=8YFLogxK
DO - 10.1109/CVPR.2019.00816
M3 - Conference contribution
AN - SCOPUS:85078750085
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 7964
EP - 7973
BT - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
PB - IEEE Computer Society
T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Y2 - 16 June 2019 through 20 June 2019
ER -