Bilinear Semi-Tensor Product Attention (BSTPA) model for visual question answering

Zongwen Bai, Ying Li, Meili Zhou, Di Li, Dong Wang, Dawid Polap, Marcin Wozniak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

10 Citations (Scopus)

Abstract

We propose a semi-tensor product attention network model for visual question answering that models complex interactions over image features. The proposed model performs matrix multiplication between matrices of arbitrary dimensions, which overcomes the dimension-matching constraint of conventional matrix multiplication and improves recognition flexibility. The block-wise operation preserves spatial and temporal information, while a low-rank pooling scheme reduces the number of parameters. A pre-trained BERT model is fine-tuned to extract question features. The proposed model is evaluated on the VQA 2.0 dataset. The results show that the model achieves good accuracy and can be easily reconfigured for future research.
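The two operations named in the abstract can be illustrated with a minimal NumPy sketch, offered under stated assumptions rather than as the authors' implementation. `stp` follows the standard definition of the left semi-tensor product, A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}) with t = lcm(n, p) for an m×n matrix A and a p×q matrix B, which reduces to ordinary matrix multiplication when n = p. `low_rank_bilinear_pool` is a generic Hadamard-product factorization of bilinear pooling (in the spirit of MLB-style pooling, with nonlinearities omitted); whether BSTPA uses exactly this form is an assumption. The function names and the parameter names `u`, `v`, `p` are hypothetical.

```python
import math

import numpy as np


def stp(a, b) -> np.ndarray:
    """Left semi-tensor product A ⋉ B of an m x n matrix and a p x q matrix.

    Standard definition: A ⋉ B = (A ⊗ I_{t/n}) (B ⊗ I_{t/p}), t = lcm(n, p).
    The result has shape (m * t / n, q * t / p); for n == p it equals A @ B.
    """
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    n, p = a.shape[1], b.shape[0]
    t = math.lcm(n, p)  # math.lcm requires Python 3.9+
    left = np.kron(a, np.eye(t // n))   # shape (m*t/n, t)
    right = np.kron(b, np.eye(t // p))  # shape (t, q*t/p)
    return left @ right


def low_rank_bilinear_pool(x, y, u, v, p):
    """Hypothetical low-rank bilinear pooling: z = P^T ((U^T x) * (V^T y)).

    x: (d_x,), y: (d_y,), u: (d_x, r), v: (d_y, r), p: (r, o) -> z: (o,).
    Equivalent to a full bilinear map z_k = x^T W_k y whose weight
    W_k = sum_j p[j, k] * outer(u[:, j], v[:, j]) has rank at most r.
    """
    return p.T @ ((u.T @ x) * (v.T @ y))


if __name__ == "__main__":
    # Row vector of length 4 times a 2 x 1 column: the classic STP example.
    print(stp([[1, 2, 3, 4]], [[1], [2]]).tolist())  # [[7.0, 10.0]]
```

In the example, the semi-tensor product splits the row vector into two blocks and returns 1·[1, 2] + 2·[3, 4] = [7, 10]. The low-rank factorization replaces the d_x·d_y·o weights of a full bilinear map with (d_x + d_y)·r + r·o, which is where the parameter saving comes from.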

Original language: English
Host publication title: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169262
DOI
Publication status: Published - Jul 2020
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: 19 Jul 2020 → 24 Jul 2020

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country/Territory: United Kingdom
Virtual, Glasgow
Period: 19/07/20 → 24/07/20
