Bilinear Semi-Tensor Product Attention (BSTPA) model for visual question answering

Zongwen Bai, Ying Li, Meili Zhou, Di Li, Dong Wang, Dawid Połap, Marcin Woźniak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

We propose a semi-tensor product attention network model as a visual question answering tool for modeling complex interactions over image features. The proposed model multiplies matrices of two arbitrary dimensions, which overcomes possible dimensional limitations and improves recognition flexibility. Its block-wise operation preserves spatial and temporal information while reducing the number of parameters through a low-rank pooling scheme. A pre-trained BERT model is fine-tuned to recognize question features. The proposed model is evaluated on the VQA 2.0 dataset. The results show that our model achieves good accuracy and is easy to reconfigure for future research.
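As a rough illustration of the core operation named in the abstract, the sketch below implements the left semi-tensor product, which generalizes matrix multiplication to factors whose inner dimensions need not match. The function name, toy shapes, and NumPy-based implementation are illustrative assumptions and not the authors' code.

```python
import numpy as np
from math import lcm

def semi_tensor_product(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Left semi-tensor product of two matrices of arbitrary sizes.

    For A of shape (m, n) and B of shape (p, q), let t = lcm(n, p); then
    the product is (A ⊗ I_{t/n}) @ (B ⊗ I_{t/p}), which reduces to
    ordinary matrix multiplication when n == p.
    """
    _, n = A.shape
    p, _ = B.shape
    t = lcm(n, p)
    A_big = np.kron(A, np.eye(t // n))  # shape (m * t/n, t)
    B_big = np.kron(B, np.eye(t // p))  # shape (t, q * t/p)
    return A_big @ B_big                # shape (m * t/n, q * t/p)

# Hypothetical example: inner dimensions 4 and 6 do not match,
# yet the semi-tensor product is still defined.
A = np.random.randn(2, 4)  # e.g. a projected image-feature block
B = np.random.randn(6, 3)  # e.g. a projected question-feature block
print(semi_tensor_product(A, B).shape)  # (6, 6)
```

In the paper's setting, this dimension-agnostic product is presumably what lets image and question features of different widths be fused without forcing both projections to the same size.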

Original language: English
Title of host publication: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169262
DOIs
State: Published - Jul 2020
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: 19 Jul 2020 → 24 Jul 2020

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country/Territory: United Kingdom
City: Virtual, Glasgow
Period: 19/07/20 → 24/07/20

Keywords

  • bidirectional encoder representation from transformers
  • multimodal feature fusion
  • semi-tensor product attention
  • visual question answering
