Bilinear Semi-Tensor Product Attention (BSTPA) model for visual question answering

Zongwen Bai, Ying Li, Meili Zhou, Di Li, Dong Wang, Dawid Połap, Marcin Woźniak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

We propose a semi-tensor product attention network model as a visual question answering tool for modeling complex interactions over image features. The proposed model multiplies matrices of two arbitrary dimensions, which overcomes possible dimensional limitations and improves recognition flexibility. Its block-wise operation preserves spatial and temporal information while reducing the number of parameters through a low-rank pooling scheme. A pre-trained BERT model is fine-tuned to recognize question features. The proposed model is evaluated on the VQA 2.0 dataset. The results show that our model achieves good accuracy and is easy to reconfigure for future research.
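As a rough illustration of the core operation named in the abstract, the sketch below implements the left semi-tensor product, which generalizes matrix multiplication to factors whose inner dimensions need not match. The function name, toy shapes, and NumPy-based implementation are illustrative assumptions and not the authors' code.

```python
import numpy as np
from math import lcm

def semi_tensor_product(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Left semi-tensor product of two matrices of arbitrary sizes.

    For A of shape (m, n) and B of shape (p, q), let t = lcm(n, p); then
    the product is (A ⊗ I_{t/n}) @ (B ⊗ I_{t/p}), which reduces to
    ordinary matrix multiplication when n == p.
    """
    _, n = A.shape
    p, _ = B.shape
    t = lcm(n, p)
    A_big = np.kron(A, np.eye(t // n))  # shape (m * t/n, t)
    B_big = np.kron(B, np.eye(t // p))  # shape (t, q * t/p)
    return A_big @ B_big                # shape (m * t/n, q * t/p)

# Hypothetical example: inner dimensions 4 and 6 do not match,
# yet the semi-tensor product is still defined.
A = np.random.randn(2, 4)  # e.g. a projected image-feature block
B = np.random.randn(6, 3)  # e.g. a projected question-feature block
print(semi_tensor_product(A, B).shape)  # (6, 6)
```

In the paper's setting, this dimension-agnostic product is presumably what lets image and question features of different widths be fused without forcing both projections to the same size.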

Original language: English
Title of host publication: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169262
DOIs
State: Published - Jul 2020
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: 19 Jul 2020 → 24 Jul 2020

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country/Territory: United Kingdom
City: Virtual, Glasgow
Period: 19/07/20 → 24/07/20

Keywords

  • bidirectional encoder representation from transformers
  • multimodal feature fusion
  • semi-tensor product attention
  • visual question answering
