DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression

Zongwen Bai, Ying Li, Marcin Woźniak, Meili Zhou, Di Li

Research output: Contribution to journal › Article › peer-review

71 Scopus citations

Abstract

We propose a novel, comprehensive solution for compressing and accelerating Visual Question Answering (VQA) systems. In our algorithm, the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network are compressed simultaneously to accelerate processing. We apply different decomposition methods and regression strategies to different layers: Canonical Polyadic (CP), Tucker, and Tensor Train (TT) decompositions are used to decompose the Fully Connected (FC) layers in the CNN and the LSTM. The flattening layer and the FC layer at the end of the model are replaced with Tensor Regression Layers. To reduce the number of parameters further, the feature flow between layers is compressed with Tensor Contraction Layers. The proposed tensor decomposition model was evaluated on the VQA 2.0 dataset with Pythia as the baseline model. It achieved compression ratios of 77% to 91% with an accuracy drop of only 1% to 5%.
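
The abstract lists Tensor Train among the decompositions applied to fully connected layers. As a rough illustration only, and not the authors' implementation, the NumPy sketch below applies the standard TT-SVD algorithm to an FC weight matrix reshaped into a 4-way tensor. The 256 x 256 weight, its reshaping to 16 x 16 x 16 x 16, the rank cap, and the helper names `tt_svd` and `tt_reconstruct` are all hypothetical. The weight is synthesised with exact TT rank 4 so that truncation is lossless, and "compression ratio" is assumed here to mean the fraction of parameters removed.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """TT-SVD (sketch): split `tensor` into Tensor Train cores of shape
    (r_{k-1}, n_k, r_k), truncating every TT rank to at most `max_rank`."""
    shape = tensor.shape
    cores = []
    rank_prev = 1
    remainder = tensor
    for n_k in shape[:-1]:
        # Unfold so rows index (previous rank, current mode), then truncate the SVD.
        matrix = remainder.reshape(rank_prev * n_k, -1)
        u, s, vt = np.linalg.svd(matrix, full_matrices=False)
        rank = min(max_rank, s.size)
        cores.append(u[:, :rank].reshape(rank_prev, n_k, rank))
        # Carry the truncated S @ V^T on to the remaining modes.
        remainder = s[:rank, None] * vt[:rank]
        rank_prev = rank
    cores.append(remainder.reshape(rank_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    full = cores[0]
    for core in cores[1:]:
        # Join the trailing rank index of `full` with the leading rank index of `core`.
        full = np.tensordot(full, core, axes=([full.ndim - 1], [0]))
    return full.reshape(full.shape[1:-1])

# Hypothetical example: a 256 x 256 FC weight synthesised with exact TT rank 4,
# so truncating to rank 4 loses nothing. Trained weights are only approximately low-rank.
rng = np.random.default_rng(0)
true_cores = [rng.standard_normal(s) for s in [(1, 16, 4), (4, 16, 4), (4, 16, 4), (4, 16, 1)]]
weight = tt_reconstruct(true_cores).reshape(256, 256)

cores = tt_svd(weight.reshape(16, 16, 16, 16), max_rank=4)
approx = tt_reconstruct(cores).reshape(256, 256)
n_params = sum(c.size for c in cores)
rel_error = np.linalg.norm(weight - approx) / np.linalg.norm(weight)
print("relative error:", rel_error)
print("parameters:", weight.size, "->", n_params)  # 65536 -> 640
# Assumption: compression ratio = fraction of parameters removed.
print("compression ratio:", 1 - n_params / weight.size)
```

With a real trained weight, lowering `max_rank` trades reconstruction error for a higher compression ratio; this toy example does not reproduce the paper's 77% to 91% figures, which depend on its own layer and rank choices.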

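The Tensor Contraction and Tensor Regression Layers mentioned in the abstract can also be sketched. The NumPy code below is an assumed, minimal formulation, not the authors' code. A Tensor Contraction Layer projects each non-batch mode of an activation tensor with its own factor matrix. A Tensor Regression Layer computes y = <x, W> + b with W kept in Tucker form, standing in for flatten + FC. All shapes, ranks, and function names are hypothetical. The script checks that the factored regression matches a dense flatten + FC layer built from the same reconstructed weight, and it compares parameter counts.

```python
import numpy as np

def tensor_contraction(x, v_c, v_h, v_w):
    """Tensor Contraction Layer (sketch): multiply each non-batch mode of
    x, shaped (batch, C, H, W), by its factor matrix -> (batch, C', H', W')."""
    return np.einsum("bchw,ic,jh,kw->bijk", x, v_c, v_h, v_w, optimize=True)

def tensor_regression(x, core, u_c, u_h, u_w, u_out, bias):
    """Tensor Regression Layer (sketch): y = <x, W> + b, where the weight
    W = core x1 u_c x2 u_h x3 u_w x4 u_out is kept in Tucker form."""
    # Project the input onto the factor subspaces first, so W is never built.
    z = np.einsum("bchw,cp,hq,wr->bpqr", x, u_c, u_h, u_w, optimize=True)
    return np.einsum("bpqr,pqrs,os->bo", z, core, u_out, optimize=True) + bias

# Hypothetical sizes: feature map, TCL output, TRL Tucker ranks, outputs.
batch, C, H, W = 2, 32, 7, 7
C2, H2, W2 = 16, 5, 5
ranks, n_out = (8, 3, 3, 8), 10

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, C, H, W))
v_c, v_h, v_w = (rng.standard_normal(s) for s in [(C2, C), (H2, H), (W2, W)])
core = rng.standard_normal(ranks)
u_c, u_h, u_w, u_out = (
    rng.standard_normal(s)
    for s in [(C2, ranks[0]), (H2, ranks[1]), (W2, ranks[2]), (n_out, ranks[3])]
)
bias = np.zeros(n_out)

h = tensor_contraction(x, v_c, v_h, v_w)                    # shape (2, 16, 5, 5)
y = tensor_regression(h, core, u_c, u_h, u_w, u_out, bias)  # shape (2, 10)

# Reference: flatten + fully connected layer with the materialised Tucker weight.
w_full = np.einsum("pqrs,cp,hq,wr,os->chwo", core, u_c, u_h, u_w, u_out, optimize=True)
y_dense = h.reshape(batch, -1) @ w_full.reshape(-1, n_out) + bias
print("TRL matches flatten + FC:", np.allclose(y, y_dense))

# Parameter counts (biases omitted): flatten + FC on x versus TCL + TRL.
dense_params = C * H * W * n_out
factored_params = sum(a.size for a in (v_c, v_h, v_w, core, u_c, u_h, u_w, u_out))
print("parameters:", dense_params, "->", factored_params)
```

The contraction order in `tensor_regression` is a design choice: projecting `x` onto the small factor subspaces before touching the core avoids materialising the full C' x H' x W' x n_out weight, so storage and computation scale with the factors rather than with the dense weight.
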
Original language: English
Article number: 107538
Journal: Pattern Recognition
Volume: 110
State: Published - Feb 2021

Keywords

  • Tensor contraction layer
  • Tensor decomposition
  • Tensor regression layer
  • Visual question answering
