DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression

Zongwen Bai; Ying Li; Marcin Woźniak; Meili Zhou; Di Li

doi:10.1016/j.patcog.2020.107538

School of Computer Science

Research output: Contribution to journal › Article › peer-review

71 Scopus citations

Abstract

We present a novel, comprehensive solution for compressing and accelerating Visual Question Answering systems. In our algorithm, the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network are compressed simultaneously to accelerate processing. We apply various decomposition methods and regression strategies to different layers, using Canonical Polyadic, Tucker, and Tensor Train decompositions on the Fully Connected layers in the CNN and LSTM. The Flattening layer and the Fully Connected layer at the end of the model are replaced with Tensor Regression layers. To reduce the number of parameters further, the feature flow between layers is compressed by a Tensor Contraction layer. The proposed tensor decomposition model was evaluated on the VQA 2.0 dataset with Pythia as the baseline model. It achieved a compression ratio of 77% to 91% with an accuracy drop of only 1% to 5%.

Original language	English
Article number	107538
Journal	Pattern Recognition
Volume	110
DOIs	https://doi.org/10.1016/j.patcog.2020.107538
State	Published - Feb 2021

Keywords

Tensor contraction layer
Tensor decomposition
Tensor regression layer
Visual question answering

Access to Document

10.1016/j.patcog.2020.107538

Cite this

@article{94143bd2efe14d9e8db88adc91e51724,

title = "DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression",

abstract = "The model we developed is a novel comprehensive solution to compress and accelerate the Visual Question Answering systems. In our algorithm Convolutional Neural Network is compressed with Long Short Term Memory to accelerate processing simultaneously. We propose to conduct various decomposition methods and regression strategies on different layers, including Canonical Polyadic, Tucker, and Tensor Train to decompose Fully Connected layers in CNN and LSTM. The Flattening Layer and Fully Connected layer at the end of the model are replaced with Tensor Regression layers. In order to compress the parameter further, the feature flow between the layers is compressed by Tensor Contraction layer. The proposed tensor decomposition model was evaluated on VQA 2.0 dataset with Pythia as baseline model. Our proposed model achieved from 77% to 91% of compression ratio, and only from 1% to 5% accuracy drop.",

keywords = "Tensor contraction layer, Tensor decomposition, Tensor regression layer, Visual question answering",

author = "Zongwen Bai and Ying Li and Marcin Wo{\'z}niak and Meili Zhou and Di Li",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier Ltd",

year = "2021",

month = feb,

doi = "10.1016/j.patcog.2020.107538",

language = "English",

volume = "110",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - DecomVQANet

T2 - Decomposing visual question answering deep network via tensor decomposition and regression

AU - Bai, Zongwen

AU - Li, Ying

AU - Woźniak, Marcin

AU - Zhou, Meili

AU - Li, Di

PY - 2021/2

Y1 - 2021/2

N2 - The model we developed is a novel comprehensive solution to compress and accelerate the Visual Question Answering systems. In our algorithm Convolutional Neural Network is compressed with Long Short Term Memory to accelerate processing simultaneously. We propose to conduct various decomposition methods and regression strategies on different layers, including Canonical Polyadic, Tucker, and Tensor Train to decompose Fully Connected layers in CNN and LSTM. The Flattening Layer and Fully Connected layer at the end of the model are replaced with Tensor Regression layers. In order to compress the parameter further, the feature flow between the layers is compressed by Tensor Contraction layer. The proposed tensor decomposition model was evaluated on VQA 2.0 dataset with Pythia as baseline model. Our proposed model achieved from 77% to 91% of compression ratio, and only from 1% to 5% accuracy drop.

AB - The model we developed is a novel comprehensive solution to compress and accelerate the Visual Question Answering systems. In our algorithm Convolutional Neural Network is compressed with Long Short Term Memory to accelerate processing simultaneously. We propose to conduct various decomposition methods and regression strategies on different layers, including Canonical Polyadic, Tucker, and Tensor Train to decompose Fully Connected layers in CNN and LSTM. The Flattening Layer and Fully Connected layer at the end of the model are replaced with Tensor Regression layers. In order to compress the parameter further, the feature flow between the layers is compressed by Tensor Contraction layer. The proposed tensor decomposition model was evaluated on VQA 2.0 dataset with Pythia as baseline model. Our proposed model achieved from 77% to 91% of compression ratio, and only from 1% to 5% accuracy drop.

KW - Tensor contraction layer

KW - Tensor decomposition

KW - Tensor regression layer

KW - Visual question answering

UR - http://www.scopus.com/inward/record.url?scp=85088141265&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2020.107538

DO - 10.1016/j.patcog.2020.107538

M3 - Article

AN - SCOPUS:85088141265

SN - 0031-3203

VL - 110

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 107538

ER -
