DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression

Zongwen Bai; Ying Li; Marcin Woźniak; Meili Zhou; Di Li

doi:10.1016/j.patcog.2020.107538

School of Computer Science

Research output: Contribution to journal › Article › Peer-reviewed

71 citations (Scopus)

Abstract

The model we developed is a novel, comprehensive solution for compressing and accelerating Visual Question Answering systems. In our algorithm, the Convolutional Neural Network (CNN) is compressed together with the Long Short-Term Memory (LSTM) network to accelerate processing. We apply several decomposition methods and regression strategies to different layers, including Canonical Polyadic, Tucker, and Tensor Train decompositions of the Fully Connected layers in the CNN and LSTM. The Flattening layer and Fully Connected layer at the end of the model are replaced with Tensor Regression layers. To reduce the parameter count further, the feature flow between layers is compressed by a Tensor Contraction layer. The proposed tensor decomposition model was evaluated on the VQA 2.0 dataset with Pythia as the baseline model. It achieved a compression ratio of 77% to 91% with an accuracy drop of only 1% to 5%.

Original language	English
Article number	107538
Journal	Pattern Recognition
Volume	110
DOI	https://doi.org/10.1016/j.patcog.2020.107538
Publication status	Published - Feb 2021


Cite this

@article{94143bd2efe14d9e8db88adc91e51724,

title = "DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression",

abstract = "The model we developed is a novel comprehensive solution to compress and accelerate the Visual Question Answering systems. In our algorithm Convolutional Neural Network is compressed with Long Short Term Memory to accelerate processing simultaneously. We propose to conduct various decomposition methods and regression strategies on different layers, including Canonical Polyadic, Tucker, and Tensor Train to decompose Fully Connected layers in CNN and LSTM. The Flattening Layer and Fully Connected layer at the end of the model are replaced with Tensor Regression layers. In order to compress the parameter further, the feature flow between the layers is compressed by Tensor Contraction layer. The proposed tensor decomposition model was evaluated on VQA 2.0 dataset with Pythia as baseline model. Our proposed model achieved from 77% to 91% of compression ratio, and only from 1% to 5% accuracy drop.",

keywords = "Tensor contraction layer, Tensor decomposition, Tensor regression layer, Visual question answering",

author = "Zongwen Bai and Ying Li and Marcin Wo{\'z}niak and Meili Zhou and Di Li",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier Ltd",

year = "2021",

month = feb,

doi = "10.1016/j.patcog.2020.107538",

language = "English",

volume = "110",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - DecomVQANet

T2 - Decomposing visual question answering deep network via tensor decomposition and regression

AU - Bai, Zongwen

AU - Li, Ying

AU - Woźniak, Marcin

AU - Zhou, Meili

AU - Li, Di

PY - 2021/2

Y1 - 2021/2

N2 - The model we developed is a novel comprehensive solution to compress and accelerate the Visual Question Answering systems. In our algorithm Convolutional Neural Network is compressed with Long Short Term Memory to accelerate processing simultaneously. We propose to conduct various decomposition methods and regression strategies on different layers, including Canonical Polyadic, Tucker, and Tensor Train to decompose Fully Connected layers in CNN and LSTM. The Flattening Layer and Fully Connected layer at the end of the model are replaced with Tensor Regression layers. In order to compress the parameter further, the feature flow between the layers is compressed by Tensor Contraction layer. The proposed tensor decomposition model was evaluated on VQA 2.0 dataset with Pythia as baseline model. Our proposed model achieved from 77% to 91% of compression ratio, and only from 1% to 5% accuracy drop.

AB - The model we developed is a novel comprehensive solution to compress and accelerate the Visual Question Answering systems. In our algorithm Convolutional Neural Network is compressed with Long Short Term Memory to accelerate processing simultaneously. We propose to conduct various decomposition methods and regression strategies on different layers, including Canonical Polyadic, Tucker, and Tensor Train to decompose Fully Connected layers in CNN and LSTM. The Flattening Layer and Fully Connected layer at the end of the model are replaced with Tensor Regression layers. In order to compress the parameter further, the feature flow between the layers is compressed by Tensor Contraction layer. The proposed tensor decomposition model was evaluated on VQA 2.0 dataset with Pythia as baseline model. Our proposed model achieved from 77% to 91% of compression ratio, and only from 1% to 5% accuracy drop.

KW - Tensor contraction layer

KW - Tensor decomposition

KW - Tensor regression layer

KW - Visual question answering

UR - http://www.scopus.com/inward/record.url?scp=85088141265&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2020.107538

DO - 10.1016/j.patcog.2020.107538

M3 - Article

AN - SCOPUS:85088141265

SN - 0031-3203

VL - 110

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 107538

ER -
