DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression

Zongwen Bai, Ying Li, Marcin Woźniak, Meili Zhou, Di Li

Research output: Contribution to journal, article, peer-reviewed

71 Citations (Scopus)

Abstract

The model we developed is a novel, comprehensive solution for compressing and accelerating Visual Question Answering (VQA) systems. In our algorithm, the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network are compressed simultaneously to accelerate processing. We propose applying various decomposition methods and regression strategies to different layers, including Canonical Polyadic, Tucker, and Tensor Train decompositions for the Fully Connected layers in the CNN and LSTM. The Flattening layer and the Fully Connected layer at the end of the model are replaced with Tensor Regression layers. To compress the parameters further, the feature flow between layers is compressed by a Tensor Contraction layer. The proposed tensor decomposition model was evaluated on the VQA 2.0 dataset with Pythia as the baseline model. Our proposed model achieved compression ratios of 77% to 91% with an accuracy drop of only 1% to 5%.
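To illustrate why Tensor Train decomposition reduces the parameter count of a Fully Connected layer, here is a minimal NumPy sketch of the standard TT-SVD procedure. It is not the authors' implementation. The 64×64 random weight matrix, the reshape to (8, 8, 8, 8), the rank cap of 4, and the function names `tt_svd` and `tt_reconstruct` are assumptions made for illustration. The sketch applies a plain Tensor Train to the reshaped tensor rather than the interleaved TT-matrix format often used for layers.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Split `tensor` into Tensor Train cores using sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    rank_prev = 1
    unfolding = tensor
    for k in range(len(dims) - 1):
        # Group the previous rank with the current mode; the rest forms the columns.
        unfolding = unfolding.reshape(rank_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remaining factor (singular values times V^T) to the next step.
        unfolding = s[:rank, None] * vt[:rank]
        rank_prev = rank
    cores.append(unfolding.reshape(rank_prev, dims[-1], 1))
    return cores


def tt_reconstruct(cores):
    """Contract the cores back into a full tensor."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([core.shape[1] for core in cores])


# Hypothetical Fully Connected weights (random, for illustration only).
rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 64))
weight_tensor = weights.reshape(8, 8, 8, 8)

# With a rank cap large enough, the decomposition is exact.
cores_full = tt_svd(weight_tensor, max_rank=64)
exact = np.allclose(tt_reconstruct(cores_full), weight_tensor)

# With a small rank cap, the cores hold far fewer parameters than the dense layer.
cores_small = tt_svd(weight_tensor, max_rank=4)
n_params = sum(core.size for core in cores_small)
print(exact, weights.size, n_params)
```

A random matrix has no low-rank structure, so the truncated cores here approximate it poorly. The sketch shows only the storage saving; how much accuracy survives on a trained network depends on the layer and the ranks chosen.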

Original language: English
Article number: 107538
Journal: Pattern Recognition
Volume: 110
Publication status: Published - Feb 2021
