S³C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning

Wei Suo; Mengyang Sun; Weisong Liu; Yiqi Gao; Peng Wang; Yanning Zhang; Qi Wu

doi:10.1109/CVPR52729.2023.00260

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

6 Scopus citations

Abstract

The VQA Natural Language Explanation (VQA-NLE) task aims to explain the decision-making process of VQA models in natural language. Unlike traditional attention or gradient analysis, free-text rationales can be easier to understand and can gain users' trust. Existing methods mostly use post-hoc or self-rationalization models to obtain a plausible explanation. However, these frameworks are bottlenecked by the following challenges: 1) The reasoning process cannot be faithfully reflected, and the explanations suffer from logical inconsistency. 2) Human-annotated explanations are expensive and time-consuming to collect. In this paper, we propose a new Semi-Supervised VQA-NLE via Self-Critical Learning (S³C), which evaluates the candidate explanations by answering rewards to improve the logical consistency between answers and rationales. With a semi-supervised learning framework, S³C can benefit from a tremendous amount of samples without human-annotated explanations. A large number of automatic measures and human evaluations all show the effectiveness of our method. Meanwhile, the framework achieves a new state-of-the-art performance on the two VQA-NLE datasets.

Original language	English
Pages (from-to)	2646-2656
Number of pages	11
Journal	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume	2023-June
DOIs	https://doi.org/10.1109/CVPR52729.2023.00260
State	Published - 2023
Event	2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, Canada (18 Jun 2023 to 22 Jun 2023)

Keywords

Vision, language, and reasoning

Access to Document

10.1109/CVPR52729.2023.00260

Cite this

@article{aa7b4f92217f4a8d954887e42c2ad873,

title = "S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning",

abstract = "The VQA Natural Language Explanation (VQA-NLE) task aims to explain the decision-making process of VQA models in natural language. Unlike traditional attention or gradient analysis, free-text rationales can be easier to understand and can gain users' trust. Existing methods mostly use post-hoc or self-rationalization models to obtain a plausible explanation. However, these frameworks are bottlenecked by the following challenges: 1) The reasoning process cannot be faithfully reflected, and the explanations suffer from logical inconsistency. 2) Human-annotated explanations are expensive and time-consuming to collect. In this paper, we propose a new Semi-Supervised VQA-NLE via Self-Critical Learning (S3C), which evaluates the candidate explanations by answering rewards to improve the logical consistency between answers and rationales. With a semi-supervised learning framework, S3C can benefit from a tremendous amount of samples without human-annotated explanations. A large number of automatic measures and human evaluations all show the effectiveness of our method. Meanwhile, the framework achieves a new state-of-the-art performance on the two VQA-NLE datasets.",

keywords = "Vision, language, and reasoning",

author = "Wei Suo and Mengyang Sun and Weisong Liu and Yiqi Gao and Peng Wang and Yanning Zhang and Qi Wu",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 ; Conference date: 18-06-2023 Through 22-06-2023",

year = "2023",

doi = "10.1109/CVPR52729.2023.00260",

language = "English",

volume = "2023-June",

pages = "2646--2656",

journal = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

issn = "1063-6919",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning

T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023

AU - Suo, Wei

AU - Sun, Mengyang

AU - Liu, Weisong

AU - Gao, Yiqi

AU - Wang, Peng

AU - Zhang, Yanning

AU - Wu, Qi

PY - 2023

Y1 - 2023

N2 - The VQA Natural Language Explanation (VQA-NLE) task aims to explain the decision-making process of VQA models in natural language. Unlike traditional attention or gradient analysis, free-text rationales can be easier to understand and can gain users' trust. Existing methods mostly use post-hoc or self-rationalization models to obtain a plausible explanation. However, these frameworks are bottlenecked by the following challenges: 1) The reasoning process cannot be faithfully reflected, and the explanations suffer from logical inconsistency. 2) Human-annotated explanations are expensive and time-consuming to collect. In this paper, we propose a new Semi-Supervised VQA-NLE via Self-Critical Learning (S3C), which evaluates the candidate explanations by answering rewards to improve the logical consistency between answers and rationales. With a semi-supervised learning framework, S3C can benefit from a tremendous amount of samples without human-annotated explanations. A large number of automatic measures and human evaluations all show the effectiveness of our method. Meanwhile, the framework achieves a new state-of-the-art performance on the two VQA-NLE datasets.

KW - Vision, language, and reasoning

UR - http://www.scopus.com/inward/record.url?scp=85189313913&partnerID=8YFLogxK

U2 - 10.1109/CVPR52729.2023.00260

DO - 10.1109/CVPR52729.2023.00260

M3 - Conference article

AN - SCOPUS:85189313913

SN - 1063-6919

VL - 2023-June

SP - 2646

EP - 2656

JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Y2 - 18 June 2023 through 22 June 2023

ER -
