Gated forward refinement network for action segmentation

Dong Wang; Yuan Yuan; Qi Wang

doi:10.1016/j.neucom.2020.03.066

School of Artificial Intelligence, Optics and Electronics (iOPEN)

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)

Abstract

Action segmentation aims to temporally locate and classify video segments in long untrimmed videos, a task of particular interest to applications such as surveillance and robotics. While most existing methods tackle this task by predicting frame-wise probabilities and adjusting them via high-level temporal models, recent approaches classify every video frame directly with temporal convolutions. However, ambiguous information in the video frames limits the quality of their predictions. To address these limitations, we propose an end-to-end multi-stage architecture, Gated Forward Refinement Network (G-FRNet). In G-FRNet, each stage makes a prediction that is progressively refined by the next stage. Specifically, we propose a new gated forward refinement network that adaptively corrects the errors in the prediction from the previous stage, where an effective gate unit controls the refinement process. Moreover, to efficiently optimize the proposed G-FRNet, we design an objective function that consists of a classification loss and a multi-stage sequence-level refinement loss that incorporates the segmental edit score via policy gradient. Extensive evaluation on three challenging datasets (50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset) shows that our method achieves state-of-the-art results.
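
The abstract names the segmental edit score without defining it. As a minimal sketch, the following uses the definition commonly found in the action segmentation literature: collapse the frame-wise labels into an ordered list of segment labels, then normalize the Levenshtein distance between predicted and ground-truth segment lists. This is an assumption about the metric, not the authors' implementation, and the function names are hypothetical. How the score enters the refinement loss via policy gradient is not described in this record and is not shown here.

```python
def segments(frame_labels):
    """Collapse a frame-wise label sequence into its ordered segment labels."""
    collapsed = []
    for label in frame_labels:
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return collapsed


def levenshtein(a, b):
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x from a
                curr[j - 1] + 1,          # insert y into a
                prev[j - 1] + (x != y),   # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def segmental_edit_score(predicted, ground_truth):
    """Score in [0, 100]; higher means the segment ordering matches better."""
    p, g = segments(predicted), segments(ground_truth)
    longest = max(len(p), len(g))
    if longest == 0:
        # Assumed convention: two empty sequences match perfectly.
        return 100.0
    return (1.0 - levenshtein(p, g) / longest) * 100.0
```

Because the score compares segment order rather than frame-level agreement, a prediction with slightly shifted boundaries can still score perfectly, while an over-segmented prediction that inserts spurious segments is penalized.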

Original language	English
Pages (from-to)	63-71
Number of pages	9
Journal	Neurocomputing
Volume	407
DOI	https://doi.org/10.1016/j.neucom.2020.03.066
Publication status	Published - 24 Sep 2020

Cite this

@article{dceccc3130ea444dbf75f4e08a04da5f,

title = "Gated forward refinement network for action segmentation",

abstract = "Action segmentation aims at temporally locating and classifying video segments in long untrimmed videos, which is of particular interest to many applications like surveillance and robotics. While most existing methods tackle this task by predicting frame-wise probabilities and adjusting them via high-level temporal models, recent approaches classify every video frame directly with temporal convolutions. However, there are limits to generate high quality predictions due to ambiguous information in the video frames. In this paper, in order to address the limitations of existing methods in temporal action segmentation task, we propose an end-to-end multi-stage architecture, Gated Forward Refinement Network (G-FRNet). In G-FRNet, each stage makes a prediction that is refined progressively by next stage. Specifically, we propose a new gated forward refinement network to adaptively correct the errors in the prediction from previous stage, where an effective gate unit is used to control the refinement process. Moreover, to efficiently optimize the proposed G-FRNet, we design an objective function that consists of a classification loss and a multi-stage sequence-level refinement loss that incorporates segmental edit score via policy gradient. Extensive evaluation on three challenging datasets (50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset) shows our method achieves state-of-the-art results.",

keywords = "Action segmentation, Policy gradient, Refinement network, Video analysis",

author = "Dong Wang and Yuan Yuan and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 2020",

year = "2020",

month = sep,

day = "24",

doi = "10.1016/j.neucom.2020.03.066",

language = "English",

volume = "407",

pages = "63--71",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Gated forward refinement network for action segmentation

AU - Wang, Dong

AU - Yuan, Yuan

AU - Wang, Qi

PY - 2020/9/24

Y1 - 2020/9/24

N2 - Action segmentation aims at temporally locating and classifying video segments in long untrimmed videos, which is of particular interest to many applications like surveillance and robotics. While most existing methods tackle this task by predicting frame-wise probabilities and adjusting them via high-level temporal models, recent approaches classify every video frame directly with temporal convolutions. However, there are limits to generate high quality predictions due to ambiguous information in the video frames. In this paper, in order to address the limitations of existing methods in temporal action segmentation task, we propose an end-to-end multi-stage architecture, Gated Forward Refinement Network (G-FRNet). In G-FRNet, each stage makes a prediction that is refined progressively by next stage. Specifically, we propose a new gated forward refinement network to adaptively correct the errors in the prediction from previous stage, where an effective gate unit is used to control the refinement process. Moreover, to efficiently optimize the proposed G-FRNet, we design an objective function that consists of a classification loss and a multi-stage sequence-level refinement loss that incorporates segmental edit score via policy gradient. Extensive evaluation on three challenging datasets (50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset) shows our method achieves state-of-the-art results.

AB - Action segmentation aims at temporally locating and classifying video segments in long untrimmed videos, which is of particular interest to many applications like surveillance and robotics. While most existing methods tackle this task by predicting frame-wise probabilities and adjusting them via high-level temporal models, recent approaches classify every video frame directly with temporal convolutions. However, there are limits to generate high quality predictions due to ambiguous information in the video frames. In this paper, in order to address the limitations of existing methods in temporal action segmentation task, we propose an end-to-end multi-stage architecture, Gated Forward Refinement Network (G-FRNet). In G-FRNet, each stage makes a prediction that is refined progressively by next stage. Specifically, we propose a new gated forward refinement network to adaptively correct the errors in the prediction from previous stage, where an effective gate unit is used to control the refinement process. Moreover, to efficiently optimize the proposed G-FRNet, we design an objective function that consists of a classification loss and a multi-stage sequence-level refinement loss that incorporates segmental edit score via policy gradient. Extensive evaluation on three challenging datasets (50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset) shows our method achieves state-of-the-art results.

KW - Action segmentation

KW - Policy gradient

KW - Refinement network

KW - Video analysis

UR - http://www.scopus.com/inward/record.url?scp=85085271607&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2020.03.066

DO - 10.1016/j.neucom.2020.03.066

M3 - Article

AN - SCOPUS:85085271607

SN - 0925-2312

VL - 407

SP - 63

EP - 71

JO - Neurocomputing

JF - Neurocomputing

ER -
