TY - GEN
T1 - Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization
AU - Xue, Xizhe
AU - Yu, Dongdong
AU - Liu, Lingqiao
AU - Liu, Yu
AU - Tsutsui, Satoshi
AU - Li, Ying
AU - Yuan, Zehuan
AU - Song, Ping
AU - Shou, Mike Zheng
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/10/26
Y1 - 2023/10/26
AB - Open-World Instance Segmentation (OWIS) is an emerging research topic that aims to segment class-agnostic object instances from images. The mainstream approaches use a two-stage segmentation framework, which first locates candidate object bounding boxes and then performs instance segmentation. In this work, we instead promote a single-stage transformer-based framework for OWIS. We argue that the end-to-end training process in the single-stage framework can be more convenient for directly regularizing the localization of class-agnostic object pixels. Based on the transformer-based instance segmentation framework, we propose a regularization model to predict foreground pixels and use its relation to instance segmentation to construct a cross-task consistency loss. We show that such a consistency loss could alleviate the problem of incomplete instance annotation, a common problem in existing OWIS datasets. We also show that the proposed loss lends itself to an effective solution for semi-supervised OWIS, which can be considered an extreme case in which all object annotations are absent for some images. Our extensive experiments demonstrate that the proposed method achieves impressive results in both fully-supervised and semi-supervised settings. Compared to SOTA methods, the proposed method significantly improves the AP-100 score by 4.75% in the UVO dataset → UVO dataset setting and by 4.05% in the COCO dataset → UVO dataset setting.
KW - cross-task consistency
KW - instance segmentation
KW - open world
UR - http://www.scopus.com/inward/record.url?scp=85179555186&partnerID=8YFLogxK
U2 - 10.1145/3581783.3612493
DO - 10.1145/3581783.3612493
M3 - Conference contribution
AN - SCOPUS:85179555186
T3 - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
SP - 2507
EP - 2515
BT - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 31st ACM International Conference on Multimedia, MM 2023
Y2 - 29 October 2023 through 3 November 2023
ER -