TY - JOUR
T1 - Segmentation in Weakly Labeled Videos via a Semantic Ranking and Optical Warping Network
AU - Yang, Le
AU - Han, Junwei
AU - Zhang, Dingwen
AU - Liu, Nian
AU - Zhang, Dong
N1 - Publisher Copyright: © 1992-2012 IEEE.
PY - 2018/8
Y1 - 2018/8
N2 - Weakly supervised video object segmentation (WSVOS) focuses on generating pixel-level object masks for videos tagged only with class labels, which is an essential yet challenging task. For WSVOS, the algorithm is aware only of rough category information rather than concrete object size and location cues; moreover, it lacks reliable annotated exemplars from which to learn temporal evolution in the investigated videos. Three challenging factors may influence the performance of WSVOS: foreground object discovery in each frame, coarse object semantic consistency within each video, and fine-grained segmentation smoothness across neighboring frames. In this paper, we establish a semantic ranking and optical warping network to solve these three challenges simultaneously in a unified framework. For the first challenge, we apply a still-image saliency detection method and discover the foreground object in each frame via a segmentation network. Because of the large discrepancies between image saliency and video object segmentation, we go a step further and propose two subnetworks to solve the other two challenges. For the second, we propose an attentive semantic ranking subnetwork to mine video-level tags, which learns discriminative features for semantic ranking and leads to semantically consistent segmentation masks. For the third, we propose an optical flow warping subnetwork to constrain fine-grained segmentation smoothness across neighboring frames, which suppresses large deformations and thus yields smooth object boundaries for adjacent frames. Experiments on two benchmark data sets, DAVIS and YouTube-Objects, demonstrate the effectiveness of the proposed approach for segmenting video objects under weak supervision.
KW - optical warping
KW - semantic ranking
KW - video object segmentation
KW - weak supervision
UR - http://www.scopus.com/inward/record.url?scp=85047021095&partnerID=8YFLogxK
U2 - 10.1109/TIP.2018.2834221
DO - 10.1109/TIP.2018.2834221
M3 - Article
AN - SCOPUS:85047021095
SN - 1057-7149
VL - 27
SP - 4025
EP - 4037
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 8
ER -