TY - JOUR
T1 - S2Net
T2 - A Multitask Learning Network for Semantic Stereo of Satellite Image Pairs
AU - Liao, Puyun
AU - Zhang, Xiaodong
AU - Chen, Guanzhou
AU - Wang, Tong
AU - Li, Xianwei
AU - Yang, Haobo
AU - Zhou, Wenlin
AU - He, Chanjuan
AU - Wang, Qing
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Stereo matching and semantic segmentation are two significant tasks in remote sensing. Recently, deep learning approaches have been applied to these tasks separately. However, the lack of semantic supervision makes the training of stereo matching models susceptible to data disturbance, resulting in inferior generalization ability; foreground objects are sometimes confused with background pixels in RGB images, limiting classification accuracy. By exploring the relationship between these two tasks, semantic stereo solves these problems simultaneously with multitask learning. Previous methods treated semantic stereo as two parallel tasks, so they did not take full advantage of the additional information from both tasks and obtained only slight improvement. In this work, we designed a multitask learning framework, the semantic stereo network (S2Net). The proposed network generates cost volumes from feature maps supervised by semantic information to estimate disparity maps and fuses RGB-D feature maps to predict classification maps, thereby combining multitask learning information. To enhance the performance of the trained model, we also considered the continuity of disparity values and the duality of stereo image pairs in data augmentation. When applied to datasets without training, S2Net obtained a 2.937% D1-Error on the WHU dataset, lower than the 4.297% of the previous best method, demonstrating the generalization improvement from semantic supervision. In terms of semantic segmentation, the introduction of disparity maps increases the mean intersection over union (mIoU) from 61.375% to 69.096% on the US3D dataset. Experiments on the KITTI semantics benchmark show that our proposed method obtains 60.76% mIoU, achieving state-of-the-art performance among multitask learning methods.
AB - Stereo matching and semantic segmentation are two significant tasks in remote sensing. Recently, deep learning approaches have been applied to these tasks separately. However, the lack of semantic supervision makes the training of stereo matching models susceptible to data disturbance, resulting in inferior generalization ability; foreground objects are sometimes confused with background pixels in RGB images, limiting classification accuracy. By exploring the relationship between these two tasks, semantic stereo solves these problems simultaneously with multitask learning. Previous methods treated semantic stereo as two parallel tasks, so they did not take full advantage of the additional information from both tasks and obtained only slight improvement. In this work, we designed a multitask learning framework, the semantic stereo network (S2Net). The proposed network generates cost volumes from feature maps supervised by semantic information to estimate disparity maps and fuses RGB-D feature maps to predict classification maps, thereby combining multitask learning information. To enhance the performance of the trained model, we also considered the continuity of disparity values and the duality of stereo image pairs in data augmentation. When applied to datasets without training, S2Net obtained a 2.937% D1-Error on the WHU dataset, lower than the 4.297% of the previous best method, demonstrating the generalization improvement from semantic supervision. In terms of semantic segmentation, the introduction of disparity maps increases the mean intersection over union (mIoU) from 61.375% to 69.096% on the US3D dataset. Experiments on the KITTI semantics benchmark show that our proposed method obtains 60.76% mIoU, achieving state-of-the-art performance among multitask learning methods.
KW - Convolutional neural network
KW - multitask learning
KW - semantic segmentation
KW - stereo image pairs
KW - stereo matching
UR - http://www.scopus.com/inward/record.url?scp=85179063378&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3335997
DO - 10.1109/TGRS.2023.3335997
M3 - Article
AN - SCOPUS:85179063378
SN - 0196-2892
VL - 62
SP - 1
EP - 13
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5601313
ER -