SITS-Former: A pre-trained spatio-spectral-temporal representation model for Sentinel-2 time series classification

Yuan Yuan; Lei Lin; Qingshan Liu; Renlong Hang; Zeng Guang Zhou

doi:10.1016/j.jag.2021.102651

Research output: Contribution to journal › Article › peer-review

69 citations (Scopus)

Abstract

Sentinel-2 images provide a rich source of information for a variety of land cover, vegetation, and environmental monitoring applications due to their high spectral, spatial, and temporal resolutions. Deep learning-based classification of Sentinel-2 time series has recently become a popular solution to vegetation classification and land cover mapping, but it often demands a large number of manually annotated labels. Improving classification performance with limited labeled data is still a challenge in many real-world remote sensing applications. To address label scarcity, we present SITS-Former (SITS stands for Satellite Image Time Series and Former stands for Transformer), a pre-trained representation model for Sentinel-2 time series classification. SITS-Former adopts a Transformer encoder as the backbone and takes time series of image patches as input to learn spatio-spectral-temporal features. Following the principles of self-supervised learning, we pre-train SITS-Former on massive unlabeled Sentinel-2 time series via a missing-data imputation proxy task. Given an incomplete time series in which some patches are randomly masked, the network is asked to regress the central pixels of these masked patches from the remaining ones. By doing so, the network can capture high-level spatial and temporal dependencies from the data to learn discriminative features. After pre-training, the network can adapt the learned features to a target classification task through fine-tuning. As far as we know, this is the first study that exploits self-supervised learning for patch-based representation learning and classification of SITS. We quantitatively evaluate the quality of the learned features by transferring them to two crop classification tasks, showing that SITS-Former outperforms state-of-the-art approaches and yields a significant improvement (2.64% to 3.30% in overall accuracy) over the purely supervised model. The proposed model provides an effective tool for SITS-related applications as it greatly reduces the burden of manual labeling. The source code will be released at https://github.com/linlei1214/SITS-Former upon publication.
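The missing-data imputation proxy task described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mask_time_series` and `masked_mse`, the zero-fill masking strategy, the `mask_ratio` parameter, and the nested-list data layout are all assumptions made for illustration. The sketch only shows the data flow: hide whole patches at random time steps, keep the central pixel of every patch as the regression target, and score predictions on the masked steps only.

```python
import random


def mask_time_series(series, mask_ratio=0.2, seed=0):
    """Mask whole patches at randomly chosen time steps.

    series: list of T patches, each patch indexed as [band][row][col]
            (square patches with an odd side length are assumed).
    Returns (masked_series, mask, targets), where mask[t] is 1 for a
    masked step and targets[t] holds the central pixel of every band.
    """
    rng = random.Random(seed)
    num_steps = len(series)
    num_masked = max(1, round(mask_ratio * num_steps))
    masked_steps = set(rng.sample(range(num_steps), num_masked))

    masked_series, mask, targets = [], [], []
    for t, patch in enumerate(series):
        size = len(patch[0])
        center = size // 2
        # Regression target: the central pixel of each spectral band.
        targets.append([band[center][center] for band in patch])
        if t in masked_steps:
            # Assumption: masked patches are zero-filled.
            masked_series.append(
                [[[0.0] * size for _ in range(size)] for _ in patch]
            )
            mask.append(1)
        else:
            masked_series.append(patch)
            mask.append(0)
    return masked_series, mask, targets


def masked_mse(predictions, targets, mask):
    """Mean squared error computed over masked time steps only."""
    total, count = 0.0, 0
    for pred, target, is_masked in zip(predictions, targets, mask):
        if is_masked:
            total += sum((p - q) ** 2 for p, q in zip(pred, target))
            count += len(target)
    return total / count if count else 0.0
```

In a real pre-training loop, the predictions would come from the Transformer encoder applied to `masked_series`, and the loss from `masked_mse` would be minimized by gradient descent; the encoder itself is outside the scope of this sketch.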

Original language	English
Article number	102651
Journal	International Journal of Applied Earth Observation and Geoinformation
Volume	106
DOI	https://doi.org/10.1016/j.jag.2021.102651
Publication status	Published - Feb 2022
Externally published	Yes

Cite this

@article{3b06f54522374073bcb0888ae9a85966,

title = "SITS-Former: A pre-trained spatio-spectral-temporal representation model for Sentinel-2 time series classification",

keywords = "Pre-training, Self-supervised learning, Sentinel-2, Transformer, satellite image time series (SITS)",

author = "Yuan Yuan and Lei Lin and Qingshan Liu and Renlong Hang and Zhou, {Zeng Guang}",

note = "Publisher Copyright: {\textcopyright} 2021 The Authors",

year = "2022",

month = feb,

doi = "10.1016/j.jag.2021.102651",

language = "English",

volume = "106",

journal = "International Journal of Applied Earth Observation and Geoinformation",

issn = "1569-8432",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - SITS-Former

T2 - A pre-trained spatio-spectral-temporal representation model for Sentinel-2 time series classification

AU - Yuan, Yuan

AU - Lin, Lei

AU - Liu, Qingshan

AU - Hang, Renlong

AU - Zhou, Zeng Guang

PY - 2022/2

Y1 - 2022/2

KW - Pre-training

KW - Self-supervised learning

KW - Sentinel-2

KW - Transformer

KW - satellite image time series (SITS)

UR - http://www.scopus.com/inward/record.url?scp=85122543841&partnerID=8YFLogxK

U2 - 10.1016/j.jag.2021.102651

DO - 10.1016/j.jag.2021.102651

M3 - Article

AN - SCOPUS:85122543841

SN - 1569-8432

VL - 106

JO - International Journal of Applied Earth Observation and Geoinformation

JF - International Journal of Applied Earth Observation and Geoinformation

M1 - 102651

ER -
