TY - JOUR
T1 - SITS-Former
T2 - A pre-trained spatio-spectral-temporal representation model for Sentinel-2 time series classification
AU - Yuan, Yuan
AU - Lin, Lei
AU - Liu, Qingshan
AU - Hang, Renlong
AU - Zhou, Zeng Guang
N1 - Publisher Copyright: © 2021 The Authors
PY - 2022/2
Y1 - 2022/2
N2 - Sentinel-2 images provide a rich source of information for a variety of land cover, vegetation, and environmental monitoring applications due to their high spectral, spatial, and temporal resolutions. Recently, deep learning-based classification of Sentinel-2 time series has become a popular solution for vegetation classification and land cover mapping, but it often demands a large number of manually annotated labels. Improving classification performance with limited labeled data remains a challenge in many real-world remote sensing applications. To address label scarcity, we present SITS-Former (SITS stands for Satellite Image Time Series and Former stands for Transformer), a pre-trained representation model for Sentinel-2 time series classification. SITS-Former adopts a Transformer encoder as the backbone and takes time series of image patches as input to learn spatio-spectral-temporal features. Following the principles of self-supervised learning, we pre-train SITS-Former on massive unlabeled Sentinel-2 time series via a missing-data imputation proxy task. Given an incomplete time series in which some patches are randomly masked, the network is asked to regress the central pixels of the masked patches from the remaining ones. By doing so, the network can capture high-level spatial and temporal dependencies in the data and learn discriminative features. After pre-training, the network can adapt the learned features to a target classification task through fine-tuning. As far as we know, this is the first study that exploits self-supervised learning for patch-based representation learning and classification of SITS. We quantitatively evaluate the quality of the learned features by transferring them to two crop classification tasks, showing that SITS-Former outperforms state-of-the-art approaches and yields a significant improvement (2.64% to 3.30% in overall accuracy) over the purely supervised model. The proposed model provides an effective tool for SITS-related applications as it greatly reduces the burden of manual labeling. The source code will be released at https://github.com/linlei1214/SITS-Former upon publication.
KW - Pre-training
KW - Self-supervised learning
KW - Sentinel-2
KW - Transformer
KW - Satellite image time series (SITS)
UR - http://www.scopus.com/inward/record.url?scp=85122543841&partnerID=8YFLogxK
U2 - 10.1016/j.jag.2021.102651
DO - 10.1016/j.jag.2021.102651
M3 - Article
AN - SCOPUS:85122543841
SN - 1569-8432
VL - 106
JO - International Journal of Applied Earth Observation and Geoinformation
JF - International Journal of Applied Earth Observation and Geoinformation
M1 - 102651
ER -