TY - JOUR
T1 - Enhancing Unimodal Features Matters
T2 - A Multimodal Framework for Building Extraction
AU - Shi, Xiaofeng
AU - Gao, Junyu
AU - Yuan, Yuan
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2024
Y1 - 2024
AB - In recent years, deep learning and multimodal data have substantially propelled the development of building extraction models. However, prevailing multimodal methods struggle with two challenges: 1) modal laziness: the training error is minimized before the model has learned extensive unimodal patterns; and 2) modal imbalance: the backpropagation process is easily dominated by a single modality. As a result, unimodal feature learning is insufficient, which limits the model's performance when dealing with the intricate foreground and background contexts surrounding buildings. In this article, we address this problem from the perspectives of both algorithm design and model evaluation. At the algorithmic level, we propose a unimodal feature enhancement (UFE) framework. UFE is model-agnostic and comprises two distinct components: adaptive gradient enhancement (AGE) for modal laziness and consistency constraint loss (CCL) for modal imbalance. AGE dynamically modulates the original gradient by monitoring the representation effects of unimodal features and multimodal fusion features. CCL imposes mutual constraints on the different modal branches at the semantic level to reconcile the optimization process. At the model evaluation level, a new metric, the unimodal utilization ratio (UUR), is presented to assess models by the learning efficacy of their unimodal features. Experimental results on two building extraction datasets, including variants of UUR, demonstrate that UFE yields a substantial performance improvement. Moreover, UFE shows adaptability when integrated with various model components and generalizes to other multimodal image-related tasks.
KW - Building extraction
KW - modal imbalance
KW - modal laziness
KW - multimodal fusion
KW - unimodal feature enhancement (UFE)
UR - http://www.scopus.com/inward/record.url?scp=85191347237&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3392631
DO - 10.1109/TGRS.2024.3392631
M3 - Article
AN - SCOPUS:85191347237
SN - 0196-2892
VL - 62
SP - 1
EP - 13
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5622013
ER -