TY - JOUR
T1 - Adaptive Discrepancy Masked Distillation for remote sensing object detection
AU - Li, Cong
AU - Cheng, Gong
AU - Han, Junwei
N1 - Publisher Copyright:
© 2025 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS)
PY - 2025/4
Y1 - 2025/4
N2 - Knowledge distillation (KD) has become a promising technique for obtaining a performant student detector in remote sensing images by inheriting knowledge from a heavy teacher detector. Unfortunately, not every pixel contributes equally to the final KD performance, and some are even detrimental. To address this problem, existing methods usually derive a distillation mask to emphasize the valuable regions during KD. In this paper, we put forth Adaptive Discrepancy Masked Distillation (ADMD), a novel KD framework that explicitly localizes the beneficial pixels. Our approach stems from the observation that the feature discrepancy between the teacher and student is the essential reason for their performance gap. In this regard, we make use of the feature discrepancy to determine which locations cause the student to lag behind the teacher and then regulate the student to assign higher learning priority to them. Furthermore, we empirically observe that discrepancy-masked distillation leads to loss vanishing in later KD stages. To combat this issue, we introduce a simple yet practical weight-increasing module, in which the magnitude of the KD loss is adaptively adjusted to ensure that KD steadily contributes to student optimization. Comprehensive experiments on DIOR and DOTA across various dense detectors show that our ADMD consistently yields remarkable performance gains, particularly under a prolonged distillation schedule, and exhibits superiority over state-of-the-art counterparts. Code and trained checkpoints will be made available at https://github.com/swift1988.
AB - Knowledge distillation (KD) has become a promising technique for obtaining a performant student detector in remote sensing images by inheriting knowledge from a heavy teacher detector. Unfortunately, not every pixel contributes equally to the final KD performance, and some are even detrimental. To address this problem, existing methods usually derive a distillation mask to emphasize the valuable regions during KD. In this paper, we put forth Adaptive Discrepancy Masked Distillation (ADMD), a novel KD framework that explicitly localizes the beneficial pixels. Our approach stems from the observation that the feature discrepancy between the teacher and student is the essential reason for their performance gap. In this regard, we make use of the feature discrepancy to determine which locations cause the student to lag behind the teacher and then regulate the student to assign higher learning priority to them. Furthermore, we empirically observe that discrepancy-masked distillation leads to loss vanishing in later KD stages. To combat this issue, we introduce a simple yet practical weight-increasing module, in which the magnitude of the KD loss is adaptively adjusted to ensure that KD steadily contributes to student optimization. Comprehensive experiments on DIOR and DOTA across various dense detectors show that our ADMD consistently yields remarkable performance gains, particularly under a prolonged distillation schedule, and exhibits superiority over state-of-the-art counterparts. Code and trained checkpoints will be made available at https://github.com/swift1988.
KW - Knowledge distillation
KW - Object detection
KW - Remote sensing images
UR - http://www.scopus.com/inward/record.url?scp=85218465316&partnerID=8YFLogxK
U2 - 10.1016/j.isprsjprs.2025.02.006
DO - 10.1016/j.isprsjprs.2025.02.006
M3 - Article
AN - SCOPUS:85218465316
SN - 0924-2716
VL - 222
SP - 54
EP - 63
JO - ISPRS Journal of Photogrammetry and Remote Sensing
JF - ISPRS Journal of Photogrammetry and Remote Sensing
ER -
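
The abstract above describes the ADMD mechanism only at a high level: weight per-pixel feature imitation by the teacher-student discrepancy, and ramp up the KD loss weight so the distillation term does not vanish late in training. For orientation, here is a minimal sketch of what such a loss could look like in PyTorch. The function name, the soft-mask normalization, and the linear weight schedule are all assumptions inferred from the abstract, not the authors' released code; see the repository linked above for the actual implementation.

```python
# Illustrative sketch of a discrepancy-masked feature distillation loss with an
# increasing KD weight. NOT the authors' implementation; the mask construction
# and the weight schedule below are assumptions based on the abstract.
import torch
import torch.nn.functional as F


def admd_style_loss(feat_student, feat_teacher, step, total_steps, base_weight=1.0):
    """feat_student, feat_teacher: (N, C, H, W) features from matched FPN levels.
    step, total_steps: current and final distillation iteration, used by an
    assumed linearly increasing loss weight to counter loss vanishing."""
    # Per-pixel discrepancy across channels; large values mark locations where
    # the student lags behind the teacher the most.
    discrepancy = (feat_teacher - feat_student).pow(2).mean(dim=1, keepdim=True)  # (N, 1, H, W)

    # Normalize the discrepancy into a soft mask so larger gaps get higher learning priority.
    mask = discrepancy / (discrepancy.sum(dim=(2, 3), keepdim=True) + 1e-6)

    # Masked feature imitation: the mask is detached so gradients flow only
    # through the student's features, not through the mask itself.
    per_pixel = F.mse_loss(feat_student, feat_teacher, reduction="none").mean(dim=1, keepdim=True)
    kd_loss = (mask.detach() * per_pixel).sum(dim=(2, 3)).mean()

    # Assumed weight-increasing schedule: grow the KD weight over training so the
    # distillation term keeps contributing even as the masked loss shrinks.
    weight = base_weight * (1.0 + step / max(total_steps, 1))
    return weight * kd_loss
```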