Abstract
Event-based semantic segmentation has gained popularity due to its capability to deal with scenarios under high-speed motion and extreme lighting conditions, which cannot be addressed by conventional RGB cameras. Since it is hard to annotate event data, previous approaches rely on event-to-image reconstruction to obtain pseudo labels for training. However, this will inevitably introduce noise, and learning from noisy pseudo labels, especially when generated from a single source, may reinforce the errors. This drawback is also called confirmation bias in pseudo-labeling. In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels. Specifically, we first employ a plain unsupervised domain adaptation framework as our baseline, which can generate a set of pseudo labels through self-training. Then, we incorporate offline event-to-image re-construction into the framework, and obtain another set of pseudo labels by predicting segmentation maps on the re-constructed images. A noisy label learning strategy is designed to mix the two sets of pseudo labels and enhance the quality. Moreover, we propose a soft prototypical alignment (SPA) module to further improve the consistency of target domain features. Extensive experiments show that the proposed method outperforms existing state-of-the-art methods by a large margin on benchmarks (e.g., +5.88% accuracy, +10.32% mIoU on DSEC-Semantic dataset), and even surpasses several supervised methods.
Original language | English |
---|---|
Pages (from-to) | 23128-23137 |
Number of pages | 10 |
Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
DOIs | |
State | Published - 2024 |
Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States Duration: 16 Jun 2024 → 22 Jun 2024 |
Keywords
- Event Camera
- Segmentation
- Unsupervised Representation Learning