TY - JOUR
T1 - Rethinking Training Strategy in Stereo Matching
AU - Rao, Zhibo
AU - Dai, Yuchao
AU - Shen, Zhelun
AU - He, Renjie
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2023/10/1
Y1 - 2023/10/1
N2 - In stereo matching, various learning-based approaches have shown impressive performance in overcoming traditional difficulties across multiple datasets. While most progress is obtained on a specific dataset with a dataset-specific network design, the effect of the training strategy on single-dataset and cross-dataset performance is often ignored. In this article, we analyze the relationship between training strategies and performance by retraining several representative state-of-the-art methods, e.g., the geometry and context network (GC-Net), the pyramid stereo matching network (PSM-Net), and the guided aggregation network (GA-Net). Surprisingly, we find that the performance of these networks on single and cross datasets is significantly improved by pre-training and data augmentation, without any particular structural requirement. Based on this discovery, we improve our previous non-local context attention network (NLCA-Net) to NLCA-Net v2, train it with the new strategy, and concurrently rethink the training strategy of stereo matching. The quantitative experiments demonstrate that: 1) our model reaches top performance on both a single dataset and multiple datasets with the same parameters, and won 2nd place in the stereo task of the ECCV Robust Vision Challenge 2020 (RVC 2020); and 2) on small datasets (e.g., KITTI, ETH3D, and Middlebury), the model's generalization and robustness are significantly affected by pre-training and data augmentation, in some cases even exceeding the influence of the network structure. These observations challenge the current conventional wisdom on network architectures. We expect these discoveries to encourage researchers to rethink the prevailing paradigm of 'excessive attention to the performance on a single small dataset' in stereo matching.
AB - In stereo matching, various learning-based approaches have shown impressive performance in overcoming traditional difficulties across multiple datasets. While most progress is obtained on a specific dataset with a dataset-specific network design, the effect of the training strategy on single-dataset and cross-dataset performance is often ignored. In this article, we analyze the relationship between training strategies and performance by retraining several representative state-of-the-art methods, e.g., the geometry and context network (GC-Net), the pyramid stereo matching network (PSM-Net), and the guided aggregation network (GA-Net). Surprisingly, we find that the performance of these networks on single and cross datasets is significantly improved by pre-training and data augmentation, without any particular structural requirement. Based on this discovery, we improve our previous non-local context attention network (NLCA-Net) to NLCA-Net v2, train it with the new strategy, and concurrently rethink the training strategy of stereo matching. The quantitative experiments demonstrate that: 1) our model reaches top performance on both a single dataset and multiple datasets with the same parameters, and won 2nd place in the stereo task of the ECCV Robust Vision Challenge 2020 (RVC 2020); and 2) on small datasets (e.g., KITTI, ETH3D, and Middlebury), the model's generalization and robustness are significantly affected by pre-training and data augmentation, in some cases even exceeding the influence of the network structure. These observations challenge the current conventional wisdom on network architectures. We expect these discoveries to encourage researchers to rethink the prevailing paradigm of 'excessive attention to the performance on a single small dataset' in stereo matching.
KW - Data augmentation
KW - pre-training
KW - robust vision challenge
KW - stereo matching
UR - http://www.scopus.com/inward/record.url?scp=85124738729&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2022.3146306
DO - 10.1109/TNNLS.2022.3146306
M3 - Article
C2 - 35143404
AN - SCOPUS:85124738729
SN - 2162-237X
VL - 34
SP - 7796
EP - 7809
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 10
ER -