TY - JOUR
T1 - SL-Seg
T2 - A CNN-Transformer Fusion Network for Road Surface and Lane Segmentation in Complex Scenarios
AU - Meng, Chenlin
AU - Wang, Xin
AU - Tu, Qinhao
AU - Mao, Zhaoyong
AU - Shen, Junge
N1 - Publisher Copyright:
© 2000-2011 IEEE.
PY - 2025
Y1 - 2025
N2 - Road image segmentation plays a pivotal role in traffic video surveillance for environmental perception. Precise segmentation of roads and lanes is essential for effective traffic monitoring and management. However, unlike the perspective encountered in autonomous driving, the surveillance perspective poses unique challenges due to its wider scope and susceptibility to complex environments. This complexity makes the segmentation task in road surveillance videos particularly demanding. To overcome these challenges, we introduce an end-to-end semantic segmentation network that leverages a CNN-Transformer architecture. Firstly, a spatial pyramid attention-style convolution (SP-AttnConv) module, built upon the Transformer, is introduced to ensure accurate segmentation across long distances while preserving fine boundary information. This module enhances local information and fosters a “global-local” feature fusion framework. Secondly, to tackle the issue of scale imbalance during segmentation, a lightweight multi-scale (LMS) module is introduced to capture multi-scale features. Additionally, an occlusion relief branch (ORB) module is integrated into the decoder, specifically addressing occlusions caused by irrelevant objects. Recognizing the need for a dedicated benchmark dataset for road surface and lane segmentation, a surface-lane (SL) dataset for complex scenarios is built in this paper to promote the development of traffic surveillance systems. Comparative experiments demonstrate that our method achieves the best overall performance on the SL dataset.
AB - Road image segmentation plays a pivotal role in traffic video surveillance for environmental perception. Precise segmentation of roads and lanes is essential for effective traffic monitoring and management. However, unlike the perspective encountered in autonomous driving, the surveillance perspective poses unique challenges due to its wider scope and susceptibility to complex environments. This complexity makes the segmentation task in road surveillance videos particularly demanding. To overcome these challenges, we introduce an end-to-end semantic segmentation network that leverages a CNN-Transformer architecture. Firstly, a spatial pyramid attention-style convolution (SP-AttnConv) module, built upon the Transformer, is introduced to ensure accurate segmentation across long distances while preserving fine boundary information. This module enhances local information and fosters a “global-local” feature fusion framework. Secondly, to tackle the issue of scale imbalance during segmentation, a lightweight multi-scale (LMS) module is introduced to capture multi-scale features. Additionally, an occlusion relief branch (ORB) module is integrated into the decoder, specifically addressing occlusions caused by irrelevant objects. Recognizing the need for a dedicated benchmark dataset for road surface and lane segmentation, a surface-lane (SL) dataset for complex scenarios is built in this paper to promote the development of traffic surveillance systems. Comparative experiments demonstrate that our method achieves the best overall performance on the SL dataset.
KW - computer vision
KW - intelligent traffic systems
KW - Road segmentation
KW - segmentation dataset
UR - https://www.scopus.com/pages/publications/105019094749
U2 - 10.1109/TITS.2025.3615568
DO - 10.1109/TITS.2025.3615568
M3 - Article
AN - SCOPUS:105019094749
SN - 1524-9050
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
ER -