TY - JOUR
T1 - LaTP: LiDAR-aided multimodal token pruning for efficient trajectory prediction of autonomous driving
T2 - Neural Networks
AU - Lu, Yantao
AU - Sun, Shiqi
AU - Liu, Ning
AU - Jiang, Bo
AU - Li, Yilan
AU - Chen, Jinchao
AU - Zhang, Ying
AU - Zhu, Yichen
AU - Velipasalar, Senem
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2025/10
Y1 - 2025/10
N2 - The rapid advancement of Large Vision Language Models (LVLMs) has spurred significant progress in autonomous driving, especially in end-to-end trajectory prediction, which is crucial for enabling autonomous driving across diverse traffic scenarios. Nevertheless, LVLMs demand substantial processing power, which makes them difficult to deploy on the resource-constrained onboard hardware of autonomous vehicles. Token pruning is one of the most promising approaches, as it achieves considerable inference speed gains without requiring additional model training. While token pruning has proven effective in various domains, current approaches are designed for general tasks and have not been tailored to the unique demands of trajectory prediction in autonomous driving. Specifically, two considerations have not been adequately addressed in this context: (i) content information, where irrelevant visual elements cannot be pruned effectively because their complex features give them a non-trivial appearance; and (ii) distance information, which is critical for accurate trajectory prediction but often overlooked by conventional pruning approaches. As a result, directly applying existing pruning methods to LVLMs without accounting for these differences may degrade performance. To overcome these challenges, we propose a novel token pruning method, LiDAR-aided Token Prune (LaTP), specifically designed for LVLM-based trajectory prediction in autonomous driving. LaTP efficiently integrates LiDAR points to provide distance information for camera inputs and uses a content- and distance-aware token importance indicator to discard visual tokens that are inconsequential for driving. This approach significantly improves inference speed without compromising control accuracy. Experiments on the nuScenes dataset validate the effectiveness of our method, which outperforms general token pruning baselines. Specifically, LaTP achieves a pruning ratio of up to 75% while maintaining an Average Displacement Error (ADE) of 2.03 meters and a Collision Rate (col.) of 2.35%, demonstrating its ability to significantly reduce computational load without sacrificing prediction accuracy.
KW - Autonomous driving
KW - Large vision–language model
KW - Multimodal fusion
KW - Transformer token pruning
UR - http://www.scopus.com/inward/record.url?scp=105007800417&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2025.107673
DO - 10.1016/j.neunet.2025.107673
M3 - Article
AN - SCOPUS:105007800417
SN - 0893-6080
VL - 190
JO - Neural Networks
JF - Neural Networks
M1 - 107673
ER -
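
For a concrete picture of the pruning idea described in the abstract, the following minimal Python sketch scores each visual token by combining a content term with a LiDAR-derived distance term and keeps only the highest-scoring fraction. It is an illustration built on assumptions, not the paper's implementation: the feature-norm content score, the near/far depth normalization, the alpha weighting, and all names (token_importance, prune_tokens, keep_ratio) are hypothetical. A real system would derive the content score from the model's own feature or attention statistics and the per-token depth from calibrated LiDAR-to-camera projection.

import numpy as np

def token_importance(patch_feats, patch_depth, alpha=0.5, near=5.0, far=60.0):
    # Content term: normalized feature magnitude per visual token (assumption).
    content = np.linalg.norm(patch_feats, axis=-1)
    content = (content - content.min()) / (content.max() - content.min() + 1e-6)
    # Distance term: per-token depth in meters from projected LiDAR points;
    # nearer tokens score higher (assumed near/far mapping).
    distance = 1.0 - np.clip((patch_depth - near) / (far - near), 0.0, 1.0)
    return alpha * content + (1.0 - alpha) * distance

def prune_tokens(patch_feats, patch_depth, keep_ratio=0.25):
    # Keep the top keep_ratio fraction of tokens by combined importance,
    # e.g. keep_ratio=0.25 corresponds to a 75% pruning ratio.
    scores = token_importance(patch_feats, patch_depth)
    k = max(1, int(round(len(scores) * keep_ratio)))
    keep = np.sort(np.argsort(scores)[-k:])  # preserve original token order
    return patch_feats[keep], keep

# Toy usage: 576 visual tokens with 1024-d features and per-token LiDAR depth.
feats = np.random.randn(576, 1024).astype(np.float32)
depth = np.random.uniform(1.0, 80.0, size=576).astype(np.float32)
kept_feats, kept_idx = prune_tokens(feats, depth, keep_ratio=0.25)
print(kept_feats.shape, kept_idx.shape)  # (144, 1024) (144,)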