LaTP: LiDAR-aided multimodal token pruning for efficient trajectory prediction of autonomous driving

Yantao Lu, Shiqi Sun, Ning Liu, Bo Jiang, Yilan Li, Jinchao Chen, Ying Zhang, Yichen Zhu, Senem Velipasalar

Research output: Contribution to journal › Article › Peer-review

Abstract

The rapid advancement of Large Vision Language Models (LVLMs) has spurred significant progress in autonomous driving, especially in end-to-end trajectory prediction, which is crucial for enabling autonomous driving across diverse traffic scenarios. However, LVLMs demand substantial processing power, which makes them difficult to deploy on the resource-constrained onboard hardware of autonomous vehicles. Token pruning is one of the most promising approaches to this problem, as it achieves considerable inference speedups without requiring additional model training. Although token pruning has proven effective in various domains, current approaches are designed for general tasks and are not tailored to the unique demands of trajectory prediction in autonomous driving. Specifically, two considerations have not been adequately addressed in this context: (i) content information: visual elements that are irrelevant to driving but have complex, non-trivial appearance cannot be pruned effectively; and (ii) distance information, which is critical for accurate trajectory prediction but often overlooked by conventional pruning approaches. As a result, directly applying existing pruning methods to LVLMs without accounting for these differences may degrade performance. To overcome these challenges, we propose a novel token pruning method, LiDAR-aided Token Prune (LaTP), specifically designed for LVLM-based trajectory prediction in autonomous driving. LaTP efficiently integrates LiDAR points to provide distance information for camera inputs and uses a content- and distance-aware token importance indicator to discard visual tokens that are inconsequential for driving. This approach significantly improves inference speed without compromising control accuracy.
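The abstract does not give LaTP's actual importance formula, so the following is only a minimal sketch, under stated assumptions, of how LiDAR distance and a content score could be combined to prune visual tokens. Every detail here is hypothetical and not taken from the paper: LiDAR points are assumed to be already transformed into the camera frame (x right, y down, z forward) and projected with a pinhole model using intrinsics `fx, fy, cx, cy`; each image patch is treated as one token whose distance is the nearest projected LiDAR depth; the per-token content score is supplied externally (for example, an attention-derived score); and importance is the content score multiplied by an exponential proximity weight with scale `tau`. The function names `project_lidar_to_image`, `token_depths`, and `prune_tokens` are illustrative only.

```python
# Hypothetical sketch of content- and distance-aware token pruning.
# NOT the authors' LaTP implementation; scoring rule and names are assumptions.
import numpy as np


def project_lidar_to_image(points_cam, fx, fy, cx, cy):
    """Pinhole projection of LiDAR points already in the camera frame
    (x right, y down, z forward). Returns rows of (u, v, depth)."""
    pts = np.asarray(points_cam, dtype=float)
    pts = pts[pts[:, 2] > 0]  # drop points behind the camera
    u = fx * pts[:, 0] / pts[:, 2] + cx
    v = fy * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v, pts[:, 2]], axis=1)


def token_depths(uvd, img_w, img_h, patch):
    """Nearest LiDAR depth per image patch (one patch = one visual token,
    row-major order). Patches with no projected point get np.inf."""
    grid_w, grid_h = img_w // patch, img_h // patch
    depth = np.full(grid_h * grid_w, np.inf)
    for u, v, d in uvd:
        if 0 <= u < grid_w * patch and 0 <= v < grid_h * patch:
            idx = int(v // patch) * grid_w + int(u // patch)
            depth[idx] = min(depth[idx], d)
    return depth


def prune_tokens(content_scores, depths, keep_ratio=0.25, tau=20.0):
    """Assumed importance = content score * exp(-depth / tau).
    Returns the sorted indices of the tokens that are kept."""
    proximity = np.exp(-depths / tau)  # exp(-inf) == 0 for tokens without LiDAR
    importance = np.asarray(content_scores, dtype=float) * proximity
    k = max(1, int(round(keep_ratio * importance.size)))
    keep = np.argsort(-importance, kind="stable")[:k]  # highest importance first
    return np.sort(keep)


if __name__ == "__main__":
    # 64x32 image, 16-pixel patches -> 4x2 grid = 8 tokens.
    points = [(0, 0, 5), (-10, -2, 10), (10, 0, 40), (0, 0, -3), (-5, 0, 10)]
    uvd = project_lidar_to_image(points, fx=32, fy=32, cx=32, cy=16)
    depths = token_depths(uvd, img_w=64, img_h=32, patch=16)
    scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.5, 0.4, 0.7]
    print(prune_tokens(scores, depths, keep_ratio=0.25))
```

With `keep_ratio=0.25` the sketch discards 75% of the tokens, matching the pruning ratio mentioned in the abstract. Because the weight is `exp(-depth / tau)`, tokens with no LiDAR return receive zero weight; this is a deliberate simplification of the sketch, and a practical system would probably need a floor value or a separate rule for such regions.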
Experiments on the nuScenes dataset validate the effectiveness of our method, showing superior performance compared to general token pruning baselines. Specifically, LaTP achieves a pruning ratio of up to 75% while maintaining an Average Displacement Error (ADE) of 2.03 meters and a Collision Rate (col.) of 2.35%, demonstrating its ability to significantly reduce computational load without sacrificing prediction accuracy.
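For reference, the Average Displacement Error is commonly defined as the mean Euclidean distance between predicted and ground-truth waypoints over the prediction horizon. The sketch below implements that common definition only; the exact evaluation protocol behind the reported figures (horizon length, sampling rate, and how per-timestep errors are aggregated) is not stated here and may differ. The function names are hypothetical.

```python
# Sketch of the common ADE definition; the paper's exact protocol may differ.
import math


def average_displacement_error(pred, gt):
    """Mean Euclidean distance (metres) between predicted and ground-truth
    waypoints of one trajectory, averaged over all timesteps."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def dataset_ade(preds, gts):
    """Average of per-trajectory ADE over a set of samples (assumed aggregation)."""
    return sum(average_displacement_error(p, g) for p, g in zip(preds, gts)) / len(preds)


if __name__ == "__main__":
    pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    print(average_displacement_error(pred, gt))  # per-step errors 0, 1, 2 -> 1.0
```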

Original language: English
Article number: 107673
Journal: Neural Networks
Volume: 190
DOI
Publication status: Published - Oct 2025
