LaTP: LiDAR-aided multimodal token pruning for efficient trajectory prediction of autonomous driving

Yantao Lu; Shiqi Sun; Ning Liu; Bo Jiang; Yilan Li; Jinchao Chen; Ying Zhang; Yichen Zhu; Senem Velipasalar

doi:10.1016/j.neunet.2025.107673

School of Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

The rapid advancement of Large Vision Language Models (LVLMs) has spurred significant progress in autonomous driving, especially in end-to-end trajectory prediction, which is crucial for enabling autonomous driving across diverse traffic scenarios. Nevertheless, the onboard computational requirements of autonomous vehicles present challenges for deploying LVLMs on resource-constrained devices, as they demand substantial processing power. Token pruning is one of the most promising approaches, achieving considerable inference speed gains without requiring additional model training. While token pruning has demonstrated its efficacy in various domains, it appears that the current approaches are designed for generalized tasks and have not been tailored to address the unique demands of trajectory prediction in autonomous driving. Specifically, within the context of trajectory prediction of autonomous driving, there are two considerations that have not been adequately addressed: (i) content information, where irrelevant visual elements, despite their complex features, cannot be pruned effectively due to their non-trivial appearance; (ii) distance information, which is critical for accurate trajectory prediction but often overlooked by conventional pruning approaches. As a result, directly applying existing pruning methods to LVLMs without considering these crucial differences may lead to a degradation in performance. To overcome these challenges, we propose a novel token pruning method, LiDAR-aided Token Prune (LaTP), specifically designed for LVLM-based trajectory prediction in autonomous driving. LaTP efficiently integrates LiDAR points to provide distance information for camera inputs and uses a content- and distance-aware token importance indicator to discard visual tokens that are inconsequential for driving. This approach significantly improves inference speed without compromising control accuracy.
Experiments on the nuScenes dataset validate the effectiveness of our method, showing superior performance compared to general token pruning baselines. Specifically, LaTP achieves a pruning ratio of up to 75% while maintaining an Average Displacement Error (ADE) of 2.03 meters and a Collision Rate (col.) of 2.35%, demonstrating its ability to significantly reduce computational load without sacrificing prediction accuracy.
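
The abstract describes a content- and distance-aware token importance indicator but does not give its formula. The following is a minimal sketch of the general idea only, not the authors' implementation. It assumes each visual token already has a non-negative content score (for example, attention received from the text prompt) and a distance in meters obtained by projecting LiDAR points onto the image. The function name `prune_tokens`, the linear distance weighting, the `max_range_m` cutoff, and the multiplicative combination are all hypothetical choices made for illustration.

```python
import numpy as np


def prune_tokens(content_score, depth_m, keep_ratio=0.25, max_range_m=50.0):
    """Return the sorted indices of the visual tokens to keep.

    content_score: (N,) non-negative relevance per token (assumed input).
    depth_m:       (N,) per-token distance in meters from projected LiDAR
                   points; use np.inf for tokens with no LiDAR return.
    keep_ratio:    fraction of tokens kept; 0.25 matches a 75% pruning ratio.
    max_range_m:   hypothetical range beyond which a token gets zero weight.
    """
    content_score = np.asarray(content_score, dtype=float)
    depth_m = np.asarray(depth_m, dtype=float)

    # Assumed weighting: nearby tokens get a weight close to 1, tokens at or
    # beyond max_range_m (including np.inf) get a weight of 0.
    distance_weight = 1.0 - np.clip(depth_m / max_range_m, 0.0, 1.0)

    # Assumed combination: a token must be both relevant and near to score high.
    importance = content_score * distance_weight

    # Keep the top-scoring tokens, then restore their original order so the
    # positional layout of the remaining sequence is preserved.
    n_keep = max(1, int(round(keep_ratio * importance.size)))
    keep = np.argsort(-importance, kind="stable")[:n_keep]
    return np.sort(keep)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.1, 0.7]
    depths = [5.0, 45.0, 5.0, np.inf]
    print(prune_tokens(scores, depths, keep_ratio=0.5))
```

In this toy input, the second token has a high content score but is far away, and the fourth has no LiDAR return, so both are discarded in favor of the two nearby tokens. A real system would likely need a more careful treatment of tokens without depth, such as sky or far-field regions.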

Original language	English
Article number	107673
Journal	Neural Networks
Volume	190
DOI	https://doi.org/10.1016/j.neunet.2025.107673
Publication status	Published - Oct 2025

Cite this

@article{3bd49bbb2acb456a9f11233973a57548,

title = "LaTP: LiDAR-aided multimodal token pruning for efficient trajectory prediction of autonomous driving",

abstract = "The rapid advancement of Large Vision Language Models (LVLMs) has spurred significant progress in autonomous driving, especially in end-to-end trajectory prediction, which is crucial for enabling autonomous driving across diverse traffic scenarios. Nevertheless, the onboard computational requirements of autonomous vehicles present challenges for deploying LVLMs on resource-constrained devices, as they demand substantial processing power. Token pruning is one of the most promising approaches, achieving considerable inference speed gains without requiring additional model training. While token pruning has demonstrated its efficacy in various domains, it appears that the current approaches are designed for generalized tasks and have not been tailored to address the unique demands of trajectory prediction in autonomous driving. Specifically, within the context of trajectory prediction of autonomous driving, there are two considerations that have not been adequately addressed: (i) content information, where irrelevant visual elements, despite their complex features, cannot be pruned effectively due to their non-trivial appearance; (ii) distance information, which is critical for accurate trajectory prediction but often overlooked by conventional pruning approaches. As a result, directly applying existing pruning methods to LVLMs without considering these crucial differences may lead to a degradation in performance. To overcome these challenges, we propose a novel token pruning method, LiDAR-aided Token Prune (LaTP), specifically designed for LVLM-based trajectory prediction in autonomous driving. LaTP efficiently integrates LiDAR points to provide distance information for camera inputs and uses a content- and distance-aware token importance indicator to discard visual tokens that are inconsequential for driving. This approach significantly improves inference speed without compromising control accuracy.
Experiments on the nuScenes dataset validate the effectiveness of our method, showing superior performance compared to general token pruning baselines. Specifically, LaTP achieves a pruning ratio of up to 75% while maintaining an Average Displacement Error (ADE) of 2.03 meters and a Collision Rate (col.) of 2.35%, demonstrating its ability to significantly reduce computational load without sacrificing prediction accuracy.",

keywords = "Autonomous driving, Large vision–language model, Multimodal fusion, Transformer token pruning",

author = "Yantao Lu and Shiqi Sun and Ning Liu and Bo Jiang and Yilan Li and Jinchao Chen and Ying Zhang and Yichen Zhu and Senem Velipasalar",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier Ltd",

year = "2025",

month = oct,

doi = "10.1016/j.neunet.2025.107673",

language = "English",

volume = "190",

journal = "Neural Networks",

issn = "0893-6080",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - LaTP

T2 - LiDAR-aided multimodal token pruning for efficient trajectory prediction of autonomous driving

AU - Lu, Yantao

AU - Sun, Shiqi

AU - Liu, Ning

AU - Jiang, Bo

AU - Li, Yilan

AU - Chen, Jinchao

AU - Zhang, Ying

AU - Zhu, Yichen

AU - Velipasalar, Senem

PY - 2025/10

Y1 - 2025/10

AB - The rapid advancement of Large Vision Language Models (LVLMs) has spurred significant progress in autonomous driving, especially in end-to-end trajectory prediction, which is crucial for enabling autonomous driving across diverse traffic scenarios. Nevertheless, the onboard computational requirements of autonomous vehicles present challenges for deploying LVLMs on resource-constrained devices, as they demand substantial processing power. Token pruning is one of the most promising approaches, achieving considerable inference speed gains without requiring additional model training. While token pruning has demonstrated its efficacy in various domains, it appears that the current approaches are designed for generalized tasks and have not been tailored to address the unique demands of trajectory prediction in autonomous driving. Specifically, within the context of trajectory prediction of autonomous driving, there are two considerations that have not been adequately addressed: (i) content information, where irrelevant visual elements, despite their complex features, cannot be pruned effectively due to their non-trivial appearance; (ii) distance information, which is critical for accurate trajectory prediction but often overlooked by conventional pruning approaches. As a result, directly applying existing pruning methods to LVLMs without considering these crucial differences may lead to a degradation in performance. To overcome these challenges, we propose a novel token pruning method, LiDAR-aided Token Prune (LaTP), specifically designed for LVLM-based trajectory prediction in autonomous driving. LaTP efficiently integrates LiDAR points to provide distance information for camera inputs and uses a content- and distance-aware token importance indicator to discard visual tokens that are inconsequential for driving. This approach significantly improves inference speed without compromising control accuracy.
Experiments on the nuScenes dataset validate the effectiveness of our method, showing superior performance compared to general token pruning baselines. Specifically, LaTP achieves a pruning ratio of up to 75% while maintaining an Average Displacement Error (ADE) of 2.03 meters and a Collision Rate (col.) of 2.35%, demonstrating its ability to significantly reduce computational load without sacrificing prediction accuracy.

KW - Autonomous driving

KW - Large vision–language model

KW - Multimodal fusion

KW - Transformer token pruning

UR - http://www.scopus.com/inward/record.url?scp=105007800417&partnerID=8YFLogxK

U2 - 10.1016/j.neunet.2025.107673

DO - 10.1016/j.neunet.2025.107673

M3 - Article

AN - SCOPUS:105007800417

SN - 0893-6080

VL - 190

JO - Neural Networks

JF - Neural Networks

M1 - 107673

ER -