Spatio-Temporal Synergy with ViT: Enhancing Collaborative Perception and Object Detection for Heterogeneous Agents

Yuan Gao; Sicong Liu; Xiangrui Xu; Zhiyang Ding; Bin Guo; Zhiwen Yu

doi:10.1145/3698383.3699621


School of Computer Science

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To address the limitations of traditional cooperative sensing methods for heterogeneous agents in terms of feedback latency and spatiotemporal dependencies, this paper proposes a cooperative perception and object detection system for heterogeneous agents. The system is based on the Vision Transformer (ViT) model, leveraging its global context awareness and multimodal data fusion capabilities. It also incorporates two proposed modules, an adaptive delay position sensing module and a spatiotemporal dependency dynamic modeling module, which address data transmission latency and complex spatiotemporal dependencies between agents. This enhances the accuracy and timeliness of heterogeneous multi-agent collaborative sensing systems.

Original language	English
Title of host publication	RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of
Subtitle of host publication	ACM Sensys 2024
Publisher	Association for Computing Machinery, Inc
Pages	3-5
Number of pages	3
ISBN (Electronic)	9798400712951
DOI	https://doi.org/10.1145/3698383.3699621
Publication status	Published - 4 Nov 2024
Event	1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, RMELS 2024 - Hangzhou, China. Duration: 4 Nov 2024 → …

Publication series

Name	RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024

Conference

Conference	1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, RMELS 2024
Country/Territory	China
City	Hangzhou
Period	4/11/24 → …

Access to document

10.1145/3698383.3699621


Cite this

Gao, Y., Liu, S., Xu, X., Ding, Z., Guo, B., & Yu, Z. (2024). Spatio-Temporal Synergy with ViT: Enhancing Collaborative Perception and Object Detection for Heterogeneous Agents. In RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024 (pp. 3-5). (RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024). Association for Computing Machinery, Inc. https://doi.org/10.1145/3698383.3699621

Gao, Yuan ; Liu, Sicong ; Xu, Xiangrui et al. / Spatio-Temporal Synergy with ViT : Enhancing Collaborative Perception and Object Detection for Heterogeneous Agents. RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024. Association for Computing Machinery, Inc, 2024. pp. 3-5 (RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024).

@inproceedings{406ef29da7a64351820945ba960a0f50,

title = "Spatio-Temporal Synergy with ViT: Enhancing Collaborative Perception and Object Detection for Heterogeneous Agents",

abstract = "To address the limitations of traditional heterogeneous agent cooperative sensing methods in terms of feedback latency and spatiotemporal dependencies, this paper proposes a heterogeneous agents enhancing cooperative perception and object detection system. The system is based on the Vision Transformer (ViT) model, leveraging its superior global context awareness and multimodal data fusion capabilities. Additionally, it incorporates the proposed adaptive delay position sensing module and spatiotemporal dependency dynamic modeling module, effectively resolving issues related to data transmission latency and complex spatiotemporal dependencies between agents. This significantly enhances the accuracy and timeliness of heterogeneous multi-agent collaborative sensing systems.",

keywords = "Cooperative perception, Deep learning, Heterogeneous agents, Vision transformer",

author = "Yuan Gao and Sicong Liu and Xiangrui Xu and Zhiyang Ding and Bin Guo and Zhiwen Yu",

note = "Publisher Copyright: {\textcopyright} 2024 ACM.; 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, RMELS 2024 ; Conference date: 04-11-2024",

year = "2024",

month = nov,

day = "4",

doi = "10.1145/3698383.3699621",

language = "English",

series = "RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024",

publisher = "Association for Computing Machinery, Inc",

pages = "3--5",

booktitle = "RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of",

}

Gao, Y, Liu, S, Xu, X, Ding, Z, Guo, B & Yu, Z 2024, Spatio-Temporal Synergy with ViT: Enhancing Collaborative Perception and Object Detection for Heterogeneous Agents. in RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024. RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024, Association for Computing Machinery, Inc, pp. 3-5, 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, RMELS 2024, Hangzhou, China, 4/11/24. https://doi.org/10.1145/3698383.3699621

Spatio-Temporal Synergy with ViT: Enhancing Collaborative Perception and Object Detection for Heterogeneous Agents. / Gao, Yuan; Liu, Sicong; Xu, Xiangrui et al.
RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024. Association for Computing Machinery, Inc, 2024. p. 3-5 (RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024).


TY - GEN

T1 - Spatio-Temporal Synergy with ViT

T2 - 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, RMELS 2024

AU - Gao, Yuan

AU - Liu, Sicong

AU - Xu, Xiangrui

AU - Ding, Zhiyang

AU - Guo, Bin

AU - Yu, Zhiwen

PY - 2024/11/4

Y1 - 2024/11/4

N2 - To address the limitations of traditional heterogeneous agent cooperative sensing methods in terms of feedback latency and spatiotemporal dependencies, this paper proposes a heterogeneous agents enhancing cooperative perception and object detection system. The system is based on the Vision Transformer (ViT) model, leveraging its superior global context awareness and multimodal data fusion capabilities. Additionally, it incorporates the proposed adaptive delay position sensing module and spatiotemporal dependency dynamic modeling module, effectively resolving issues related to data transmission latency and complex spatiotemporal dependencies between agents. This significantly enhances the accuracy and timeliness of heterogeneous multi-agent collaborative sensing systems.

AB - To address the limitations of traditional heterogeneous agent cooperative sensing methods in terms of feedback latency and spatiotemporal dependencies, this paper proposes a heterogeneous agents enhancing cooperative perception and object detection system. The system is based on the Vision Transformer (ViT) model, leveraging its superior global context awareness and multimodal data fusion capabilities. Additionally, it incorporates the proposed adaptive delay position sensing module and spatiotemporal dependency dynamic modeling module, effectively resolving issues related to data transmission latency and complex spatiotemporal dependencies between agents. This significantly enhances the accuracy and timeliness of heterogeneous multi-agent collaborative sensing systems.

KW - Cooperative perception

KW - Deep learning

KW - Heterogeneous agents

KW - Vision transformer

UR - http://www.scopus.com/inward/record.url?scp=85212509239&partnerID=8YFLogxK

U2 - 10.1145/3698383.3699621

DO - 10.1145/3698383.3699621

M3 - Conference contribution

AN - SCOPUS:85212509239

T3 - RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024

SP - 3

EP - 5

BT - RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of

PB - Association for Computing Machinery, Inc

Y2 - 4 November 2024

ER -

Gao Y, Liu S, Xu X, Ding Z, Guo B, Yu Z. Spatio-Temporal Synergy with ViT: Enhancing Collaborative Perception and Object Detection for Heterogeneous Agents. In RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024. Association for Computing Machinery, Inc. 2024. p. 3-5. (RMELS 2024 - Proceedings of the 1st ACM International Workshop on Resource-efficient Mobile and Embedded LLM System in AIoT, Part of: ACM Sensys 2024). doi: 10.1145/3698383.3699621
