Semantic-Guided Diffusion for Robust Multi-Object Tracking with Temporal Enhancement

  • Northwestern Polytechnical University Xian
  • Northeastern University China

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Diffusion-based motion prediction methods have demonstrated strong capabilities in modeling nonlinear motion for multi-object tracking (MOT). However, in complex scenarios involving target interactions or occlusions, these methods still suffer from frequent identity switches and inaccurate trajectory predictions. This is primarily due to insufficient joint modeling of appearance and motion cues, as well as limited adaptability to diverse motion patterns. To address these challenges, we propose a semantic-guided diffusion-based method, termed SGDMOT, which jointly models target motion dynamics and identity consistency. Specifically, we leverage historical trajectories to query image-level global features, incorporating appearance and contextual information. These are fused with motion information via an attention mechanism, guiding the diffusion process to generate semantically consistent trajectories. Furthermore, we introduce a learnable multi-scale temporal modulation module that dynamically adjusts the encoding of diffusion time steps based on historical motion states. This enhances the model’s ability to adapt to motion variations across different temporal granularities, improving temporal modeling during the generation phase. Extensive experiments on the DanceTrack, MOT17, and MOT20 benchmarks demonstrate the effectiveness of our approach. Notably, on the DanceTrack test set, SGDMOT achieves an absolute gain of 2.3% in Higher Order Tracking Accuracy (HOTA) compared to a baseline diffusion model relying solely on motion features. Our code and pretrained models will be publicly released.
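The attention-based fusion described in the abstract, where trajectory-derived queries attend over image-level features, can be illustrated with a minimal single-head cross-attention sketch. This is not the SGDMOT implementation, which is not given here. The function name `cross_attention_fuse`, the residual addition, the random projection matrices, and all dimensions are assumptions chosen for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(motion_tokens, image_features, w_q, w_k, w_v):
    """Hypothetical fusion step: motion tokens query image features.

    motion_tokens:  T x D matrix, one embedding per historical time step.
    image_features: N x D matrix, one embedding per image location.
    Returns the fused T x D tokens and the T x N attention weights.
    """
    q = matmul(motion_tokens, w_q)   # queries come from the trajectory
    k = matmul(image_features, w_k)  # keys come from the image
    v = matmul(image_features, w_v)  # values come from the image
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)
    # Residual addition (an assumption): keep the motion signal and add
    # the appearance/context information gathered from the image.
    fused = [
        [m + c for m, c in zip(m_row, c_row)]
        for m_row, c_row in zip(motion_tokens, context)
    ]
    return fused, weights


def random_matrix(rows, cols, rng):
    """Toy stand-in for learned embeddings or projection weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, N, D = 5, 16, 8  # illustrative sizes: history length, image tokens, width
motion_tokens = random_matrix(T, D, rng)
image_features = random_matrix(N, D, rng)
w_q, w_k, w_v = (random_matrix(D, D, rng) for _ in range(3))

fused, weights = cross_attention_fuse(motion_tokens, image_features, w_q, w_k, w_v)
```

In a setup like this, `fused` would serve as the conditioning signal for the denoising network, and each row of `weights` shows how strongly one historical time step attends to each image location.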
