Semantic-Guided Diffusion for Robust Multi-Object Tracking with Temporal Enhancement

  • Northwestern Polytechnical University Xian
  • Northeastern University China

Research output: Contribution to journal › Article › peer-review

Abstract

Diffusion-based motion prediction methods have demonstrated strong capabilities in modeling nonlinear motion for multi-object tracking (MOT). However, in complex scenarios involving target interactions or occlusions, these methods still suffer from frequent identity switches and inaccurate trajectory predictions. This is primarily due to insufficient joint modeling of appearance and motion cues, as well as limited adaptability to diverse motion patterns. To address these challenges, we propose a semantic-guided diffusion-based method, termed SGDMOT, which jointly models target motion dynamics and identity consistency. Specifically, we leverage historical trajectories to query image-level global features, incorporating appearance and contextual information. These are fused with motion information via an attention mechanism, guiding the diffusion process to generate semantically consistent trajectories. Furthermore, we introduce a learnable multi-scale temporal modulation module that dynamically adjusts the encoding of diffusion time steps based on historical motion states. This enhances the model’s ability to adapt to motion variations across different temporal granularities, improving temporal modeling during the generation phase. Extensive experiments on the DanceTrack, MOT17, and MOT20 benchmarks demonstrate the effectiveness of our approach. Notably, on the DanceTrack test set, SGDMOT achieves an absolute gain of 2.3% in Higher Order Tracking Accuracy (HOTA) compared to a baseline diffusion model relying solely on motion features. Our code and pretrained models will be publicly released.
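The multi-scale temporal modulation idea described in the abstract can be illustrated with a minimal sketch. This is not the SGDMOT implementation, which the abstract does not specify. It assumes the modulation takes the form of a softmax-gated blend of sinusoidal timestep embeddings computed at several scales, with the gates derived from a summary of recent motion. The function names, the `(cx, cy, w, h)` state layout, the scale values, and the random matrix `W` standing in for learned parameters are all hypothetical.

```python
import numpy as np

def sinusoidal_embedding(t, dim, base=10000.0):
    """Standard sinusoidal encoding of a scalar diffusion step t into `dim` values."""
    half = dim // 2
    freqs = np.exp(-np.log(base) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def softmax(x):
    z = x - np.max(x)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def modulated_time_embedding(t, history, W, scales=(1.0, 0.1, 0.01), dim=16):
    """Blend timestep embeddings computed at several temporal scales.

    history: (T, 4) array of past box states (assumed layout: cx, cy, w, h).
    W: (len(scales), 4) matrix, a stand-in for learned parameters.
    """
    # Summarise recent motion as the mean absolute frame-to-frame change.
    motion = np.abs(np.diff(history, axis=0)).mean(axis=0)  # shape (4,)
    # One gate per scale; the gates are non-negative and sum to 1.
    gates = softmax(W @ motion)
    # One embedding per scale, each of length `dim`.
    embeds = np.stack([sinusoidal_embedding(t * s, dim) for s in scales])
    return gates @ embeds, gates

rng = np.random.default_rng(0)
history = np.cumsum(rng.normal(size=(8, 4)), axis=0)  # synthetic trajectory
W = rng.normal(size=(3, 4))                           # hypothetical weights
emb, gates = modulated_time_embedding(t=50, history=history, W=W)
print(emb.shape, gates)
```

In this sketch, the gate weights shift between coarse and fine timestep encodings depending on how much the target has been moving, which is one plausible reading of "dynamically adjusts the encoding of diffusion time steps based on historical motion states". A trained model would learn `W` (or a richer gating network) jointly with the denoiser.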

Keywords

  • Diffusion-based tracking
  • Multi-object tracking
  • Semantic-guided motion prediction
