Cas-OVD: Cascaded Open-Vocabulary Detection of Small Objects Using Multi-Refined Region Proposal Network in Autonomous Driving

  • Zhenyu Fang
  • Yulong Wu
  • Jinchang Ren
  • Jiangbin Zheng
  • Yijun Yan
  • Lixiang Zhang

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Although text information has helped existing models achieve promising results in open-vocabulary object detection (OVD), the lack of semantic information makes small object detection (SOD) difficult. Moreover, this semantic gap causes failures when matching text and image features, resulting in false negatives. To address these issues, we propose a Cascade Open Vocabulary Detector (Cas-OVD), which builds upon existing multi-stage detection pipelines but specializes in text-vision alignment for small objects. In particular, we adapt a multi-refined region proposal network, guided by a non-sampled anchor strategy, to reduce missed and false detections of small objects. Meanwhile, a feature conversion module based on a deformable convolution network is proposed to enhance the semantic information of small objects, even potential ones with low confidence. Unlike existing methods that rely on coarse-grained image-based features for image-text matching, Cas-OVD refines these features through a cascade alignment process, allowing each stage to build on the results of the previous one. This progressively enhances the feature correlation between image regions and textual descriptions through successive error correction. On the joint BDD100K-SODA-D dataset, Cas-OVD achieved 17.95% APall and 14.6% APs, outperforming RegionCLIP by 3.5% APall and 3.0% APs, respectively. On the OV_COCO dataset, Cas-OVD achieved 32.71% APall and 17.26% APs, surpassing RegionCLIP by 6.6% APall and 6.1% APs, respectively.
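As a rough illustration of the image-text matching and stage-by-stage refinement the abstract describes, the sketch below scores one region feature against text embeddings with cosine similarity and a softmax, then re-scores it after each refinement stage. It is a generic toy under stated assumptions, not the Cas-OVD implementation: the names `match_region`, `cascade_match`, and `toy_refiner`, the 2-D embeddings, and the temperature value are all hypothetical, and the refiner is a stand-in that does not model the deformable-convolution feature conversion module.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def match_region(region_feat, text_feats, labels, temperature=0.1):
    """Return (label, probability) for the text embedding closest to the region.

    Cosine similarity followed by a temperature-scaled softmax, as in generic
    CLIP-style region-text matching. The temperature is an assumed value.
    """
    r = l2_normalize(region_feat)
    sims = [sum(a * b for a, b in zip(r, l2_normalize(t))) for t in text_feats]
    scaled = [s / temperature for s in sims]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


def cascade_match(region_feat, text_feats, labels, refiners):
    """Re-score the region after each refinement stage.

    `refiners` is a list of hypothetical callables, one per cascade stage,
    each mapping a feature vector to a refined feature vector.
    """
    feat = list(region_feat)
    history = [match_region(feat, text_feats, labels)]
    for refine in refiners:
        feat = refine(feat)  # each stage builds on the previous stage's output
        history.append(match_region(feat, text_feats, labels))
    return history


# Toy data: two class prompts embedded as axis-aligned 2-D vectors (assumed).
labels = ["car", "pedestrian"]
text_feats = [[1.0, 0.0], [0.0, 1.0]]


def toy_refiner(feat):
    """Hypothetical stage that damps the first feature dimension."""
    return [feat[0] * 0.5, feat[1]]


history = cascade_match([0.6, 0.5], text_feats, labels, [toy_refiner])
for stage, (label, prob) in enumerate(history):
    print(stage, label, round(prob, 3))
```

In this toy run the ambiguous region is first matched to one prompt and, after the refinement stage changes its feature, to the other. That only shows how a later stage can revise an earlier match; it says nothing about which decision is correct or about the gains reported in the abstract.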

Original language: English
Pages (from-to): 757-771
Number of pages: 15
Journal: IEEE Transactions on Multimedia
Volume: 28
DOI
Publication status: Published - 2026
