Single- and cross-modality near duplicate image pairs detection via spatial transformer comparing CNN

Yi Zhang, Shizhou Zhang, Ying Li, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Recently, both single-modality and cross-modality near-duplicate image detection tasks have received wide attention in the pattern recognition and computer vision community. Existing deep neural network-based methods have achieved remarkable performance on this task. However, most methods focus mainly on learning a representation of each image in the pair independently, and thus make limited use of the information shared between the near-duplicate pair. In this paper, to better exploit the correlations between image pairs, we propose a spatial transformer comparing convolutional neural network (CNN) model for comparing near-duplicate image pairs. Specifically, we first propose a comparing CNN framework equipped with a cross-stream that fully learns the correlation information between the image pair while still considering the features of each individual image. Furthermore, to handle local deformations caused by cropping, translation, scaling, and non-rigid transformations, we extend this architecture into a spatial transformer comparing CNN by incorporating a spatial transformer module into the comparing CNN. To demonstrate the effectiveness of the proposed method on both single-modality and cross-modality (Optical-InfraRed) near-duplicate image pair detection, we conduct extensive experiments on three popular benchmark datasets, namely CaliforniaND (ND means near duplicate), Mir-Flickr Near Duplicate, and TNO Multi-band Image Data Collection. The experimental results show that the proposed method achieves superior performance compared with many state-of-the-art methods on both tasks.
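The spatial transformer module mentioned in the abstract warps an input so that local deformations (translation, scaling, cropping) are undone before comparison. As a rough illustration of the mechanism only, not the authors' implementation, the sketch below shows the two core steps of such a module in NumPy for a single-channel image: generating a sampling grid from a 2×3 affine matrix `theta` (a hypothetical parameter that a localization network would predict), and bilinearly sampling the input at those grid locations.

```python
import numpy as np

def affine_grid(theta, H, W):
    """Map each output pixel to a source location via the 2x3 affine matrix theta.

    Coordinates are normalized to [-1, 1], as is conventional for spatial
    transformer modules. Returns an array of shape (2, H, W): source x and y.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W) homogeneous
    src = theta @ coords                                         # (2, H*W)
    return src.reshape(2, H, W)

def bilinear_sample(img, grid):
    """Sample a (H, W) image at the normalized source coordinates in grid."""
    H, W = img.shape
    # Convert normalized coords back to pixel indices.
    x = (grid[0] + 1) * (W - 1) / 2
    y = (grid[1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    dx, dy = x - x0, y - y0
    # Weighted average of the four neighbouring pixels.
    return (img[y0, x0] * (1 - dx) * (1 - dy)
            + img[y0, x0 + 1] * dx * (1 - dy)
            + img[y0 + 1, x0] * (1 - dx) * dy
            + img[y0 + 1, x0 + 1] * dx * dy)
```

With the identity matrix `theta = [[1, 0, 0], [0, 1, 0]]` the sampled output reproduces the input image; shifting or scaling `theta` warps it, which is how the module can align a deformed near-duplicate to its counterpart before the comparing streams see it.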

Original language: English
Article number: 255
Pages (from-to): 1-22
Number of pages: 22
Journal: Sensors
Volume: 21
Issue number: 1
DOI
Publication status: Published - 1 Jan 2021