CI³Former: A Cross-Image Information Interaction Network for Kinship Verification

Lei Li; Quan Zhou; Dong Huang; Zhaoqiang Xia

doi:10.1109/TCSVT.2025.3562592

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

Abstract

Kinship verification using facial information determines whether two faces share a familial relationship. Existing methods improve verification by leveraging negative sample information and addressing distribution differences but often extract independent features from parent and child images separately, ignoring variations in pairwise similarity. To overcome this, we propose CI³Former, a Swin-Transformer-based model that enables cross-image information interaction for joint feature extraction. By incorporating a Self-Attention based Interaction (SAI) module within each Swin-Transformer block, our method allows mutual querying between parent and child features, dynamically guiding region-level feature extraction and adaptively focusing on similar regions. Additionally, we introduce a Multi-metric Similarity based Interaction (MSI) module for feature fusion, which processes paired features through similarity measurements before final prediction. The model is trained with contrastive and binary cross-entropy losses to enhance coupled feature learning. Extensive experiments on four kinship verification datasets and a signature verification dataset demonstrate that CI³Former outperforms state-of-the-art methods, showcasing its effectiveness, robustness, and strong cross-task generalization.
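The mutual querying, similarity-based fusion, and two-part loss described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain dot-product attention with no learned projections or Swin windows, uses cosine similarity and Euclidean distance as example metrics, and replaces the prediction head with a hypothetical rescaling of the cosine score. All function names are made up for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, others):
    # Each token of one image queries the tokens of the other image
    # (scaled dot-product attention, no learned weights: an assumption).
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in others]
        weights = softmax(scores)
        attended.append([sum(w * tok[j] for w, tok in zip(weights, others))
                         for j in range(dim)])
    return attended

def mean_pool(tokens):
    # Collapse a list of token vectors into one vector.
    return [sum(col) / len(tokens) for col in zip(*tokens)]

def pair_metrics(u, v):
    # Two example similarity measurements for a feature pair.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return {
        "cosine": dot / (norm_u * norm_v + 1e-12),
        "euclidean": math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))),
    }

def contrastive_loss(distance, label, margin=1.0):
    # label = 1 for a kin pair (pull together), 0 otherwise (push past margin).
    return label * distance ** 2 + (1 - label) * max(0.0, margin - distance) ** 2

def bce_loss(prob, label, eps=1e-7):
    # Binary cross-entropy on a clipped probability.
    p = min(max(prob, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Toy token features for a parent image and a child image (made-up values).
parent = [[1.0, 0.0], [0.0, 1.0]]
child = [[0.9, 0.1], [0.2, 0.8]]

parent_ctx = cross_attend(parent, child)   # parent tokens query child tokens
child_ctx = cross_attend(child, parent)    # child tokens query parent tokens
metrics = pair_metrics(mean_pool(parent_ctx), mean_pool(child_ctx))

# Hypothetical stand-in for a learned classifier: map cosine from [-1, 1] to [0, 1].
prob = 0.5 * (metrics["cosine"] + 1.0)
loss = contrastive_loss(metrics["euclidean"], 1) + bce_loss(prob, 1)
```

In this sketch the two loss terms are simply summed with equal weight; the abstract does not state how the paper balances them.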

Original language	English
Journal	IEEE Transactions on Circuits and Systems for Video Technology
DOIs	https://doi.org/10.1109/TCSVT.2025.3562592
State	Accepted/In press - 2025

Keywords

Information Interaction
Kinship Verification
Transformer Network

Cite this

@article{b72301715bc841e8b74f4e2e8b81f680,

title = "CI$^{3}$Former: A Cross-Image Information Interaction Network for Kinship Verification",

abstract = "Kinship verification using facial information determines whether two faces share a familial relationship. Existing methods improve verification by leveraging negative sample information and addressing distribution differences but often extract independent features from parent and child images separately, ignoring variations in pairwise similarity. To overcome this, we propose CI3Former, a Swin-Transformer-based model that enables cross-image information interaction for joint feature extraction. By incorporating a Self-Attention based Interaction (SAI) module within each Swin-Transformer block, our method allows mutual querying between parent and child features, dynamically guiding region-level feature extraction and adaptively focusing on similar regions. Additionally, we introduce a Multi-metric Similarity based Interaction (MSI) module for feature fusion, which processes paired features through similarity measurements before final prediction. The model is trained with contrastive and binary cross-entropy losses to enhance coupled feature learning. Extensive experiments on four kinship verification datasets and a signature verification dataset demonstrate that CI3Former outperforms state-of-the-art methods, showcasing its effectiveness, robustness, and strong cross-task generalization.",

keywords = "Information Interaction, Kinship Verification, Transformer Network",

author = "Lei Li and Quan Zhou and Dong Huang and Zhaoqiang Xia",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2025",

doi = "10.1109/TCSVT.2025.3562592",

language = "English",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - CI³Former

T2 - A Cross-Image Information Interaction Network for Kinship Verification

AU - Li, Lei

AU - Zhou, Quan

AU - Huang, Dong

AU - Xia, Zhaoqiang

PY - 2025

Y1 - 2025

N2 - Kinship verification using facial information determines whether two faces share a familial relationship. Existing methods improve verification by leveraging negative sample information and addressing distribution differences but often extract independent features from parent and child images separately, ignoring variations in pairwise similarity. To overcome this, we propose CI3Former, a Swin-Transformer-based model that enables cross-image information interaction for joint feature extraction. By incorporating a Self-Attention based Interaction (SAI) module within each Swin-Transformer block, our method allows mutual querying between parent and child features, dynamically guiding region-level feature extraction and adaptively focusing on similar regions. Additionally, we introduce a Multi-metric Similarity based Interaction (MSI) module for feature fusion, which processes paired features through similarity measurements before final prediction. The model is trained with contrastive and binary cross-entropy losses to enhance coupled feature learning. Extensive experiments on four kinship verification datasets and a signature verification dataset demonstrate that CI3Former outperforms state-of-the-art methods, showcasing its effectiveness, robustness, and strong cross-task generalization.

AB - Kinship verification using facial information determines whether two faces share a familial relationship. Existing methods improve verification by leveraging negative sample information and addressing distribution differences but often extract independent features from parent and child images separately, ignoring variations in pairwise similarity. To overcome this, we propose CI3Former, a Swin-Transformer-based model that enables cross-image information interaction for joint feature extraction. By incorporating a Self-Attention based Interaction (SAI) module within each Swin-Transformer block, our method allows mutual querying between parent and child features, dynamically guiding region-level feature extraction and adaptively focusing on similar regions. Additionally, we introduce a Multi-metric Similarity based Interaction (MSI) module for feature fusion, which processes paired features through similarity measurements before final prediction. The model is trained with contrastive and binary cross-entropy losses to enhance coupled feature learning. Extensive experiments on four kinship verification datasets and a signature verification dataset demonstrate that CI3Former outperforms state-of-the-art methods, showcasing its effectiveness, robustness, and strong cross-task generalization.

KW - Information Interaction

KW - Kinship Verification

KW - Transformer Network

UR - http://www.scopus.com/inward/record.url?scp=105003383372&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2025.3562592

DO - 10.1109/TCSVT.2025.3562592

M3 - Article

AN - SCOPUS:105003383372

SN - 1051-8215

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

ER -
