Visual–Semantic Fuzzy Interaction Network for Zero-Shot Learning

Xuemeng Hui; Zhunga Liu; Jiaxiang Liu; Zuowei Zhang; Longfei Wang

doi:10.1109/TAI.2024.3524955

School of Automation

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

Abstract

Zero-shot learning (ZSL) aims to recognize image objects of unseen classes using manually defined semantic knowledge corresponding to both seen and unseen images. The key to ZSL lies in building the interaction between precise image data and fuzzy semantic knowledge. The fuzziness is attributed to the difficulty of quantifying human knowledge. However, existing ZSL methods ignore the inherent fuzziness of semantic knowledge and treat it as precise data when building the visual–semantic interaction, which hinders the transfer of semantic knowledge from seen classes to unseen classes. To solve this problem, we propose a visual–semantic fuzzy interaction network (VSFIN) for ZSL. VSFIN utilizes an effective encoder–decoder structure, including a semantic prototype encoder (SPE) and a visual feature decoder (VFD). The SPE and VFD enable the visual features to interact with semantic knowledge via cross-attention. To achieve visual–semantic fuzzy interaction in the SPE and VFD, we introduce the concept of a membership function from fuzzy set theory and design a membership loss function. This loss function allows for a certain degree of imprecision in visual–semantic interaction, thereby enabling VSFIN to make appropriate use of the given semantic knowledge. Moreover, we introduce the concept of the rank-sum test and propose a distribution alignment loss to alleviate the bias towards seen classes. Extensive experiments on three widely used benchmarks demonstrate that VSFIN outperforms current state-of-the-art methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.
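
The abstract names two building blocks, a fuzzy membership function and the rank-sum statistic, without giving formulas. The sketch below illustrates only the generic textbook ideas, and is not the authors' loss functions. The Gaussian membership shape, the `width` parameter, and the function names `gaussian_membership`, `membership_loss`, and `rank_sum` are all assumptions made for illustration.

```python
import math


def gaussian_membership(x, center, width):
    """Degree in (0, 1] to which x belongs to a fuzzy set centered at `center`.

    Assumption: a Gaussian shape is used here only as a common textbook choice.
    """
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def membership_loss(predicted, annotated, width=0.1):
    """Hypothetical loss: one minus the mean membership degree.

    Small deviations from the annotated attribute values are penalized
    only mildly, which is one way to tolerate imprecise annotations.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    degrees = [gaussian_membership(p, a, width)
               for p, a in zip(predicted, annotated)]
    return 1.0 - sum(degrees) / len(degrees)


def rank_sum(sample_a, sample_b):
    """Sum of the ranks of sample_a within the pooled, sorted samples.

    Tied values receive the average of the ranks they span, as in the
    standard Wilcoxon rank-sum statistic.
    """
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    total = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Advance j past every entry tied with pooled[i].
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        count_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        total += average_rank * count_a
        i = j
    return total
```

With these definitions, a prediction that matches the annotation exactly gives a loss of zero, and a rank sum for one group of scores that is far from its expected value under equal distributions suggests that the two groups (for example, scores for seen and unseen classes) are distributed differently.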

Original language	English
Pages (from-to)	1345-1359
Number of pages	15
Journal	IEEE Transactions on Artificial Intelligence
Volume	6
Issue number	5
DOIs	https://doi.org/10.1109/TAI.2024.3524955
State	Published - 2025

Keywords

Fuzzy set theory
knowledge transfer
membership function
object recognition
zero-shot learning

Cite this

@article{084d7ed392a1496d9a2d28b47a020d3f,
  title = "Visual–Semantic Fuzzy Interaction Network for Zero-Shot Learning",
  abstract = "Zero-shot learning (ZSL) aims to recognize image objects of unseen classes using manually defined semantic knowledge corresponding to both seen and unseen images. The key to ZSL lies in building the interaction between precise image data and fuzzy semantic knowledge. The fuzziness is attributed to the difficulty of quantifying human knowledge. However, existing ZSL methods ignore the inherent fuzziness of semantic knowledge and treat it as precise data when building the visual–semantic interaction, which hinders the transfer of semantic knowledge from seen classes to unseen classes. To solve this problem, we propose a visual–semantic fuzzy interaction network (VSFIN) for ZSL. VSFIN utilizes an effective encoder–decoder structure, including a semantic prototype encoder (SPE) and a visual feature decoder (VFD). The SPE and VFD enable the visual features to interact with semantic knowledge via cross-attention. To achieve visual–semantic fuzzy interaction in the SPE and VFD, we introduce the concept of a membership function from fuzzy set theory and design a membership loss function. This loss function allows for a certain degree of imprecision in visual–semantic interaction, thereby enabling VSFIN to make appropriate use of the given semantic knowledge. Moreover, we introduce the concept of the rank-sum test and propose a distribution alignment loss to alleviate the bias towards seen classes. Extensive experiments on three widely used benchmarks demonstrate that VSFIN outperforms current state-of-the-art methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.",
  keywords = "Fuzzy set theory, knowledge transfer, membership function, object recognition, zero-shot learning",
  author = "Xuemeng Hui and Zhunga Liu and Jiaxiang Liu and Zuowei Zhang and Longfei Wang",
  note = "Publisher Copyright: {\textcopyright} 2020 IEEE.",
  year = "2025",
  doi = "10.1109/TAI.2024.3524955",
  language = "English",
  volume = "6",
  pages = "1345--1359",
  journal = "IEEE Transactions on Artificial Intelligence",
  issn = "2691-4581",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "5",
}

TY - JOUR
T1 - Visual–Semantic Fuzzy Interaction Network for Zero-Shot Learning
AU - Hui, Xuemeng
AU - Liu, Zhunga
AU - Liu, Jiaxiang
AU - Zhang, Zuowei
AU - Wang, Longfei
PY - 2025
Y1 - 2025
AB - Zero-shot learning (ZSL) aims to recognize image objects of unseen classes using manually defined semantic knowledge corresponding to both seen and unseen images. The key to ZSL lies in building the interaction between precise image data and fuzzy semantic knowledge. The fuzziness is attributed to the difficulty of quantifying human knowledge. However, existing ZSL methods ignore the inherent fuzziness of semantic knowledge and treat it as precise data when building the visual–semantic interaction, which hinders the transfer of semantic knowledge from seen classes to unseen classes. To solve this problem, we propose a visual–semantic fuzzy interaction network (VSFIN) for ZSL. VSFIN utilizes an effective encoder–decoder structure, including a semantic prototype encoder (SPE) and a visual feature decoder (VFD). The SPE and VFD enable the visual features to interact with semantic knowledge via cross-attention. To achieve visual–semantic fuzzy interaction in the SPE and VFD, we introduce the concept of a membership function from fuzzy set theory and design a membership loss function. This loss function allows for a certain degree of imprecision in visual–semantic interaction, thereby enabling VSFIN to make appropriate use of the given semantic knowledge. Moreover, we introduce the concept of the rank-sum test and propose a distribution alignment loss to alleviate the bias towards seen classes. Extensive experiments on three widely used benchmarks demonstrate that VSFIN outperforms current state-of-the-art methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.
KW - Fuzzy set theory
KW - knowledge transfer
KW - membership function
KW - object recognition
KW - zero-shot learning
UR - http://www.scopus.com/inward/record.url?scp=85215627250&partnerID=8YFLogxK
U2 - 10.1109/TAI.2024.3524955
DO - 10.1109/TAI.2024.3524955
M3 - Article
AN - SCOPUS:85215627250
SN - 2691-4581
VL - 6
SP - 1345
EP - 1359
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 5
ER -
