Logit prototype learning with active multimodal representation for robust open-set recognition

Yimin Fu, Zhunga Liu, Zicheng Wang

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, existing OSR methods are primarily implemented on the basis of single-modal perception, and their performance is limited when single-modal data fail to provide sufficient descriptions of the objects. Although multimodal data can provide more comprehensive information than single-modal data, the learning of decision boundaries can be affected by the feature representation gap between different modalities. To effectively integrate multimodal data for robust OSR performance, we propose logit prototype learning (LPL) with active multimodal representation. In LPL, the input multimodal data are transformed into the logit space, enabling a direct exploration of intermodal correlations without the impact of scale inconsistency. Then, the fusion weights of each modality are determined using an entropy-based uncertainty estimation method. This approach realizes adaptive adjustment of the fusion strategy to provide comprehensive descriptions in the presence of external disturbances. Moreover, the single-modal and multimodal representations are jointly optimized interactively to learn discriminative decision boundaries. Finally, a stepwise recognition rule is employed to reduce the misclassification risk and facilitate the distinction between known and unknown classes. Extensive experiments on three multimodal datasets demonstrate the effectiveness of the proposed method.
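As a rough illustration of the fusion and rejection steps described in the abstract, the following minimal Python sketch weights each modality's logits inversely to the entropy of its softmax distribution and applies a simple confidence threshold to separate known from unknown classes. The function names, the inverse-entropy weighting form, and the rejection threshold are illustrative assumptions, not the paper's actual formulation of LPL or its stepwise recognition rule.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def entropy_fusion_weights(logits_per_modality):
    """Weight each modality inversely to the entropy of its softmax
    distribution: a more confident (lower-entropy) modality gets a
    larger fusion weight. This is an assumed form of the paper's
    entropy-based uncertainty estimation, not its exact definition."""
    entropies = []
    for logits in logits_per_modality:
        p = softmax(logits)
        entropies.append(-np.sum(p * np.log(p + 1e-12)))
    inv = 1.0 / (np.asarray(entropies) + 1e-12)
    return inv / inv.sum()

def fuse_and_classify(logits_per_modality, reject_threshold=0.5):
    """Fuse per-modality logits with entropy-based weights, then apply
    a simple two-step rule: reject as unknown when the fused confidence
    falls below reject_threshold (a hypothetical stand-in for the
    paper's stepwise recognition rule), else return the known class."""
    w = entropy_fusion_weights(logits_per_modality)
    fused = sum(wi * li for wi, li in zip(w, logits_per_modality))
    p = softmax(fused)
    return -1 if p.max() < reject_threshold else int(np.argmax(p))

# Demo with two hypothetical modalities (e.g. image and radar logits).
image_logits = np.array([2.0, 0.4, -1.1])
radar_logits = np.array([0.1, 0.0, 0.2])  # near-uniform: low weight
print(fuse_and_classify([image_logits, radar_logits]))  # -> 0 (known)
```

In this sketch the near-uniform radar logits receive a small fusion weight, so the confident image modality dominates the fused prediction; the paper's jointly optimized logit prototypes and stepwise rule would replace this naive thresholding.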

Original language: English
Article number: 162204
Journal: Science China Information Sciences
Volume: 67
Issue: 6
DOI
Publication status: Published - Jun 2024
