P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization

Junwei Han; Xiwen Yao; Gong Cheng; Xiaoxu Feng; Dong Xu

doi:10.1109/TPAMI.2019.2933510

School of Automation

Research output: Contribution to journal › Article › peer-review

92 Citations (Scopus)

Abstract

This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters are learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates part features and global feature into a joint feature for the final classification. In order to learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses on training hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.
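The channel recalibration described for the first module can be illustrated with a minimal sketch of a generic Squeeze-and-Excitation step. This is not the authors' implementation: the function name `se_recalibrate`, the weight names `w_reduce` and `w_expand`, the omission of bias terms, and the toy values are all assumptions made for illustration. It shows only the general pattern of pooling each channel to one value, passing the result through a small bottleneck, and scaling each channel by a gate between 0 and 1.

```python
import math


def se_recalibrate(features, w_reduce, w_expand):
    """Generic SE-style recalibration (illustrative, not the paper's code).

    features: list of C channels, each an H x W list of lists of floats.
    w_reduce: (C // r) x C weight matrix of the bottleneck layer (hypothetical).
    w_expand: C x (C // r) weight matrix of the expansion layer (hypothetical).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation, part 1: bottleneck projection followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, part 2: expansion followed by a sigmoid, giving gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


# Toy input: two 2x2 channels, reduction to a single hidden unit.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w_reduce = [[1.0, 1.0]]
w_expand = [[1.0], [-1.0]]
scaled, gates = se_recalibrate(features, w_reduce, w_expand)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, which is the "emphasize informative channels, suppress less useful ones" behaviour in miniature. In a trained network the weights would be learned rather than set by hand.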

Original language	English
Pages (from-to)	579-590
Number of pages	12
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	44
Issue number	2
DOI	https://doi.org/10.1109/TPAMI.2019.2933510
Publication status	Published - 1 Feb 2022

Cite this

@article{82a9791ba24f46fb974e2f804f58d83d,

title = "P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization",

abstract = "This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters are learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates part features and global feature into a joint feature for the final classification. In order to learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses on training hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.",

keywords = "Part localization network, duplex focal loss, fine-grained visual categorization, part classification network",

author = "Junwei Han and Xiwen Yao and Gong Cheng and Xiaoxu Feng and Dong Xu",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2022",

month = feb,

day = "1",

doi = "10.1109/TPAMI.2019.2933510",

language = "English",

volume = "44",

pages = "579--590",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "2",

}

TY - JOUR

T1 - P-CNN

T2 - Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization

AU - Han, Junwei

AU - Yao, Xiwen

AU - Cheng, Gong

AU - Feng, Xiaoxu

AU - Xu, Dong

PY - 2022/2/1

Y1 - 2022/2/1

N2 - This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters are learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates part features and global feature into a joint feature for the final classification. In order to learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses on training hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.

AB - This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters are learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates part features and global feature into a joint feature for the final classification. In order to learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses on training hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.

KW - Part localization network

KW - duplex focal loss

KW - fine-grained visual categorization

KW - part classification network

UR - http://www.scopus.com/inward/record.url?scp=85122800249&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2019.2933510

DO - 10.1109/TPAMI.2019.2933510

M3 - Article

C2 - 31398107

AN - SCOPUS:85122800249

SN - 0162-8828

VL - 44

SP - 579

EP - 590

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 2

ER -
