P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization

Junwei Han; Xiwen Yao; Gong Cheng; Xiaoxu Feng; Dong Xu

doi:10.1109/TPAMI.2019.2933510

School of Automation

Research output: Contribution to journal › Article › peer-review

92 Scopus citations

Abstract

This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters is learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates the part features and the global feature into a joint feature for the final classification. To learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses on training hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.
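The abstract names two mechanisms that are easy to illustrate in isolation: channel-wise recalibration (the SE block) and a loss that down-weights easy examples. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The function names, the weight shapes, and the omission of biases are all hypothetical choices made for illustration. The loss shown is the standard focal loss; the record does not specify the form of the paper's Duplex Focal Loss, so no attempt is made to reproduce it.

```python
import math


def sigmoid(x):
    """Logistic function used for the channel gates."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def se_block(feature_maps, w_reduce, w_expand):
    """Squeeze-and-Excitation style recalibration (illustrative sketch).

    feature_maps: list of C channels, each an H x W nested list.
    w_reduce:     (C/r) x C weights of the bottleneck layer (assumed, no bias).
    w_expand:     C x (C/r) weights of the expansion layer (assumed, no bias).
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation: bottleneck with ReLU, then expansion with sigmoid gates.
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_maps)
    ]
    return scaled, gates


def focal_loss(p_true, gamma=2.0):
    """Standard focal loss for the probability assigned to the true class.

    This is NOT the paper's Duplex Focal Loss; it only shows how the
    (1 - p)^gamma factor shrinks the loss of well-classified examples.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


if __name__ == "__main__":
    # Two 2x2 channels with hand-picked toy weights (reduction to one unit).
    maps = [[[3.0, 3.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]]]
    scaled, gates = se_block(maps, [[1.0, -1.0]], [[2.0], [-2.0]])
    print("gates:", gates)
    print("easy vs hard loss:", focal_loss(0.9), focal_loss(0.3))
```

With these toy weights the first channel receives a gate close to 1 and the second a gate close to 0, which is the "emphasize informative channels, suppress less useful ones" behavior described above. Setting `gamma=0` in `focal_loss` recovers ordinary cross-entropy for the true class.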

Original language	English
Pages (from-to)	579-590
Number of pages	12
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	44
Issue number	2
DOIs	https://doi.org/10.1109/TPAMI.2019.2933510
State	Published - 1 Feb 2022

Keywords

Part localization network
duplex focal loss
fine-grained visual categorization
part classification network

Access to Document

10.1109/TPAMI.2019.2933510

Cite this

@article{82a9791ba24f46fb974e2f804f58d83d,

title = "P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization",

abstract = "This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters are learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates part features and global feature into a joint feature for the final classification. In order to learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses on training hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.",

keywords = "Part localization network, duplex focal loss, fine-grained visual categorization, part classification network",

author = "Junwei Han and Xiwen Yao and Gong Cheng and Xiaoxu Feng and Dong Xu",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2022",

month = feb,

day = "1",

doi = "10.1109/TPAMI.2019.2933510",

language = "English",

volume = "44",

pages = "579--590",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "2",

}

TY - JOUR

T1 - P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization

AU - Han, Junwei

AU - Yao, Xiwen

AU - Cheng, Gong

AU - Feng, Xiaoxu

AU - Xu, Dong

PY - 2022/2/1

Y1 - 2022/2/1

AB - This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters are learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates part features and global feature into a joint feature for the final classification. In order to learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses on training hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.

KW - Part localization network

KW - duplex focal loss

KW - fine-grained visual categorization

KW - part classification network

UR - http://www.scopus.com/inward/record.url?scp=85122800249&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2019.2933510

DO - 10.1109/TPAMI.2019.2933510

M3 - Article

C2 - 31398107

AN - SCOPUS:85122800249

SN - 0162-8828

VL - 44

SP - 579

EP - 590

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 2

ER -
