Abstract
This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) that locates distinctive object parts; within it, a bank of convolutional filters is learned as discriminative part detectors. A group of informative parts can thus be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates the part features and the global feature into a joint feature for the final classification. To learn powerful part features and boost the capability of the joint feature, we propose a Duplex Focal Loss for metric learning and part classification, which focuses training on hard examples. We further merge the PLN and PCN into a unified network that is trained end to end via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of the proposed method.
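The channel recalibration performed by the first module can be illustrated with a minimal sketch of a generic Squeeze-and-Excitation block. This is not the paper's implementation: the function name `se_block`, the weight matrices `w1` and `w2`, the omission of bias terms, and the toy input are all assumptions made for illustration, and in a real network the weights would be learned rather than fixed by hand.

```python
import math


def se_block(features, w1, w2):
    """Generic Squeeze-and-Excitation sketch (hypothetical helper).

    features: list of C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weights of the first fully connected layer (no bias, an assumption).
    w2: C x (C // r) weights of the second fully connected layer (no bias, an assumption).
    Returns the recalibrated features and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, step 2: expand back to C channels, then sigmoid to get gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, features)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2, with hand-picked (not learned) weights.
features = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[0.5, 0.5]]       # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]    # expands 1 hidden unit back to 2 channels
scaled, gates = se_block(features, w1, w2)
print(gates)
```

With these hand-picked weights the first channel receives a gate above 0.5 and the second a gate below 0.5, which is the "emphasize some channels, suppress others" behaviour the abstract describes. The spatial layout of each channel is left unchanged; only its overall magnitude is rescaled.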
| Original language | English |
|---|---|
| Pages (from-to) | 579-590 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 44 |
| Issue number | 2 |
| DOIs | |
| State | Published - 1 Feb 2022 |
Keywords
- Part localization network
- duplex focal loss
- fine-grained visual categorization
- part classification network
Title
P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization