Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network

Haibo Su, Peng Wang, Lingqiao Liu, Hui Li, Zhen Li, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

Fashion products typically combine a variety of styles across different clothing parts. To distinguish images of different fashion products, we therefore need to extract both appearance (i.e., 'how to describe') and localization (i.e., 'where to look') information, as well as their interactions. To this end, we propose a biologically inspired framework for image-based fashion product retrieval, which mimics the hypothesized two-stream visual processing system of the human brain. The proposed attentional heterogeneous bilinear network (AHBN) consists of two branches: a deep CNN branch that extracts fine-grained appearance attributes and a fully convolutional branch that extracts landmark localization information. A joint channel-wise attention mechanism is then applied to the extracted heterogeneous features to focus on important channels, followed by a compact bilinear pooling layer that models the interaction of the two streams. Our proposed framework achieves satisfactory performance on three image-based fashion product retrieval benchmarks.
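The two components named in the abstract — a channel-wise attention gate and compact bilinear pooling of two feature streams — can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the gate here is a simple squeeze-and-excitation-style reweighting with a hypothetical weight matrix `w`, and compact bilinear pooling follows the standard Count Sketch + FFT construction (approximating the outer product of two vectors by the circular convolution of their sketches).

```python
import numpy as np

rng = np.random.default_rng(0)

def count_sketch(x, h, s, d):
    """Project x (c,) into d dimensions via hashed signed sums (Count Sketch)."""
    y = np.zeros(d)
    np.add.at(y, h, s * x)  # accumulate s[i]*x[i] into bucket h[i]
    return y

def compact_bilinear(a, b, d=64, rng=rng):
    """Compact bilinear pooling: the outer product of feature vectors a and b
    is approximated by the circular convolution of their count sketches,
    computed efficiently in the Fourier domain."""
    ha = rng.integers(0, d, a.size); sa = rng.choice([-1.0, 1.0], a.size)
    hb = rng.integers(0, d, b.size); sb = rng.choice([-1.0, 1.0], b.size)
    fa = np.fft.rfft(count_sketch(a, ha, sa, d))
    fb = np.fft.rfft(count_sketch(b, hb, sb, d))
    return np.fft.irfft(fa * fb, n=d)  # (d,) pooled interaction feature

def channel_attention(feat, w):
    """Channel-wise attention gate (hypothetical form): global average pool
    over spatial dims, a learned linear map w (c, c), and a sigmoid gate
    that reweights the channels of feat (c, h, w)."""
    pooled = feat.mean(axis=(1, 2))                # (c,) channel descriptor
    gate = 1.0 / (1.0 + np.exp(-(w @ pooled)))     # (c,) in (0, 1)
    return feat * gate[:, None, None]              # reweighted feature map
```

In the full model, each branch's feature map would pass through the attention gate before the two pooled vectors are fused by `compact_bilinear` to form the retrieval embedding; here the weight matrix and sketch dimension are placeholders rather than the paper's learned parameters.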

Original language: English
Article number: 9245584
Pages (from-to): 3254-3265
Number of pages: 12
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 31
Issue number: 8
DOIs
State: Published - Aug 2021
Externally published: Yes

Keywords

  • attention
  • bilinear pooling
  • fashion retrieval

