Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network

Haibo Su, Peng Wang, Lingqiao Liu, Hui Li, Zhen Li, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

Fashion products typically combine a variety of styles across different clothing parts. To distinguish images of different fashion products, we therefore need to extract both appearance (i.e., 'how to describe') and localization (i.e., 'where to look') information, as well as their interactions. To this end, we propose a biologically inspired framework for image-based fashion product retrieval, which mimics the hypothesized two-stream visual processing system of the human brain. The proposed attentional heterogeneous bilinear network (AHBN) consists of two branches: a deep CNN branch that extracts fine-grained appearance attributes and a fully convolutional branch that extracts landmark localization information. A joint channel-wise attention mechanism is then applied to the extracted heterogeneous features to focus on important channels, followed by a compact bilinear pooling layer that models the interaction of the two streams. Our proposed framework achieves satisfactory performance on three image-based fashion product retrieval benchmarks.
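The two components named in the abstract — a channel-wise attention gate and compact bilinear pooling of two feature streams — can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the gate here is a simple squeeze-and-excitation-style reweighting with a hypothetical weight matrix `w`, and compact bilinear pooling follows the standard Count Sketch + FFT construction (approximating the outer product of two vectors by the circular convolution of their sketches).

```python
import numpy as np

rng = np.random.default_rng(0)

def count_sketch(x, h, s, d):
    """Project x (c,) into d dimensions via hashed signed sums (Count Sketch)."""
    y = np.zeros(d)
    np.add.at(y, h, s * x)  # accumulate s[i]*x[i] into bucket h[i]
    return y

def compact_bilinear(a, b, d=64, rng=rng):
    """Compact bilinear pooling: the outer product of feature vectors a and b
    is approximated by the circular convolution of their count sketches,
    computed efficiently in the Fourier domain."""
    ha = rng.integers(0, d, a.size); sa = rng.choice([-1.0, 1.0], a.size)
    hb = rng.integers(0, d, b.size); sb = rng.choice([-1.0, 1.0], b.size)
    fa = np.fft.rfft(count_sketch(a, ha, sa, d))
    fb = np.fft.rfft(count_sketch(b, hb, sb, d))
    return np.fft.irfft(fa * fb, n=d)  # (d,) pooled interaction feature

def channel_attention(feat, w):
    """Channel-wise attention gate (hypothetical form): global average pool
    over spatial dims, a learned linear map w (c, c), and a sigmoid gate
    that reweights the channels of feat (c, h, w)."""
    pooled = feat.mean(axis=(1, 2))                # (c,) channel descriptor
    gate = 1.0 / (1.0 + np.exp(-(w @ pooled)))     # (c,) in (0, 1)
    return feat * gate[:, None, None]              # reweighted feature map
```

In the full model, each branch's feature map would pass through the attention gate before the two pooled vectors are fused by `compact_bilinear` to form the retrieval embedding; here the weight matrix and sketch dimension are placeholders rather than the paper's learned parameters.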

Original language: English
Article number: 9245584
Pages (from-to): 3254-3265
Number of pages: 12
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 31
Issue number: 8
DOIs
State: Published - Aug 2021
Externally published: Yes

Keywords

  • attention
  • bilinear pooling
  • fashion retrieval

