Heterogeneous visual features fusion via sparse multimodal machine

Hua Wang; Feiping Nie; Heng Huang; Chris Ding

doi:10.1109/CVPR.2013.398

Research output: Contribution to journal › Conference article › Peer-reviewed

84 citations (Scopus)

Abstract

To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. How to integrate these heterogeneous visual features and identify the important ones for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach that integrates such heterogeneous features by using joint structured sparsity regularizations to learn feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective, with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of commonly used image features. Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better measured performance.

Original language	English
Article number	6619242
Pages (from-to)	3097-3102
Number of pages	6
Journal	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOI	https://doi.org/10.1109/CVPR.2013.398
Publication status	Published - 2013
Externally published	Yes
Event	26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2013 - Portland, OR, United States. Duration: 23 Jun 2013 → 28 Jun 2013

Cite this

@article{af2c69644d8749bb93ce5417cb676183,

title = "Heterogeneous visual features fusion via sparse multimodal machine",

abstract = "To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. How to integrate these heterogeneous visual features and identify the important ones for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach that integrates such heterogeneous features by using joint structured sparsity regularizations to learn feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective, with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of commonly used image features. Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better measured performance.",

keywords = "Data Integration, Structured Sparsity, Visual Features Fusion",

author = "Hua Wang and Feiping Nie and Heng Huang and Chris Ding",

year = "2013",

doi = "10.1109/CVPR.2013.398",

language = "English",

pages = "3097--3102",

journal = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

issn = "1063-6919",

publisher = "IEEE Computer Society",

note = "26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2013 ; Conference date: 23-06-2013 Through 28-06-2013",

}

TY - JOUR

T1 - Heterogeneous visual features fusion via sparse multimodal machine

AU - Wang, Hua

AU - Nie, Feiping

AU - Huang, Heng

AU - Ding, Chris

PY - 2013

Y1 - 2013

AB - To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. How to integrate these heterogeneous visual features and identify the important ones for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach that integrates such heterogeneous features by using joint structured sparsity regularizations to learn feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective, with rigorously proved global convergence. We applied our SMML method to five broadly used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of commonly used image features. Compared to existing scene and object categorization methods using either a single modality or multiple modalities of features, our approach always achieves better measured performance.

KW - Data Integration

KW - Structured Sparsity

KW - Visual Features Fusion

UR - http://www.scopus.com/inward/record.url?scp=84887363909&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2013.398

DO - 10.1109/CVPR.2013.398

M3 - Conference article

AN - SCOPUS:84887363909

SN - 1063-6919

SP - 3097

EP - 3102

JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

M1 - 6619242

T2 - 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2013

Y2 - 23 June 2013 through 28 June 2013

ER -
