Learning Multi-instance Enriched Image Representations via Non-greedy Ratio Maximization of the ℓ1-Norm Distances

Kai Liu; Hua Wang; Feiping Nie; Hao Zhang

doi:10.1109/CVPR.2018.00806

Research Institute of Optoelectronics and Intelligence

Colorado School of Mines

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Citations (Scopus)

Abstract

Multi-instance learning (MIL) has demonstrated its usefulness in many real-world image applications in recent years. However, two critical challenges prevent one from effectively using MIL in practice. First, existing MIL methods routinely model the predictive targets using the instances of input images, but rarely utilize an input image as a whole. As a result, the useful information conveyed by the holistic representation of an input image could potentially be lost. Second, because the images in a data set contain varying numbers of instances, traditional learning models that can only deal with single-vector inputs cannot be applied directly. To tackle these two challenges, in this paper we propose a novel image representation learning method that integrates the local patches (the instances) of an input image (the bag) and its holistic representation into one single-vector representation. Our new method first learns a projection that preserves both global and local consistencies of the instances of an input image. It then projects the holistic representation of the same image into the learned subspace for information enrichment. Taking into account the content and characterization variations in natural scenes and photos, we develop an objective that maximizes the ratio of the summations of a number of ℓ₁-norm distances, which is difficult to solve in general. To solve our objective, we derive a new efficient non-greedy iterative algorithm and rigorously prove its convergence. Promising results in extensive experiments demonstrate the improved performance of our new method and validate its effectiveness.
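
The abstract describes the objective only in words. As a rough illustration, assume it has the form r(w) = Σ_i |wᵀa_i| / Σ_j |wᵀb_j| for a single unit projection direction w, where a_i and b_j stand for difference vectors between instances; which pairs enter the numerator and the denominator in the paper is not stated in this record, so `A` and `B` below are placeholders. The update is a Dinkelbach-style minorize-maximize step chosen for illustration, not the authors' non-greedy algorithm: it fixes λ at the current ratio, lower-bounds the numerator by its sign linearization, upper-bounds a smoothed denominator Σ_j sqrt((wᵀb_j)² + eps) by a reweighted quadratic, and maximizes the resulting linear surrogate on the unit sphere. All function names (`dot`, `normalize`, `smoothed_ratio`, `l1_ratio_ascent`) and the smoothing constant `eps` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit Euclidean length (v must be nonzero)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def smoothed_ratio(w: Sequence[float], A: Sequence[Sequence[float]],
                   B: Sequence[Sequence[float]], eps: float) -> float:
    """sum_i |w.a_i| / sum_j sqrt((w.b_j)^2 + eps); B must be non-empty."""
    num = sum(abs(dot(w, a)) for a in A)
    den = sum(math.sqrt(dot(w, b) ** 2 + eps) for b in B)
    return num / den


def l1_ratio_ascent(A, B, w0, iters: int = 50,
                    eps: float = 1e-8) -> Tuple[Vector, List[float]]:
    """Hypothetical single-direction MM ascent on the smoothed l1 ratio."""
    w = normalize(w0)
    d = len(w)
    history = [smoothed_ratio(w, A, B, eps)]
    for _ in range(iters):
        lam = history[-1]  # Dinkelbach parameter: current ratio value
        # Numerator lower bound: |w.a| >= sign(w_t.a) * (w.a), tight at w_t.
        m = [0.0] * d
        for a in A:
            p = dot(w, a)
            sgn = (p > 0) - (p < 0)
            for k in range(d):
                m[k] += sgn * a[k]
        # Denominator upper bound via reweighting with s_j = sqrt((w_t.b_j)^2 + eps):
        # sqrt(x^2 + eps) <= (x^2 + eps) / (2 s_j) + s_j / 2, a quadratic w^T G w.
        # Only G w_t and trace(G) (>= largest eigenvalue of G) are needed.
        g_w = [0.0] * d
        trace_g = 0.0
        for b in B:
            p = dot(w, b)
            s = math.sqrt(p * p + eps)
            for k in range(d):
                g_w[k] += b[k] * p / (2.0 * s)
            trace_g += dot(b, b) / (2.0 * s)
        # On the unit sphere the quadratic is majorized by a linear term, so the
        # surrogate of N(w) - lam * D(w) equals w . c up to a constant.
        c = [m[k] - 2.0 * lam * g_w[k] + 2.0 * lam * trace_g * w[k]
             for k in range(d)]
        if dot(c, c) == 0.0:
            break  # surrogate is flat; keep the current direction
        w = normalize(c)  # maximizer of w . c over unit vectors
        history.append(smoothed_ratio(w, A, B, eps))
    return w, history
```

Calling `l1_ratio_ascent(A, B, [1.0, 1.0])` on small 2-D lists returns a unit direction and the ratio history; because both bounds are tight at the current iterate, the smoothed ratio should not decrease from one iteration to the next. Restricting the sketch to one direction sidesteps the greedy versus non-greedy question entirely; a joint multi-column version would replace the final normalization with the orthonormal polar factor (from an SVD) of the analogous matrix, which needs a linear-algebra library.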

Original language	English
Host publication title	Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Publisher	IEEE Computer Society
Pages	7727-7735
Number of pages	9
ISBN (Electronic)	9781538664209
DOI	https://doi.org/10.1109/CVPR.2018.00806
Publication status	Published - 14 Dec 2018
Event	31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 - Salt Lake City, United States. Duration: 18 Jun 2018 → 22 Jun 2018

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Conference

Conference	31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Country/Territory	United States
City	Salt Lake City
Period	18/06/18 → 22/06/18

Cite this

Liu, K., Wang, H., Nie, F., & Zhang, H. (2018). Learning Multi-instance Enriched Image Representations via Non-greedy Ratio Maximization of the ℓ₁-Norm Distances. In Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 (pp. 7727-7735). Article 8578904 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR.2018.00806

Liu, Kai ; Wang, Hua ; Nie, Feiping et al. / Learning Multi-instance Enriched Image Representations via Non-greedy Ratio Maximization of the ℓ₁-Norm Distances. Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018. IEEE Computer Society, 2018. pp. 7727-7735 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

@inproceedings{fe5f597aa07d4896969bfec32f1343fc,

title = "Learning Multi-instance Enriched Image Representations via Non-greedy Ratio Maximization of the ℓ1-Norm Distances",

author = "Kai Liu and Hua Wang and Feiping Nie and Hao Zhang",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 ; Conference date: 18-06-2018 Through 22-06-2018",

year = "2018",

month = dec,

day = "14",

doi = "10.1109/CVPR.2018.00806",

language = "English",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "7727--7735",

booktitle = "Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018",

}

Liu, K, Wang, H, Nie, F & Zhang, H 2018, Learning Multi-instance Enriched Image Representations via Non-greedy Ratio Maximization of the ℓ₁-Norm Distances. in Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018., 8578904, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 7727-7735, 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, United States, 18/06/18. https://doi.org/10.1109/CVPR.2018.00806

Learning Multi-instance Enriched Image Representations via Non-greedy Ratio Maximization of the ℓ₁-Norm Distances. / Liu, Kai; Wang, Hua; Nie, Feiping et al.
Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018. IEEE Computer Society, 2018. pp. 7727-7735 8578904 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

TY - GEN

T1 - Learning Multi-instance Enriched Image Representations via Non-greedy Ratio Maximization of the ℓ1-Norm Distances

AU - Liu, Kai

AU - Wang, Hua

AU - Nie, Feiping

AU - Zhang, Hao

PY - 2018/12/14

Y1 - 2018/12/14

UR - http://www.scopus.com/inward/record.url?scp=85062884304&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2018.00806

DO - 10.1109/CVPR.2018.00806

M3 - Conference contribution

AN - SCOPUS:85062884304

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 7727

EP - 7735

BT - Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018

PB - IEEE Computer Society

T2 - 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018

Y2 - 18 June 2018 through 22 June 2018

ER -

Liu K, Wang H, Nie F, Zhang H. Learning Multi-instance Enriched Image Representations via Non-greedy Ratio Maximization of the ℓ₁-Norm Distances. In Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018. IEEE Computer Society. 2018. p. 7727-7735. 8578904. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR.2018.00806
