Coarse-to-Fine Nutrition Prediction

Binglu Wang; Tianci Bu; Zaiyi Hu; Le Yang; Yongqiang Zhao; Xuelong Li

doi:10.1109/TMM.2023.3313638

Coarse-to-Fine Nutrition Prediction

Binglu Wang, Tianci Bu, Zaiyi Hu, Le Yang, Yongqiang Zhao, Xuelong Li

自动化学院

科研成果: 期刊稿件 › 文章 › 同行评审

3 引用（Scopus）

摘要

Healthy dietary intake has a broad influence on the quality of life, and nutrition prediction plays a great role in the auxiliary decision-making of diet. Given a food image, existing nutrition prediction methods directly regress the nutrition content. However, due to the complex variations in food images, such as differences in viewpoint and lighting conditions, directly regressing the nutrition content faces significant challenges. The complexity of the food image data results in a high-dimensional and feature-rich input space, which poses difficulties for traditional regression models to efficiently navigate and optimize. Consequently, the direct regression paradigm usually generates inaccurate nutrition predictions. To alleviate the ambiguity challenge in the prediction progress, we propose to narrow the searchable space for the model's predictions by decomposing the direct regression into two steps: first coarsely selecting the nutrition scope and then finely refining the prediction value, forming a coarse-to-fine nutrition prediction paradigm. Although the process of coarse prediction which selects a bin from a series of scope bins can be formulated as a standard classification problem, it exhibits a distinguishable characteristic, i.e. the closer to the ground truth bin, the less punishment in the training phase. However, most of the current methods have ignored this phenomenon, thus, we specially design the linearly smoothed label in the nutrition prediction task to reveal the relative distance to the ground truth bin, leading to extraordinary improvements. Furthermore, we conduct a pair-wise comparison among all bins by extending the 1D label into 2D space and propose the structure loss to guide the bin selection process effectively. Due to the narrowed decision space, the nutrition prediction problem can be effectively optimized, and the proposed method achieves promising results on three benchmarks ECUSTFD, VFD and Nutrition5K, demonstrating the efficiency of the coarse-to-fine paradigm equipped with the linear-smoothed structure loss.

源语言	英语
页（从-至）	3651-3662
页数	12
期刊	IEEE Transactions on Multimedia
卷	26
DOI	https://doi.org/10.1109/TMM.2023.3313638
出版状态	已出版 - 2024

访问文件

10.1109/TMM.2023.3313638

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{ef215d103e304f3f90d9962fe7584653,

title = "Coarse-to-Fine Nutrition Prediction",

abstract = "Healthy dietary intake has a broad influence on the quality of life, and nutrition prediction plays a great role in the auxiliary decision-making of diet. Given a food image, existing nutrition prediction methods directly regress the nutrition content. However, due to the complex variations in food images, such as differences in viewpoint and lighting conditions, directly regressing the nutrition content faces significant challenges. The complexity of the food image data results in a high-dimensional and feature-rich input space, which poses difficulties for traditional regression models to efficiently navigate and optimize. Consequently, the direct regression paradigm usually generates inaccurate nutrition predictions. To alleviate the ambiguity challenge in the prediction progress, we propose to narrow the searchable space for the model's predictions by decomposing the direct regression into two steps: first coarsely selecting the nutrition scope and then finely refining the prediction value, forming a coarse-to-fine nutrition prediction paradigm. Although the process of coarse prediction which selects a bin from a series of scope bins can be formulated as a standard classification problem, it exhibits a distinguishable characteristic, i.e. the closer to the ground truth bin, the less punishment in the training phase. However, most of the current methods have ignored this phenomenon, thus, we specially design the linearly smoothed label in the nutrition prediction task to reveal the relative distance to the ground truth bin, leading to extraordinary improvements. Furthermore, we conduct a pair-wise comparison among all bins by extending the 1D label into 2D space and propose the structure loss to guide the bin selection process effectively. Due to the narrowed decision space, the nutrition prediction problem can be effectively optimized, and the proposed method achieves promising results on three benchmarks ECUSTFD, VFD and Nutrition5K, demonstrating the efficiency of the coarse-to-fine paradigm equipped with the linear-smoothed structure loss.",

keywords = "Nutrition prediction, coarse-to-fine, label smoothing, multi-task learning",

author = "Binglu Wang and Tianci Bu and Zaiyi Hu and Le Yang and Yongqiang Zhao and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 1999-2012 IEEE.",

year = "2024",

doi = "10.1109/TMM.2023.3313638",

language = "英语",

volume = "26",

pages = "3651--3662",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Coarse-to-Fine Nutrition Prediction

AU - Wang, Binglu

AU - Bu, Tianci

AU - Hu, Zaiyi

AU - Yang, Le

AU - Zhao, Yongqiang

AU - Li, Xuelong

PY - 2024

Y1 - 2024

N2 - Healthy dietary intake has a broad influence on the quality of life, and nutrition prediction plays a great role in the auxiliary decision-making of diet. Given a food image, existing nutrition prediction methods directly regress the nutrition content. However, due to the complex variations in food images, such as differences in viewpoint and lighting conditions, directly regressing the nutrition content faces significant challenges. The complexity of the food image data results in a high-dimensional and feature-rich input space, which poses difficulties for traditional regression models to efficiently navigate and optimize. Consequently, the direct regression paradigm usually generates inaccurate nutrition predictions. To alleviate the ambiguity challenge in the prediction progress, we propose to narrow the searchable space for the model's predictions by decomposing the direct regression into two steps: first coarsely selecting the nutrition scope and then finely refining the prediction value, forming a coarse-to-fine nutrition prediction paradigm. Although the process of coarse prediction which selects a bin from a series of scope bins can be formulated as a standard classification problem, it exhibits a distinguishable characteristic, i.e. the closer to the ground truth bin, the less punishment in the training phase. However, most of the current methods have ignored this phenomenon, thus, we specially design the linearly smoothed label in the nutrition prediction task to reveal the relative distance to the ground truth bin, leading to extraordinary improvements. Furthermore, we conduct a pair-wise comparison among all bins by extending the 1D label into 2D space and propose the structure loss to guide the bin selection process effectively. Due to the narrowed decision space, the nutrition prediction problem can be effectively optimized, and the proposed method achieves promising results on three benchmarks ECUSTFD, VFD and Nutrition5K, demonstrating the efficiency of the coarse-to-fine paradigm equipped with the linear-smoothed structure loss.

AB - Healthy dietary intake has a broad influence on the quality of life, and nutrition prediction plays a great role in the auxiliary decision-making of diet. Given a food image, existing nutrition prediction methods directly regress the nutrition content. However, due to the complex variations in food images, such as differences in viewpoint and lighting conditions, directly regressing the nutrition content faces significant challenges. The complexity of the food image data results in a high-dimensional and feature-rich input space, which poses difficulties for traditional regression models to efficiently navigate and optimize. Consequently, the direct regression paradigm usually generates inaccurate nutrition predictions. To alleviate the ambiguity challenge in the prediction progress, we propose to narrow the searchable space for the model's predictions by decomposing the direct regression into two steps: first coarsely selecting the nutrition scope and then finely refining the prediction value, forming a coarse-to-fine nutrition prediction paradigm. Although the process of coarse prediction which selects a bin from a series of scope bins can be formulated as a standard classification problem, it exhibits a distinguishable characteristic, i.e. the closer to the ground truth bin, the less punishment in the training phase. However, most of the current methods have ignored this phenomenon, thus, we specially design the linearly smoothed label in the nutrition prediction task to reveal the relative distance to the ground truth bin, leading to extraordinary improvements. Furthermore, we conduct a pair-wise comparison among all bins by extending the 1D label into 2D space and propose the structure loss to guide the bin selection process effectively. Due to the narrowed decision space, the nutrition prediction problem can be effectively optimized, and the proposed method achieves promising results on three benchmarks ECUSTFD, VFD and Nutrition5K, demonstrating the efficiency of the coarse-to-fine paradigm equipped with the linear-smoothed structure loss.

KW - Nutrition prediction

KW - coarse-to-fine

KW - label smoothing

KW - multi-task learning

UR - http://www.scopus.com/inward/record.url?scp=85171770298&partnerID=8YFLogxK

U2 - 10.1109/TMM.2023.3313638

DO - 10.1109/TMM.2023.3313638

M3 - 文章

AN - SCOPUS:85171770298

SN - 1520-9210

VL - 26

SP - 3651

EP - 3662

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

ER -

Coarse-to-Fine Nutrition Prediction

摘要

访问文件

其它文件与链接

指纹

引用此