TY - JOUR
T1 - PneumoLLM
T2 - Harnessing the power of large language model for pneumoconiosis diagnosis
AU - Song, Meiyue
AU - Wang, Jiarui
AU - Yu, Zhihua
AU - Wang, Jiaxin
AU - Yang, Le
AU - Lu, Yuting
AU - Li, Baicun
AU - Wang, Xue
AU - Wang, Xiaoxu
AU - Huang, Qinghua
AU - Li, Zhijun
AU - Kanellakis, Nikolaos I.
AU - Liu, Jiangfeng
AU - Wang, Jing
AU - Wang, Binglu
AU - Yang, Juntao
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/10
Y1 - 2024/10
N2 - The conventional pretraining-and-finetuning paradigm, while effective for common diseases with ample data, faces challenges in diagnosing data-scarce occupational diseases like pneumoconiosis. Recently, large language models (LLMs) have exhibited unprecedented ability in conducting multiple tasks through dialogue, bringing opportunities for diagnosis. A common strategy might involve using adapter layers for vision–language alignment and diagnosis in a dialogic manner. Yet, this approach often requires optimization of extensive learnable parameters in the text branch and the dialogue head, potentially diminishing the LLMs’ efficacy, especially with limited training data. In our work, we innovate by eliminating the text branch and substituting the dialogue head with a classification head. This approach presents a more effective method for harnessing LLMs in diagnosis with fewer learnable parameters. Furthermore, to balance the retention of detailed image information with progression towards accurate diagnosis, we introduce the contextual multi-token engine. This engine is specialized in adaptively generating diagnostic tokens. Additionally, we propose the information emitter module, which unidirectionally emits information from image tokens to diagnosis tokens. Comprehensive experiments validate the superiority of our methods.
AB - The conventional pretraining-and-finetuning paradigm, while effective for common diseases with ample data, faces challenges in diagnosing data-scarce occupational diseases like pneumoconiosis. Recently, large language models (LLMs) have exhibited unprecedented ability in conducting multiple tasks through dialogue, bringing opportunities for diagnosis. A common strategy might involve using adapter layers for vision–language alignment and diagnosis in a dialogic manner. Yet, this approach often requires optimization of extensive learnable parameters in the text branch and the dialogue head, potentially diminishing the LLMs’ efficacy, especially with limited training data. In our work, we innovate by eliminating the text branch and substituting the dialogue head with a classification head. This approach presents a more effective method for harnessing LLMs in diagnosis with fewer learnable parameters. Furthermore, to balance the retention of detailed image information with progression towards accurate diagnosis, we introduce the contextual multi-token engine. This engine is specialized in adaptively generating diagnostic tokens. Additionally, we propose the information emitter module, which unidirectionally emits information from image tokens to diagnosis tokens. Comprehensive experiments validate the superiority of our methods.
KW - Foundational model
KW - Large language model
KW - Medical image diagnosis
UR - http://www.scopus.com/inward/record.url?scp=85196958593&partnerID=8YFLogxK
U2 - 10.1016/j.media.2024.103248
DO - 10.1016/j.media.2024.103248
M3 - Article
C2 - 38941859
AN - SCOPUS:85196958593
SN - 1361-8415
VL - 97
JO - Medical Image Analysis
JF - Medical Image Analysis
M1 - 103248
ER -