LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy

Jieru Yao; Xueran Li; Qiang Xie; Longfei Han; Yiwen Jia; Nian Liu; Dingwen Zhang; Junwei Han

doi:10.1007/s11704-024-40319-8

LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy

Jieru Yao, Xueran Li, Qiang Xie, Longfei Han, Yiwen Jia, Nian Liu, Dingwen Zhang, Junwei Han

自动化学院

科研成果: 期刊稿件 › 快报 › 同行评审

摘要

We introduce LLaVA-Endo, a large language and vision model designed for the field of GI endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce an innovative progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates powerful domain expertise and conversational capabilities, outperforming previous SoTA multimodal methods in the field of GI endoscopy data. In future, we intend to collect more data for training and evaluation, and integrate more functionalities such as report generation, and polyp segmentation.

源语言	英语
文章编号	194331
期刊	Frontiers of Computer Science
卷	19
期	4
DOI	https://doi.org/10.1007/s11704-024-40319-8
出版状态	已出版 - 4月 2025

访问文件

10.1007/s11704-024-40319-8

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{f8f9772bfe024b1599411c9f8b5f8b3c,

title = "LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy",

abstract = "We introduce LLaVA-Endo, a large language and vision model designed for the field of GI endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce an innovative progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates powerful domain expertise and conversational capabilities, outperforming previous SoTA multimodal methods in the field of GI endoscopy data. In future, we intend to collect more data for training and evaluation, and integrate more functionalities such as report generation, and polyp segmentation.",

author = "Jieru Yao and Xueran Li and Qiang Xie and Longfei Han and Yiwen Jia and Nian Liu and Dingwen Zhang and Junwei Han",

note = "Publisher Copyright: {\textcopyright} Higher Education Press 2025.",

year = "2025",

month = apr,

doi = "10.1007/s11704-024-40319-8",

language = "英语",

volume = "19",

journal = "Frontiers of Computer Science",

issn = "2095-2228",

publisher = "Higher Education Press Limited Company",

number = "4",

}

TY - JOUR

T1 - LLaVA-Endo

T2 - a large language-and-vision assistant for gastrointestinal endoscopy

AU - Yao, Jieru

AU - Li, Xueran

AU - Xie, Qiang

AU - Han, Longfei

AU - Jia, Yiwen

AU - Liu, Nian

AU - Zhang, Dingwen

AU - Han, Junwei

PY - 2025/4

Y1 - 2025/4

N2 - We introduce LLaVA-Endo, a large language and vision model designed for the field of GI endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce an innovative progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates powerful domain expertise and conversational capabilities, outperforming previous SoTA multimodal methods in the field of GI endoscopy data. In future, we intend to collect more data for training and evaluation, and integrate more functionalities such as report generation, and polyp segmentation.

AB - We introduce LLaVA-Endo, a large language and vision model designed for the field of GI endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce an innovative progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates powerful domain expertise and conversational capabilities, outperforming previous SoTA multimodal methods in the field of GI endoscopy data. In future, we intend to collect more data for training and evaluation, and integrate more functionalities such as report generation, and polyp segmentation.

UR - http://www.scopus.com/inward/record.url?scp=85209715598&partnerID=8YFLogxK

U2 - 10.1007/s11704-024-40319-8

DO - 10.1007/s11704-024-40319-8

M3 - 快报

AN - SCOPUS:85209715598

SN - 2095-2228

VL - 19

JO - Frontiers of Computer Science

JF - Frontiers of Computer Science

IS - 4

M1 - 194331

ER -

LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy

摘要

访问文件

其它文件与链接

指纹

引用此