LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy

Jieru Yao, Xueran Li, Qiang Xie, Longfei Han, Yiwen Jia, Nian Liu, Dingwen Zhang, Junwei Han

Research output: Contribution to journal › Letter › peer-review

Abstract

We introduce LLaVA-Endo, a large language-and-vision model designed for the field of gastrointestinal (GI) endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce an innovative progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates strong domain expertise and conversational capabilities, outperforming previous state-of-the-art multimodal methods on GI endoscopy data. In the future, we intend to collect more data for training and evaluation, and to integrate more functionalities such as report generation and polyp segmentation.
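The abstract mentions two ingredients: language-image instruction-tuning data and a progressive (staged) transfer-learning schedule. As an illustrative sketch only, the snippet below shows what a LLaVA-style conversation sample and a two-stage schedule might look like; all names (`build_sample`, `STAGES`) and the staging choices are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a LLaVA-style language-image instruction-tuning
# sample and a two-stage progressive transfer-learning schedule.
# Names and stage definitions are illustrative, not from the paper.

def build_sample(image_path, question, answer):
    """Package one endoscopy image/question/answer triple in the
    conversation format commonly used for LLaVA-style tuning."""
    return {
        "image": image_path,
        "conversations": [
            {"from": "human", "value": f"<image>\n{question}"},
            {"from": "gpt", "value": answer},
        ],
    }

# One plausible progressive schedule: first align the vision projector on
# broad medical image-text pairs with the language model frozen, then
# unfreeze it for GI-endoscopy instruction data.
STAGES = [
    {"name": "alignment", "data": "general-medical-captions", "train_llm": False},
    {"name": "endoscopy-sft", "data": "gi-endoscopy-instructions", "train_llm": True},
]

sample = build_sample(
    "frames/colon_0042.png",
    "Is a polyp visible in this frame?",
    "Yes, a small sessile polyp is visible in the lower-left quadrant.",
)
```

Each stage would iterate over its dataset with the indicated parameter groups trainable; the sample dict is the unit a data loader would collate into image-text batches.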

Original language: English
Article number: 194331
Journal: Frontiers of Computer Science
Volume: 19
Issue number: 4
DOIs
State: Published - Apr 2025
