Abstract

We introduce LLaVA-Endo, a large language-and-vision model designed for gastrointestinal (GI) endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce a progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates strong domain expertise and conversational capabilities, outperforming previous state-of-the-art (SoTA) multimodal methods on GI endoscopy data. In the future, we intend to collect more data for training and evaluation, and to integrate additional functionalities such as report generation and polyp segmentation.

Original language: English
Article number: 194331
Journal: Frontiers of Computer Science
Volume: 19
Issue: 4
DOI
Publication status: Published - Apr 2025

Fingerprint

Dive into the research topics of 'LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy'. Together they form a unique fingerprint.
