TY - JOUR
T1 - LLaVA-Endo
T2 - a large language-and-vision assistant for gastrointestinal endoscopy
AU - Yao, Jieru
AU - Li, Xueran
AU - Xie, Qiang
AU - Han, Longfei
AU - Jia, Yiwen
AU - Liu, Nian
AU - Zhang, Dingwen
AU - Han, Junwei
N1 - Publisher Copyright:
© Higher Education Press 2025.
PY - 2025/4
Y1 - 2025/4
N2 - We introduce LLaVA-Endo, a large language-and-vision model designed for gastrointestinal (GI) endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce a progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates strong domain expertise and conversational capabilities, outperforming previous state-of-the-art (SoTA) multimodal methods on GI endoscopy data. In the future, we intend to collect more data for training and evaluation and to integrate more functionalities, such as report generation and polyp segmentation.
AB - We introduce LLaVA-Endo, a large language-and-vision model designed for gastrointestinal (GI) endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic medical language-image instruction tuning and introduce a progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates strong domain expertise and conversational capabilities, outperforming previous state-of-the-art (SoTA) multimodal methods on GI endoscopy data. In the future, we intend to collect more data for training and evaluation and to integrate more functionalities, such as report generation and polyp segmentation.
UR - http://www.scopus.com/inward/record.url?scp=85209715598&partnerID=8YFLogxK
U2 - 10.1007/s11704-024-40319-8
DO - 10.1007/s11704-024-40319-8
M3 - Letter
AN - SCOPUS:85209715598
SN - 2095-2228
VL - 19
JO - Frontiers of Computer Science
JF - Frontiers of Computer Science
IS - 4
M1 - 194331
ER -