Abstract
We introduce LLaVA-Endo, a large language-and-vision model designed for gastrointestinal (GI) endoscopy. Specifically, we generate a high-quality dataset for GI endoscopic language-image instruction tuning and introduce a progressive transfer learning technique to fine-tune LLaVA. Experimental results show that LLaVA-Endo demonstrates strong domain expertise and conversational capabilities, outperforming previous state-of-the-art (SoTA) multimodal methods on GI endoscopy data. In the future, we intend to collect more data for training and evaluation and to integrate additional functionalities such as report generation and polyp segmentation.
| Original language | English |
|---|---|
| Article number | 194331 |
| Journal | Frontiers of Computer Science |
| Volume | 19 |
| Issue number | 4 |
| State | Published - Apr 2025 |
| Title | LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy |