ADELIE: Aligning Large Language Models on Information Extraction

Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Large language models (LLMs) usually fall short on information extraction (IE) tasks and struggle to follow the complex instructions of IE tasks. This primarily arises from LLMs not being aligned with humans, as mainstream alignment datasets typically do not include IE data. In this paper, we introduce ADELIE (Aligning large language moDELs on Information Extraction), an aligned LLM that effectively solves various IE tasks, including closed IE, open IE, and on-demand IE. We first collect and construct a high-quality alignment corpus IEInstruct for IE. Then we train ADELIESFT using instruction tuning on IEInstruct. We further train ADELIESFT with direct preference optimization (DPO) objective, resulting in ADELIEDPO. Extensive experiments on various held-out IE datasets demonstrate that our models (ADELIESFT and ADELIEDPO) achieve state-of-the-art (SoTA) performance among open-source models. We further explore the general capabilities of ADELIE, and experimental results reveal that their general capabilities do not exhibit a noticeable decline. We have released the code, data, and models to facilitate further research.

Original languageEnglish
Title of host publicationEMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
EditorsYaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
PublisherAssociation for Computational Linguistics (ACL)
Pages7371-7387
Number of pages17
ISBN (Electronic)9798891761643
StatePublished - 2024
Externally publishedYes
Event2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 202416 Nov 2024

Publication series

NameEMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Country/TerritoryUnited States
CityHybrid, Miami
Period12/11/2416/11/24

Fingerprint

Dive into the research topics of 'ADELIE: Aligning Large Language Models on Information Extraction'. Together they form a unique fingerprint.

Cite this