TY - GEN
T1 - ADELIE: Aligning Large Language Models on Information Extraction
T2 - 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
AU - Qi, Yunjia
AU - Peng, Hao
AU - Wang, Xiaozhi
AU - Xu, Bin
AU - Hou, Lei
AU - Li, Juanzi
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - Large language models (LLMs) usually fall short on information extraction (IE) tasks and struggle to follow the complex instructions of IE tasks. This primarily arises from LLMs not being aligned with humans, as mainstream alignment datasets typically do not include IE data. In this paper, we introduce ADELIE (Aligning large language moDELs on Information Extraction), an aligned LLM that effectively solves various IE tasks, including closed IE, open IE, and on-demand IE. We first collect and construct a high-quality alignment corpus IEInstruct for IE. Then we train ADELIE-SFT using instruction tuning on IEInstruct. We further train ADELIE-SFT with the direct preference optimization (DPO) objective, resulting in ADELIE-DPO. Extensive experiments on various held-out IE datasets demonstrate that our models (ADELIE-SFT and ADELIE-DPO) achieve state-of-the-art (SoTA) performance among open-source models. We further explore the general capabilities of ADELIE, and experimental results reveal that these capabilities do not exhibit a noticeable decline. We have released the code, data, and models to facilitate further research.
UR - http://www.scopus.com/inward/record.url?scp=85217799761&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85217799761
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 7371
EP - 7387
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics (ACL)
Y2 - 12 November 2024 through 16 November 2024
ER -