MAVEN-FACT: A Large-scale Event Factuality Detection Dataset

Chunyang Li; Hao Peng; Xiaozhi Wang; Yunjia Qi; Lei Hou; Bin Xu; Juanzi Li

doi:10.18653/v1/2024.findings-emnlp.651

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Event Factuality Detection (EFD) task determines the factuality of textual events, i.e., it classifies whether an event is a fact, a possibility, or an impossibility, which is essential for faithfully understanding and utilizing event knowledge. However, due to the lack of large-scale, high-quality data, event factuality detection is under-explored in event understanding research, which limits the development of the EFD community. To address these issues and provide faithful event understanding, we introduce MAVEN-FACT, a large-scale, high-quality EFD dataset based on the MAVEN dataset. MAVEN-FACT includes factuality annotations for 112,276 events, making it the largest EFD dataset. Extensive experiments demonstrate that MAVEN-FACT is challenging for both conventional fine-tuned models and large language models (LLMs). Thanks to the comprehensive annotations of event arguments and relations in MAVEN, MAVEN-FACT also supports further analyses, and we find that incorporating event arguments and relations helps fine-tuned models detect event factuality but does not benefit LLMs. Furthermore, we conduct a preliminary study of an application of event factuality detection and find that it helps mitigate event-related hallucination in LLMs. Our dataset and code can be obtained from https://github.com/THU-KEG/MAVEN-FACT.
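As a rough illustration of the task framing in the abstract (one event mention in, one factuality value out), the following Python sketch uses a toy cue-word baseline. It is not the MAVEN-FACT method, label set, or data format: the `EventMention` record, the `Factuality` values (taken from the abstract's three coarse categories), the cue lists, and the mapping of negated events to "impossibility" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Factuality(Enum):
    """Three coarse values named in the abstract (assumed; not the dataset's actual label set)."""
    FACT = "fact"
    POSSIBILITY = "possibility"
    IMPOSSIBILITY = "impossibility"


@dataclass
class EventMention:
    """Hypothetical record: a sentence plus the trigger word of one event in it."""
    sentence: str
    trigger: str


# Hypothetical cue lists; a real system learns such signals from annotated data.
NEGATION_CUES = {"not", "never", "no", "denied", "failed"}
MODAL_CUES = {"may", "might", "could", "possibly", "reportedly", "planned"}


def classify(mention: EventMention) -> Factuality:
    """Toy cue-based baseline: look only at the words before the trigger."""
    # Crude tokenization: drop commas and periods, then split on whitespace.
    tokens = mention.sentence.lower().replace(",", " ").replace(".", " ").split()
    trigger = mention.trigger.lower()
    if trigger not in tokens:
        raise ValueError(f"trigger {mention.trigger!r} not found in sentence")
    context = set(tokens[: tokens.index(trigger)])
    # Assumption: a negated event is treated as an "impossibility".
    if context & NEGATION_CUES:
        return Factuality.IMPOSSIBILITY
    if context & MODAL_CUES:
        return Factuality.POSSIBILITY
    return Factuality.FACT


if __name__ == "__main__":
    for sent, trig in [
        ("The army attacked the city.", "attacked"),
        ("The army may attack the city.", "attack"),
        ("The army did not attack the city.", "attack"),
    ]:
        print(trig, classify(EventMention(sent, trig)).value)
```

Rules like these ignore negation scope and who is asserting the event, so they are only a starting point; the abstract reports that MAVEN-FACT remains challenging even for fine-tuned models and LLMs.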

Original language	English
Title of host publication	EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
Editors	Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher	Association for Computational Linguistics (ACL)
Pages	11140-11158
Number of pages	19
ISBN (Electronic)	9798891761681
DOIs	https://doi.org/10.18653/v1/2024.findings-emnlp.651
State	Published - 2024
Externally published	Yes
Event	2024 Findings of the Association for Computational Linguistics, EMNLP 2024 - Hybrid, Miami, United States
Duration	12 Nov 2024 → 16 Nov 2024

Publication series

Name	EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

Conference

Conference	2024 Findings of the Association for Computational Linguistics, EMNLP 2024
Country/Territory	United States
City	Hybrid, Miami
Period	12/11/24 → 16/11/24

Access to Document

10.18653/v1/2024.findings-emnlp.651

Cite this

Li, C., Peng, H., Wang, X., Qi, Y., Hou, L., Xu, B., & Li, J. (2024). MAVEN-FACT: A Large-scale Event Factuality Detection Dataset. In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024 (pp. 11140-11158). (EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2024.findings-emnlp.651

Li, Chunyang ; Peng, Hao ; Wang, Xiaozhi et al. / MAVEN-FACT: A Large-scale Event Factuality Detection Dataset. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024. editor / Yaser Al-Onaizan ; Mohit Bansal ; Yun-Nung Chen. Association for Computational Linguistics (ACL), 2024. pp. 11140-11158 (EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024).

@inproceedings{25e8ba0ebf9f4c0bbeb4451b8521c045,

title = "MAVEN-FACT: A Large-scale Event Factuality Detection Dataset",

author = "Chunyang Li and Hao Peng and Xiaozhi Wang and Yunjia Qi and Lei Hou and Bin Xu and Juanzi Li",

note = "Publisher Copyright: {\textcopyright} 2024 Association for Computational Linguistics.; 2024 Findings of the Association for Computational Linguistics, EMNLP 2024 ; Conference date: 12-11-2024 Through 16-11-2024",

year = "2024",

doi = "10.18653/v1/2024.findings-emnlp.651",

language = "English",

series = "EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024",

publisher = "Association for Computational Linguistics (ACL)",

pages = "11140--11158",

editor = "Yaser Al-Onaizan and Mohit Bansal and Yun-Nung Chen",

booktitle = "EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024",

}

Li, C, Peng, H, Wang, X, Qi, Y, Hou, L, Xu, B & Li, J 2024, MAVEN-FACT: A Large-scale Event Factuality Detection Dataset. in Y Al-Onaizan, M Bansal & Y-N Chen (eds), EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024, Association for Computational Linguistics (ACL), pp. 11140-11158, 2024 Findings of the Association for Computational Linguistics, EMNLP 2024, Hybrid, Miami, United States, 12/11/24. https://doi.org/10.18653/v1/2024.findings-emnlp.651

MAVEN-FACT: A Large-scale Event Factuality Detection Dataset. / Li, Chunyang; Peng, Hao; Wang, Xiaozhi et al.
EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024. ed. / Yaser Al-Onaizan; Mohit Bansal; Yun-Nung Chen. Association for Computational Linguistics (ACL), 2024. p. 11140-11158 (EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024).

TY - GEN

T1 - MAVEN-FACT: A Large-scale Event Factuality Detection Dataset

T2 - 2024 Findings of the Association for Computational Linguistics, EMNLP 2024

AU - Li, Chunyang

AU - Peng, Hao

AU - Wang, Xiaozhi

AU - Qi, Yunjia

AU - Hou, Lei

AU - Xu, Bin

AU - Li, Juanzi

PY - 2024

Y1 - 2024

UR - http://www.scopus.com/inward/record.url?scp=85217620330&partnerID=8YFLogxK

U2 - 10.18653/v1/2024.findings-emnlp.651

DO - 10.18653/v1/2024.findings-emnlp.651

M3 - Conference contribution

AN - SCOPUS:85217620330

T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

SP - 11140

EP - 11158

BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

A2 - Al-Onaizan, Yaser

A2 - Bansal, Mohit

A2 - Chen, Yun-Nung

PB - Association for Computational Linguistics (ACL)

Y2 - 12 November 2024 through 16 November 2024

ER -

Li C, Peng H, Wang X, Qi Y, Hou L, Xu B et al. MAVEN-FACT: A Large-scale Event Factuality Detection Dataset. In Al-Onaizan Y, Bansal M, Chen YN, editors, EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024. Association for Computational Linguistics (ACL). 2024. p. 11140-11158. (EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024). doi: 10.18653/v1/2024.findings-emnlp.651
