TY - JOUR
T1 - ChestXRayBERT
T2 - A Pretrained Language Model for Chest Radiology Report Summarization
AU - Cai, Xiaoyan
AU - Liu, Sen
AU - Han, Junwei
AU - Yang, Libin
AU - Liu, Zhenguo
AU - Liu, Tianming
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2023
Y1 - 2023
AB - Automatically generating the 'impression' section of a radiology report from its 'findings' section condenses the salient information of the 'findings' into a concise summary, promoting more effective communication between radiologists and referring physicians. To reduce the workload of radiologists, we develop and evaluate a novel abstractive summarization framework that automatically generates the 'impression' section of chest radiology reports. Despite recent advances in natural language processing (NLP), such as BERT and its variants, existing abstractive summarization models cannot be applied directly to radiology reports, partly because of domain-specific radiology terminology. In response, we develop a pre-trained language model for the chest radiology domain, named ChestXRayBERT, to summarize chest radiology reports automatically. Specifically, we first collect radiology-related scientific papers as a pre-training corpus and pre-train ChestXRayBERT on it. We then propose an abstractive summarization model consisting of the pre-trained ChestXRayBERT and a Transformer decoder, and fine-tune it on chest X-ray reports for the summarization task. When evaluated on the publicly available OPEN-I and MIMIC-CXR datasets, the proposed model achieves significant improvements over other neural network-based abstractive summarization models. Overall, ChestXRayBERT demonstrates the feasibility and promise of tailoring and extending advanced NLP techniques to medical imaging and radiology, and to the broader biomedicine and healthcare fields in the future.
KW - abstractive summarization
KW - chest radiology report
KW - Pre-trained language model
UR - http://www.scopus.com/inward/record.url?scp=85121395984&partnerID=8YFLogxK
DO - 10.1109/TMM.2021.3132724
M3 - Article
AN - SCOPUS:85121395984
SN - 1520-9210
VL - 25
SP - 845
EP - 855
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -