TY - JOUR
T1 - Chinese Long Text Summarization with Guided Attention
AU - Guo, Zhe
AU - Zhang, Zhi Bo
AU - Zhou, Wei Jie
AU - Fan, Yang Yu
AU - Zhang, Yan Ning
N1 - Publisher Copyright:
© 2024 Chinese Institute of Electronics. All rights reserved.
PY - 2024/12/25
Y1 - 2024/12/25
N2 - Current research on deep-learning-based Chinese long text summarization has the following problems: (1) summarization models lack information guidance and fail to focus on key words and sentences, leading to the loss of critical information over long-distance spans; (2) the vocabularies of existing Chinese long text summarization models are often character-based and do not contain common Chinese words and punctuation, which hinders the extraction of multi-granularity semantic information. To solve these problems, a Chinese long text summarization method with guided attention (CLSGA) is proposed in this paper. First, for the long text summarization task, an extraction model is presented to extract the core words and sentences of the long text and construct a guiding text, which guides the generation model to focus on more important information during encoding. Second, a Chinese long text vocabulary is designed to change the text representation from word statistics to phrase statistics, which facilitates the extraction of richer multi-granularity features. Hierarchical positional decomposition encoding is then introduced to efficiently extend the positional encoding of long texts and accelerate network convergence. Finally, a local attention mechanism is combined with the guided attention mechanism to effectively capture important information over long text spans and improve summarization accuracy. Experimental results on four public Chinese summarization datasets of different lengths (LCSTS, CNewSum, NLPCC2017 and SFZY2020) show that the proposed method has significant advantages for long text summarization and effectively improves ROUGE-1, ROUGE-2 and ROUGE-L scores.
AB - Current research on deep-learning-based Chinese long text summarization has the following problems: (1) summarization models lack information guidance and fail to focus on key words and sentences, leading to the loss of critical information over long-distance spans; (2) the vocabularies of existing Chinese long text summarization models are often character-based and do not contain common Chinese words and punctuation, which hinders the extraction of multi-granularity semantic information. To solve these problems, a Chinese long text summarization method with guided attention (CLSGA) is proposed in this paper. First, for the long text summarization task, an extraction model is presented to extract the core words and sentences of the long text and construct a guiding text, which guides the generation model to focus on more important information during encoding. Second, a Chinese long text vocabulary is designed to change the text representation from word statistics to phrase statistics, which facilitates the extraction of richer multi-granularity features. Hierarchical positional decomposition encoding is then introduced to efficiently extend the positional encoding of long texts and accelerate network convergence. Finally, a local attention mechanism is combined with the guided attention mechanism to effectively capture important information over long text spans and improve summarization accuracy. Experimental results on four public Chinese summarization datasets of different lengths (LCSTS, CNewSum, NLPCC2017 and SFZY2020) show that the proposed method has significant advantages for long text summarization and effectively improves ROUGE-1, ROUGE-2 and ROUGE-L scores.
KW - Chinese long text summarization
KW - guided attention
KW - hierarchical location decomposition coding
KW - local attention
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85217952590&partnerID=8YFLogxK
U2 - 10.12263/DZXB.20230429
DO - 10.12263/DZXB.20230429
M3 - Article
AN - SCOPUS:85217952590
SN - 0372-2112
VL - 52
SP - 3914
EP - 3930
JO - Tien Tzu Hsueh Pao/Acta Electronica Sinica
JF - Tien Tzu Hsueh Pao/Acta Electronica Sinica
IS - 12
ER -