A Mixed Semantic Features Model for Chinese NER with Characters and Words

Ning Chang, Jiang Zhong, Qing Li, Jiang Zhu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Named Entity Recognition (NER) is an essential part of many natural language processing (NLP) tasks. Existing Chinese NER methods mostly either rely on word segmentation or take character sequences as input. However, a single-granularity representation suffers from out-of-vocabulary words and word segmentation errors, and it captures relatively limited semantic content. In this paper, we introduce the self-attention mechanism into the BiLSTM-CRF neural network structure for Chinese named entity recognition with two embeddings. Unlike other models, our method combines character and word features at the sequence level, and the attention mechanism computes similarity over the whole sequence consisting of characters and words. The character semantic information and the structure of words work together to improve the accuracy of word boundary segmentation and to solve the problem of long-phrase combination. We validate our model on the MSRA and Weibo corpora, and experiments demonstrate that our model can significantly improve the performance of the Chinese NER task.
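As a rough illustration of what attention over a mixed character-and-word sequence might look like, the sketch below runs plain scaled dot-product self-attention over a toy sequence. It is not the authors' implementation: the abstract does not specify how the two embeddings are combined, so joining them along the sequence axis is an assumption, as are the identity query/key/value projections, the example phrase, and the hand-written embedding values. The BiLSTM and CRF layers are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity projections.

    Assumption: Q = K = V = the input sequence. A trained model would
    apply learned projection matrices first.
    """
    dim = len(sequence[0])
    outputs, weights = [], []
    for query in sequence:
        # Similarity of this position to every position (chars and words alike).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sequence
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all value vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, sequence)) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical toy embeddings, invented for illustration only.
CHAR_EMB = {
    "南": [1.0, 0.0, 0.2, 0.0],
    "京": [0.8, 0.1, 0.0, 0.3],
    "市": [0.0, 1.0, 0.1, 0.0],
}
WORD_EMB = {
    "南京市": [0.9, 0.6, 0.1, 0.2],
}

chars = ["南", "京", "市"]
words = ["南京市"]

# Assumed combination: characters followed by words along the sequence axis.
mixed = [CHAR_EMB[c] for c in chars] + [WORD_EMB[w] for w in words]
attended, attn = self_attention(mixed)
```

Each row of `attn` holds one position's attention distribution over all four positions, so a character can attend to the word that contains it and the word can attend to its characters.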

Original language: English
Title of host publication: Advances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Proceedings
Editors: Joemon M. Jose, Emine Yilmaz, João Magalhães, Flávio Martins, Pablo Castells, Nicola Ferro, Mário J. Silva
Publisher: Springer
Pages: 356-368
Number of pages: 13
ISBN (Print): 9783030454388
DOIs
State: Published - 2020
Externally published: Yes
Event: 42nd European Conference on IR Research, ECIR 2020 - Lisbon, Portugal
Duration: 14 Apr 2020 → 17 Apr 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12035 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 42nd European Conference on IR Research, ECIR 2020
Country/Territory: Portugal
City: Lisbon
Period: 14/04/20 → 17/04/20

Keywords

  • Chinese named entity recognition
  • Entity boundary segmentation
  • Mixed semantic feature
  • Self-attention
