Word Embedding for Understanding Natural Language: A Survey

Yang Li, Tao Yang

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

198 Citations (Scopus)

Abstract

Word embedding, in which semantic and syntactic features are captured from unlabeled text data, is a basic procedure in Natural Language Processing (NLP). The extracted features are organized in a low-dimensional space. Representative word embedding approaches include the Probabilistic Language Model, the Neural Network Language Model, and Sparse Coding. State-of-the-art techniques such as skip-gram with negative sampling, noise-contrastive estimation, matrix factorization, and hierarchical structure regularizers are used to train these models. Most of this literature learns word embeddings from observed word counts and co-occurrence statistics. The increasing scale of data, the sparsity of data representations, word position, and training speed are the main challenges in designing word embedding algorithms. In this survey, we first introduce the motivation and background of word embedding. We then present methods of text representation as preliminaries, as well as existing word embedding approaches such as the Neural Network Language Model and the Sparse Coding approach, along with their evaluation metrics. Finally, we summarize the applications of word embedding and discuss future directions.
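For orientation, one of the training techniques named above, skip-gram with negative sampling (SGNS), can be stated compactly; the notation below follows Mikolov et al. (2013), not the survey itself. For an observed center word $w_I$ and context word $w_O$, with $k$ negative samples drawn from a noise distribution $P_n(w)$, SGNS maximizes

\[
\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right) \;+\; \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\!\left[\log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right],
\]

where $v$ and $v'$ are the input and output embedding vectors and $\sigma$ is the logistic sigmoid. The negative samples replace the full softmax normalization over the vocabulary, which is what makes training fast at scale.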

Original language: English
Title of host publication: Studies in Big Data
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 83-104
Number of pages: 22
DOI
Publication status: Published - 2018

Publication series

Name: Studies in Big Data
Volume: 26
ISSN (Print): 2197-6503
ISSN (Electronic): 2197-6511
