TY - CHAP
T1 - Word Embedding for Understanding Natural Language
T2 - A Survey
AU - Li, Yang
AU - Yang, Tao
N1 - Publisher Copyright:
© 2018, Springer International Publishing AG.
PY - 2018
Y1 - 2018
AB - Word embedding, in which semantic and syntactic features are captured from unlabeled text data, is a fundamental procedure in Natural Language Processing (NLP). The extracted features can thus be organized in a low-dimensional space. Representative word embedding approaches include the Probability Language Model, the Neural Network Language Model, and Sparse Coding. State-of-the-art methods such as skip-gram with negative sampling, noise-contrastive estimation, matrix factorization, and hierarchical structure regularizers are applied to solve these models. Most of this literature learns word embeddings from observed counts and co-occurrence statistics. The increasing scale of data, the sparsity of data representations, word position, and training speed are the main challenges in designing word embedding algorithms. In this survey, we first introduce the motivation and background of word embedding. Next, we introduce methods of text representation as preliminaries, as well as existing word embedding approaches such as the Neural Network Language Model and the Sparse Coding approach, along with their evaluation metrics. Finally, we summarize the applications of word embedding and discuss future directions.
KW - Neural Network Language Model
KW - Probability Language Model
KW - Sparse coding approach
KW - Word embedding
KW - Word representation
UR - http://www.scopus.com/inward/record.url?scp=85132885019&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-53817-4_4
DO - 10.1007/978-3-319-53817-4_4
M3 - Chapter
AN - SCOPUS:85132885019
T3 - Studies in Big Data
SP - 83
EP - 104
BT - Studies in Big Data
PB - Springer Science and Business Media Deutschland GmbH
ER -