Word Embedding for Understanding Natural Language: A Survey

Yang Li, Tao Yang

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

198 Citations (Scopus)

Abstract

Word embedding, in which semantic and syntactic features are captured from unlabeled text data, is a basic procedure in Natural Language Processing (NLP). The extracted features are organized in a low-dimensional space. Representative word embedding approaches include the Probabilistic Language Model, the Neural Network Language Model, and Sparse Coding. State-of-the-art techniques such as skip-gram with negative sampling, noise-contrastive estimation, matrix factorization, and hierarchical structure regularizers are used to train these models. Most of this literature learns word embeddings from observed word counts and co-occurrence statistics. The increasing scale of data, the sparsity of data representations, word position, and training speed are the main challenges in designing word embedding algorithms. In this survey, we first introduce the motivation and background of word embedding. We then present methods of text representation as preliminaries, as well as existing word embedding approaches such as the Neural Network Language Model and the Sparse Coding approach, along with their evaluation metrics. Finally, we summarize the applications of word embedding and discuss future directions.
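For orientation, one of the training techniques named above, skip-gram with negative sampling (SGNS), can be stated compactly; the notation below follows Mikolov et al. (2013), not the survey itself. For an observed center word $w_I$ and context word $w_O$, with $k$ negative samples drawn from a noise distribution $P_n(w)$, SGNS maximizes

\[
\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right) \;+\; \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\!\left[\log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right],
\]

where $v$ and $v'$ are the input and output embedding vectors and $\sigma$ is the logistic sigmoid. The negative samples replace the full softmax normalization over the vocabulary, which is what makes training fast at scale.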

Original language: English
Title of host publication: Studies in Big Data
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 83-104
Number of pages: 22
DOI
Publication status: Published - 2018

Publication series

Name: Studies in Big Data
Volume: 26
ISSN (Print): 2197-6503
ISSN (Electronic): 2197-6511
