A study of deep learning methods for same-genre and cross-genre author profiling

Muhammad Adnan Ashraf; Rao Muhammad Adeel Nawab; Feiping Nie

doi:10.3233/JIFS-179896

Institute of Optoelectronics and Intelligence

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender) from written text. Author profiling has mainly been treated as a supervised text classification task. Initially, researchers addressed it with traditional machine learning algorithms. In recent years, however, deep learning has emerged as a state-of-the-art method for a range of classification problems involving image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which are most suitable for same-genre and cross-genre author profiling. To fill this gap, this study carries out an in-depth comparison of state-of-the-art deep learning methods (CNN, Bi-LSTM, GRU, and CRNN), along with proposed ensemble methods, on four PAN Author Profiling corpora. The PAN 2015, PAN 2017, and PAN 2018 Author Profiling corpora were used for same-genre author profiling, whereas the PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that, for same-genre author profiling, the proposed ensemble methods produced the best results on the gender identification task, whereas the CNN model performed best on the age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.

Original language	English
Pages (from-to)	2353-2363
Number of pages	11
Journal	Journal of Intelligent and Fuzzy Systems
Volume	39
Issue number	2
DOI	https://doi.org/10.3233/JIFS-179896
Publication status	Published - 2020

Cite this

@article{78c95eb4cace4acea4a8d8531390757b,

title = "A study of deep learning methods for same-genre and cross-genre author profiling",

abstract = "The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender, etc.) from written text. The problem of author profiling has been mainly treated as a supervised text classification task. Initially, traditional machine learning algorithms were used by the researchers to address the problem of author profiling. However, in recent years, deep learning has emerged as a state-of-the-art method for a range of classification problems related to image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which method(s) are most suitable for same-genre and cross-genre author profiling. To fulfill this gap, the main aim of this study is to carry out an in-depth and detailed comparison of state-of-the-art deep learning methods, i.e. CNN, Bi-LSTM, GRU, and CRNN along with proposed ensemble methods, on four PAN Author Profiling corpora. PAN 2015 corpus, PAN 2017 corpus and PAN 2018 Author Profiling corpus were used for same-genre author profiling whereas PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that for same-genre author profiling, our proposed ensemble methods produced best results for gender identification task whereas CNN model performed best for age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.",

keywords = "age identification, Author profiling, cross-genre author profiling, deep learning, ensemble methods, gender identification, same-genre author profiling",

author = "Ashraf, {Muhammad Adnan} and {Adeel Nawab}, {Rao Muhammad} and Nie, Feiping",

year = "2020",

doi = "10.3233/JIFS-179896",

language = "English",

volume = "39",

pages = "2353--2363",

journal = "Journal of Intelligent and Fuzzy Systems",

issn = "1064-1246",

publisher = "SAGE Publications Ltd",

number = "2",

}

TY - JOUR

T1 - A study of deep learning methods for same-genre and cross-genre author profiling

AU - Ashraf, Muhammad Adnan

AU - Adeel Nawab, Rao Muhammad

AU - Nie, Feiping

PY - 2020

Y1 - 2020

N2 - The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender, etc.) from written text. The problem of author profiling has been mainly treated as a supervised text classification task. Initially, traditional machine learning algorithms were used by the researchers to address the problem of author profiling. However, in recent years, deep learning has emerged as a state-of-the-art method for a range of classification problems related to image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which method(s) are most suitable for same-genre and cross-genre author profiling. To fulfill this gap, the main aim of this study is to carry out an in-depth and detailed comparison of state-of-the-art deep learning methods, i.e. CNN, Bi-LSTM, GRU, and CRNN along with proposed ensemble methods, on four PAN Author Profiling corpora. PAN 2015 corpus, PAN 2017 corpus and PAN 2018 Author Profiling corpus were used for same-genre author profiling whereas PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that for same-genre author profiling, our proposed ensemble methods produced best results for gender identification task whereas CNN model performed best for age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.

AB - The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender, etc.) from written text. The problem of author profiling has been mainly treated as a supervised text classification task. Initially, traditional machine learning algorithms were used by the researchers to address the problem of author profiling. However, in recent years, deep learning has emerged as a state-of-the-art method for a range of classification problems related to image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which method(s) are most suitable for same-genre and cross-genre author profiling. To fulfill this gap, the main aim of this study is to carry out an in-depth and detailed comparison of state-of-the-art deep learning methods, i.e. CNN, Bi-LSTM, GRU, and CRNN along with proposed ensemble methods, on four PAN Author Profiling corpora. PAN 2015 corpus, PAN 2017 corpus and PAN 2018 Author Profiling corpus were used for same-genre author profiling whereas PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that for same-genre author profiling, our proposed ensemble methods produced best results for gender identification task whereas CNN model performed best for age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.

KW - age identification

KW - Author profiling

KW - cross-genre author profiling

KW - deep learning

KW - ensemble methods

KW - gender identification

KW - same-genre author profiling

UR - http://www.scopus.com/inward/record.url?scp=85091080746&partnerID=8YFLogxK

U2 - 10.3233/JIFS-179896

DO - 10.3233/JIFS-179896

M3 - Article

AN - SCOPUS:85091080746

SN - 1064-1246

VL - 39

SP - 2353

EP - 2363

JO - Journal of Intelligent and Fuzzy Systems

JF - Journal of Intelligent and Fuzzy Systems

IS - 2

ER -
