TY - JOUR
T1 - A study of deep learning methods for same-genre and cross-genre author profiling
AU - Ashraf, Muhammad Adnan
AU - Adeel Nawab, Rao Muhammad
AU - Nie, Feiping
N1 - Publisher Copyright:
© 2020 - IOS Press and the authors. All rights reserved.
PY - 2020
Y1 - 2020
N2 - The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender, etc.) from written text. The problem of author profiling has been mainly treated as a supervised text classification task. Initially, traditional machine learning algorithms were used by the researchers to address the problem of author profiling. However, in recent years, deep learning has emerged as a state-of-the-art method for a range of classification problems related to image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which method(s) are most suitable for same-genre and cross-genre author profiling. To fulfill this gap, the main aim of this study is to carry out an in-depth and detailed comparison of state-of-the-art deep learning methods, i.e. CNN, Bi-LSTM, GRU, and CRNN along with proposed ensemble methods, on four PAN Author Profiling corpora. PAN 2015 corpus, PAN 2017 corpus and PAN 2018 Author Profiling corpus were used for same-genre author profiling whereas PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that for same-genre author profiling, our proposed ensemble methods produced best results for gender identification task whereas CNN model performed best for age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.
AB - The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender, etc.) from written text. The problem of author profiling has been mainly treated as a supervised text classification task. Initially, traditional machine learning algorithms were used by the researchers to address the problem of author profiling. However, in recent years, deep learning has emerged as a state-of-the-art method for a range of classification problems related to image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which method(s) are most suitable for same-genre and cross-genre author profiling. To fulfill this gap, the main aim of this study is to carry out an in-depth and detailed comparison of state-of-the-art deep learning methods, i.e. CNN, Bi-LSTM, GRU, and CRNN along with proposed ensemble methods, on four PAN Author Profiling corpora. PAN 2015 corpus, PAN 2017 corpus and PAN 2018 Author Profiling corpus were used for same-genre author profiling whereas PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that for same-genre author profiling, our proposed ensemble methods produced best results for gender identification task whereas CNN model performed best for age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.
KW - age identification
KW - Author profiling
KW - cross-genre author profiling
KW - deep learning
KW - ensemble methods
KW - gender identification
KW - same-genre author profiling
UR - http://www.scopus.com/inward/record.url?scp=85091080746&partnerID=8YFLogxK
U2 - 10.3233/JIFS-179896
DO - 10.3233/JIFS-179896
M3 - 文章
AN - SCOPUS:85091080746
SN - 1064-1246
VL - 39
SP - 2353
EP - 2363
JO - Journal of Intelligent and Fuzzy Systems
JF - Journal of Intelligent and Fuzzy Systems
IS - 2
ER -