Incorporating Typological Features into Language Selection for Multilingual Neural Machine Translation

Chenggang Mi, Shaolin Zhu, Yi Fan, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

In this paper, we propose to use rich semantic and typological information about languages to improve language selection for multilingual NMT. In particular, we first use a graph-based model to output the most semantically similar languages; then we build a random forest model that integrates features such as data size, language family, word formation, morpheme overlap, word order, POS tags, and syntactic similarity to predict the final target language(s). Experimental results on several datasets show that our method achieves consistent improvements over existing approaches in both language selection and multilingual NMT.

Original language: English
Title of host publication: Web and Big Data - 5th International Joint Conference, APWeb-WAIM 2021, Proceedings
Editors: Leong Hou U, Marc Spaniol, Yasushi Sakurai, Junying Chen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 348-357
Number of pages: 10
ISBN (Print): 9783030858957
State: Published - 2021
Event: 5th International Joint Conference on Asia-Pacific Web and Web-Age Information Management, APWeb-WAIM 2021 - Guangzhou, China
Duration: 23 Aug 2021 to 25 Aug 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12858 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Joint Conference on Asia-Pacific Web and Web-Age Information Management, APWeb-WAIM 2021
Country/Territory: China
City: Guangzhou
Period: 23/08/21 to 25/08/21

Keywords

  • Language selection
  • Neural machine translation
  • Typological feature
