Neural Machine Translation for Low-resource English-Bangla
- 1 Shahjalal University of Science and Technology, Bangladesh
- 2 University of Chittagong, Bangladesh
Abstract
Neural machine translation has recently achieved state-of-the-art translation quality for many language pairs. However, it has been little tested on the English-Bangla language pair, two linguistically distant and widely spoken languages. In this paper, we apply neural machine translation to English-Bangla translation in both directions and compare it against a standard phrase-based statistical machine translation system. We obtain improvements of up to +0.30 and +4.95 BLEU over phrase-based statistical machine translation for English-to-Bangla and Bangla-to-English, respectively. Because Bangla is low-resource and morphologically rich, the English-Bangla translation task produces a large number of rare words. We apply subword segmentation with byte pair encoding to handle this rare-word problem and obtain improvements of up to +0.69 and +0.30 BLEU over the baseline neural machine translation system for English-to-Bangla and Bangla-to-English, respectively. We further examine our system output for several challenging linguistic properties, such as subject-verb agreement, noun inflection, long-distance reordering and rare-word translation. We observe that neural machine translation, with and without subword segmentation, significantly outperforms the phrase-based statistical machine translation system, thus establishing itself as the state-of-the-art technology for low-resource English-Bangla machine translation.
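To illustrate the subword segmentation idea mentioned above, the following is a minimal sketch of byte pair encoding on a toy English word list. It is not the system used in this paper: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, the word frequencies and the number of merges are all assumptions chosen for illustration. The sketch repeatedly merges the most frequent adjacent symbol pair, then applies the learned merges to segment a word that never appeared in the toy data.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a word -> frequency mapping."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged = merge_pair(symbols, best)
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy data (hypothetical frequencies); "lowest" does not occur in it.
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_freqs, num_merges=3)
print(merges)
print(segment("lowest", merges))
```

In this toy setting, the unseen word is split into smaller units that were observed in training rather than being mapped to a single unknown token, which is the motivation for using subword units on a morphologically rich, low-resource language.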
DOI: https://doi.org/10.3844/jcssp.2019.1627.1637
Copyright: © 2019 Mohammad Abdullah Al Mumin, Md Hanif Seddiqui, Muhammed Zafar Iqbal and Mohammed Jahirul Islam. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- English-Bangla Machine Translation
- Low-Resource
- Morphologically Rich
- Neural Machine Translation