Research Article Open Access

Neural Machine Translation for Low-resource English-Bangla

Mohammad Abdullah Al Mumin1, Md Hanif Seddiqui2, Muhammed Zafar Iqbal1 and Mohammed Jahirul Islam1
  • 1 Shahjalal University of Science and Technology, Bangladesh
  • 2 University of Chittagong, Bangladesh

Abstract

Neural machine translation has recently achieved state-of-the-art translation quality for many language pairs. However, it has been less thoroughly tested on the English-Bangla language pair, which joins two linguistically distant and widely spoken languages. In this paper, we apply neural machine translation to English-Bangla translation in both directions and compare it against a standard phrase-based statistical machine translation system. We obtain improvements of up to +0.30 and +4.95 BLEU over phrase-based statistical machine translation for English-to-Bangla and Bangla-to-English, respectively. Because Bangla is low-resource and morphologically rich, the English-Bangla translation task produces a large number of rare words. We apply subword segmentation with byte pair encoding to handle this rare-word problem, obtaining improvements of up to +0.69 and +0.30 BLEU over the baseline neural machine translation system for English-to-Bangla and Bangla-to-English, respectively. We further examine our system output for several challenging linguistic properties, including subject-verb agreement, noun inflection, long-distance reordering, and rare-word translation. We observe that neural machine translation, with and without subword segmentation, significantly outperforms the phrase-based statistical machine translation system, thus establishing itself as the state-of-the-art technology for low-resource English-Bangla machine translation.
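
The abstract names subword segmentation with byte pair encoding as the remedy for rare words. As a rough illustration of the general idea only, and not the authors' implementation or configuration, the Python sketch below learns merge operations from a toy word-frequency table and applies them to an unseen word. The function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, the tie-breaking rule, and the toy English corpus are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: count} table."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically so the
        # result is deterministic (an arbitrary choice for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (hypothetical counts); "lowest" never appears in it.
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_freqs, num_merges=3)
print(segment("lowest", toy_merges))
```

The point of the sketch is that a word absent from the training data can still be represented as a sequence of known subword units rather than a single unknown token, which is why the technique suits a morphologically rich language with limited parallel data.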

Journal of Computer Science
Volume 15 No. 11, 2019, 1627-1637

DOI: https://doi.org/10.3844/jcssp.2019.1627.1637

Submitted On: 25 September 2019
Published On: 15 November 2019

How to Cite: Al Mumin, M. A., Seddiqui, M. H., Iqbal, M. Z. & Islam, M. J. (2019). Neural Machine Translation for Low-resource English-Bangla. Journal of Computer Science, 15(11), 1627-1637. https://doi.org/10.3844/jcssp.2019.1627.1637



Keywords

  • English-Bangla Machine Translation
  • Low-Resource
  • Morphologically Rich
  • Neural Machine Translation