Machine translation (MT) is the automatic translation of text between natural languages. The history of MT mirrors the history of AI itself.
Eras:
Rule-based MT (1950s-1980s): hand-coded grammars and dictionaries. Brittle, expensive to develop, and hard to scale across many language pairs. The 1966 ALPAC report concluded that MT was slower, less accurate, and costlier than human translation, ending the first MT funding boom.
Statistical MT (1990s-2010s): IBM Models (1990s), phrase-based translation (Koehn 2003), syntax-based methods. Trained on parallel corpora using expectation-maximisation for word alignment. Google Translate's first decade used statistical MT.
Neural MT (2014-present): encoder-decoder seq2seq models (Sutskever et al. 2014; Cho et al. 2014) with attention (Bahdanau, Cho, Bengio 2014) launched the neural era. Google switched to NMT in 2016 (Wu et al.), reporting dramatic quality improvements.
Transformer-based MT (2017-present): the original Vaswani et al. 2017 Transformer was a translation system. State of the art for nearly all language pairs. Modern multilingual models (Google's M4, Meta's M2M-100 and NLLB-200) translate among as many as 200 languages.
Modern LLMs: GPT-4, Claude, Gemini achieve translation quality competitive with or exceeding dedicated MT systems for high-resource language pairs, and dramatically better for low-resource languages where dedicated MT had little training data.
Evaluation:
- BLEU (Papineni et al. 2002): modified n-gram precision against references, combined with a brevity penalty that punishes overly short outputs.
- chrF: character-level n-gram F-score.
- BLEURT, BERTScore, COMET: learned metrics that handle paraphrases and synonyms better.
- Human evaluation: ranking, MQM (Multidimensional Quality Metrics), professional-translator judgments. Remains the gold standard.
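To make the BLEU definition above concrete, here is a minimal sentence-level sketch of the metric: the geometric mean of clipped n-gram precisions times the brevity penalty. It omits the smoothing that production toolkits (e.g. sacreBLEU) apply, so any missing n-gram order zeroes the score, as in the original formulation.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference (unsmoothed sketch)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_grams, r_grams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_grams & r_grams).values())   # clipped counts
        total = max(sum(c_grams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:                          # no smoothing: any zero kills the score
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: no penalty if the candidate is at least as long as the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```

A perfect match scores 1.0; a candidate sharing no unigrams with the reference scores 0.0. Real evaluations use corpus-level BLEU over many sentences and multiple references.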
Open challenges:
- Low-resource languages: many languages have <1M parallel sentences available. Multilingual transfer and self-supervised methods help but quality lags.
- Cultural and pragmatic translation: idioms, cultural references, register, pragmatics.
- Document-level translation: most MT operates sentence-by-sentence, missing discourse coherence.
- Specialised domains: legal, medical, technical translation requires domain knowledge.
- Ethical concerns: cultural homogenisation, low-resource language preservation, deepfake-translation risks.
Machine translation was historically regarded as one of the hardest benchmark tasks in NLP. With modern LLMs handling high-resource translation at near-human quality, the field has shifted its focus to multilinguality, low-resource languages, and document-level coherence.
Related terms: BLEU, Sequence-to-Sequence, Transformer, ALPAC Report
Discussed in:
- Chapter 12: Sequence Models