Machine translation (MT) is the automatic translation of text between natural languages. The history of MT mirrors the history of AI itself.
Eras:
Rule-based MT (1950s-1980s): hand-coded grammars and dictionaries. Brittle, expensive to develop, and hard to scale across many language pairs. The 1966 ALPAC report concluded that MT was slower, less accurate, and costlier than human translation, ending the first MT funding boom.
Statistical MT (1990s-2010s): IBM Models (1990s), phrase-based translation (Koehn 2003), syntax-based methods. Trained on parallel corpora using expectation-maximisation for word alignment. Google Translate's first decade used statistical MT.
Neural MT (2014-present): encoder-decoder seq2seq models (Sutskever et al. 2014; Cho et al. 2014) with attention (Bahdanau, Cho, Bengio 2014) launched the neural era. Google switched to NMT in 2016 (Wu et al.), reporting dramatic quality improvements.
Transformer-based MT (2017-present): the original Vaswani et al. 2017 Transformer was a translation system. State of the art for nearly all language pairs. Modern multilingual models (Google's M4, Meta's M2M-100 and NLLB-200) translate among as many as 200 languages.
Modern LLMs: GPT-4, Claude, Gemini achieve translation quality competitive with or exceeding dedicated MT systems for high-resource language pairs, and dramatically better for low-resource languages where dedicated MT had little training data.
Evaluation:
- BLEU (Papineni et al. 2002): modified n-gram precision against references, combined with a brevity penalty that punishes overly short outputs.
- chrF: character-level n-gram F-score.
- BLEURT, BERTScore, COMET: learned metrics that handle paraphrases and synonyms better.
- Human evaluation: ranking, MQM (Multidimensional Quality Metrics), professional-translator judgments. Remains the gold standard.
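To make the BLEU definition above concrete, here is a minimal sentence-level sketch of the metric: the geometric mean of clipped n-gram precisions times the brevity penalty. It omits the smoothing that production toolkits (e.g. sacreBLEU) apply, so any missing n-gram order zeroes the score, as in the original formulation.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference (unsmoothed sketch)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_grams, r_grams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_grams & r_grams).values())   # clipped counts
        total = max(sum(c_grams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:                          # no smoothing: any zero kills the score
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: no penalty if the candidate is at least as long as the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```

A perfect match scores 1.0; a candidate sharing no unigrams with the reference scores 0.0. Real evaluations use corpus-level BLEU over many sentences and multiple references.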
Open challenges:
- Low-resource languages: many languages have <1M parallel sentences available. Multilingual transfer and self-supervised methods help but quality lags.
- Cultural and pragmatic translation: idioms, cultural references, register, pragmatics.
- Document-level translation: most MT operates sentence-by-sentence, missing discourse coherence.
- Specialised domains: legal, medical, technical translation requires domain knowledge.
- Ethical concerns: cultural homogenisation, low-resource language preservation, deepfake-translation risks.
Machine translation was historically regarded as one of the hardest benchmark tasks in NLP. With modern LLMs handling high-resource translation at near-human quality, the field has shifted its focus to multilinguality, low-resource languages, and document-level coherence.
Related terms: BLEU, Sequence-to-Sequence, Transformer, ALPAC Report
Discussed in:
- Chapter 12: Sequence Models