Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, & Luke Zettlemoyer (2020). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.
Annual Meeting of the Association for Computational Linguistics.
URL: https://arxiv.org/abs/1910.13461
Abstract. Introduces BART, a denoising-autoencoder pretraining objective for encoder-decoder Transformers. The encoder reads a corrupted version of a document, produced by noising functions such as token masking, token deletion, text infilling (span masking), sentence permutation, and document rotation, and the decoder reconstructs the clean original. BART matches RoBERTa on classification benchmarks (GLUE, SQuAD) and substantially outperforms previous pretraining objectives on generation tasks (summarisation, translation, dialogue). The model became one of the canonical encoder-decoder pretraining recipes alongside T5 and was widely adopted as a starting point for downstream summarisation systems.
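For concreteness, a minimal sketch of the corruption step described above, in plain Python. The function names, the span-start probability, and the toy tokenisation are illustrative assumptions, not the paper's exact recipe; the Poisson(λ=3) span lengths for text infilling do follow the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK = "<mask>"  # placeholder mask token; the real vocabulary id is model-specific

def text_infill(tokens, start_prob=0.15, lam=3.0):
    """Replace sampled spans with a single MASK ('text infilling').
    Span lengths are drawn from Poisson(lam); a zero-length span inserts a
    MASK without removing anything, as in the paper."""
    out, i = [], 0
    while i < len(tokens):
        if rng.random() < start_prob:
            span = int(rng.poisson(lam))
            out.append(MASK)   # one MASK stands in for the whole span
            i += span          # span == 0 -> pure insertion, position unchanged
        else:
            out.append(tokens[i])
            i += 1
    return out

def permute_sentences(sentences):
    """Shuffle whole sentences into a random order ('sentence permutation')."""
    return [sentences[j] for j in rng.permutation(len(sentences))]

def rotate_document(tokens):
    """Pick a token uniformly at random and rotate the document to start
    there ('document rotation')."""
    k = int(rng.integers(len(tokens)))
    return tokens[k:] + tokens[:k]

doc = "the cat sat on the mat . it was warm and quiet .".split()
corrupted = rotate_document(text_infill(doc))
print(corrupted)  # encoder input; the decoder is trained to emit the original doc
```

In the paper's ablations, text infilling (alone or combined with sentence permutation) was the best-performing noising scheme.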
Tags: language-models pretraining sequence-to-sequence
Cited in: