Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, & Luke Zettlemoyer (2020). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.
Annual Meeting of the Association for Computational Linguistics.
URL: https://arxiv.org/abs/1910.13461
Abstract. Introduces BART, a denoising-autoencoder pretraining objective for encoder-decoder Transformers. The encoder reads a corrupted version of a document, produced by noising functions such as token masking, token deletion, text infilling (span masking), sentence permutation, and document rotation, and the decoder reconstructs the clean original. BART matches RoBERTa on classification benchmarks (GLUE, SQuAD) and substantially outperforms previous pretraining objectives on generation tasks (summarisation, translation, dialogue). The model became one of the canonical encoder-decoder pretraining recipes alongside T5 and was widely adopted as a starting point for downstream summarisation systems.
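For concreteness, a minimal sketch of the corruption step described above, in plain Python. The function names, the span-start probability, and the toy tokenisation are illustrative assumptions, not the paper's exact recipe; the Poisson(λ=3) span lengths for text infilling do follow the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK = "<mask>"  # placeholder mask token; the real vocabulary id is model-specific

def text_infill(tokens, start_prob=0.15, lam=3.0):
    """Replace sampled spans with a single MASK ('text infilling').
    Span lengths are drawn from Poisson(lam); a zero-length span inserts a
    MASK without removing anything, as in the paper."""
    out, i = [], 0
    while i < len(tokens):
        if rng.random() < start_prob:
            span = int(rng.poisson(lam))
            out.append(MASK)   # one MASK stands in for the whole span
            i += span          # span == 0 -> pure insertion, position unchanged
        else:
            out.append(tokens[i])
            i += 1
    return out

def permute_sentences(sentences):
    """Shuffle whole sentences into a random order ('sentence permutation')."""
    return [sentences[j] for j in rng.permutation(len(sentences))]

def rotate_document(tokens):
    """Pick a token uniformly at random and rotate the document to start
    there ('document rotation')."""
    k = int(rng.integers(len(tokens)))
    return tokens[k:] + tokens[:k]

doc = "the cat sat on the mat . it was warm and quiet .".split()
corrupted = rotate_document(text_infill(doc))
print(corrupted)  # encoder input; the decoder is trained to emit the original doc
```

In the paper's ablations, text infilling (alone or combined with sentence permutation) was the best-performing noising scheme.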
Tags: language-models pretraining sequence-to-sequence
Cited in: