Jacob Devlin, Ming-Wei Chang, Kenton Lee, & Kristina Toutanova (2019)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Proceedings of NAACL-HLT 2019, Volume 1 (Long and Short Papers), 4171-4186.
DOI: https://doi.org/10.18653/v1/N19-1423
Abstract. Introduces BERT, a bidirectional Transformer encoder pre-trained with masked language modelling and next-sentence prediction. Fine-tuning the pre-trained model with just one additional output layer achieved state-of-the-art results on eleven NLP tasks, including GLUE, MultiNLI, and SQuAD, and established the pretrain-then-fine-tune recipe as the dominant paradigm in natural language processing.
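The masked-LM corruption step mentioned above can be sketched in a few lines. This is a minimal illustration of the paper's 80/10/10 masking rule, not the authors' implementation; the `MASK_ID` and `VOCAB_SIZE` values and the `-100` ignore-label convention are assumptions chosen to match common BERT tooling.

```python
import random

MASK_ID = 103       # assumed [MASK] id (matches bert-base-uncased vocab)
VOCAB_SIZE = 30522  # assumed WordPiece vocabulary size for BERT-base

def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    """Return (corrupted_ids, labels) for the masked-LM objective.

    15% of positions are selected for prediction; of those, 80% are
    replaced with [MASK], 10% with a random token, and 10% are left
    unchanged. Unselected positions get label -100, a common convention
    for "ignore this position in the loss".
    """
    rng = random.Random(seed)
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                  # position not selected for prediction
        labels[i] = tok               # model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_ID                     # 80%: [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.randrange(VOCAB_SIZE)   # 10%: random token
        # else: 10% keep the original token unchanged
    return corrupted, labels
```

For example, `mask_tokens([7592, 1010, 2088])` might return `([103, 1010, 2088], [7592, -100, -100])`, i.e. the first token was masked and must be predicted while the rest are ignored by the loss.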
Tags: transformer bert pretraining nlp
Cited in: