David M. Blei, Andrew Y. Ng, & Michael I. Jordan (2003)
Journal of Machine Learning Research.
URL: https://www.jmlr.org/papers/v3/blei03a.html
Abstract. Introduces Latent Dirichlet Allocation (LDA), a probabilistic generative model of document collections. Each document is modelled as a mixture over a finite set of latent topics, and each topic is a distribution over the vocabulary. A Dirichlet prior on the per-document topic mixtures (and, in the smoothed variant, on the per-topic word distributions) is conjugate to the multinomial, which keeps approximate inference tractable. The paper fits the model with a variational EM algorithm; collapsed Gibbs sampling for LDA was introduced in later work. LDA became the de facto topic model for unsupervised text analysis for more than a decade and seeded a substantial literature of extensions, including hierarchical, dynamic and supervised topic models, some of them Bayesian nonparametric.
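Note. The generative process summarised in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names, the symmetric hyperparameters `alpha` and `eta`, and the fixed document length are assumptions made here for brevity (the paper draws document length from a Poisson distribution), and only the Python standard library is used.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    """Sample topics and documents from an LDA-style generative process.

    Hypothetical helper: words are integer ids in range(vocab_size).
    """
    rng = random.Random(seed)
    # One word distribution per topic (Dirichlet prior, as in the smoothed variant).
    topics = [sample_dirichlet([eta] * vocab_size, rng)
              for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Per-document topic mixture.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(range(vocab_size), weights=topics[z])[0]
            doc.append(w)
        corpus.append(doc)
    return topics, corpus
```

Calling `generate_corpus(3, 20, 2, 10)` returns two topic distributions over a 10-word vocabulary and three documents of 20 word ids each. Inference runs this process in reverse: given only the documents, it estimates the topics and the per-document mixtures.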
Tags: unsupervised-learning probabilistic-models topic-models
Cited in: