Aäron van den Oord, Nal Kalchbrenner, & Koray Kavukcuoglu (2016)
International Conference on Machine Learning.
URL: https://arxiv.org/abs/1601.06759
Abstract. Introduces PixelRNN and PixelCNN, autoregressive models for image generation. PixelRNN scans an image row by row with an LSTM, factorising the joint distribution as a product of per-pixel categorical conditionals; PixelCNN replaces the recurrence with masked convolutions for parallel training. The model produces sharp, high-quality samples and provides exact log-likelihoods, in contrast to GANs which trade likelihoods for sample quality. PixelRNN/CNN was the dominant likelihood-based image-generation framework before the diffusion revolution and remains pedagogically important as the simplest non-trivial autoregressive image model.
Tags: generative-models vision
Cited in: