Aaron van den Oord, Oriol Vinyals, & Koray Kavukcuoglu (2017)
arXiv.
DOI: https://doi.org/10.48550/arxiv.1711.00937
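Abstract. Introduces the VQ-VAE (vector-quantised variational autoencoder), which replaces the continuous latent variables of a VAE with discrete codes. Each encoder output is mapped to its nearest entry in a learned codebook. Gradients pass through this non-differentiable quantisation step via a straight-through estimator. The prior over codes is learned (a PixelCNN for images, a WaveNet for audio) rather than fixed. The authors report that this design sidesteps the "posterior collapse" seen when VAEs are paired with powerful autoregressive decoders. Combined with the learned prior, the model generates high-quality images, video, and speech. Discrete VQ-style tokenisers descended from this work (e.g. VQGAN) are widely used in autoregressive and masked-token image generators.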
Tags: generative vae vq-vae
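The quantisation step is the core mechanism of the paper. Below is a minimal plain-Python sketch of it, with no autograd. It does two things: it performs a nearest-neighbour codebook lookup, and it computes the forward values of the two auxiliary loss terms, the codebook term and the β-weighted commitment term, using β = 0.25 as in the paper's experiments. The function names `quantize` and `vq_loss_terms` are hypothetical and do not come from the authors' code. Averaging over latent positions is an assumption of this sketch. The reconstruction term, the stop-gradients, and the straight-through estimator are described in comments only, because they matter only for gradient flow.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z_e: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Replace each encoder output with its nearest codebook entry.

    Returns the discrete code indices and the quantised vectors z_q.
    In a trained model the decoder receives z_q, and the straight-through
    estimator copies the decoder's gradient w.r.t. z_q onto z_e.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in z_e
    ]
    z_q = [codebook[k] for k in indices]
    return indices, z_q


def vq_loss_terms(z_e: List[Vector], z_q: List[Vector], beta: float = 0.25) -> Tuple[float, float]:
    """Forward values of the codebook and commitment terms.

    codebook term:   ||sg[z_e] - e||^2          (updates the codebook only)
    commitment term: beta * ||z_e - sg[e]||^2   (updates the encoder only)
    sg[.] (stop-gradient) changes nothing in the forward pass, so both terms
    share one distance here. Averaging over positions is an assumption.
    """
    mean_sq = sum(squared_distance(a, b) for a, b in zip(z_e, z_q)) / len(z_e)
    return mean_sq, beta * mean_sq


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]  # toy 3-entry, 2-D codebook
    z_e = [[0.1, -0.1], [0.9, 1.2], [-0.8, 1.9]]      # toy encoder outputs
    indices, z_q = quantize(z_e, codebook)
    print(indices)  # [0, 1, 2]
    print(vq_loss_terms(z_e, z_q))
```

The integer `indices` are the discrete tokens on which an autoregressive prior such as a PixelCNN would be trained. In a real implementation the codebook lookup would be a batched tensor operation, and the stop-gradients would be expressed with the framework's detach/stop-gradient primitive.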