References

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, & Jeff Dean (2015)

arXiv preprint arXiv:1503.02531.

DOI: https://doi.org/10.48550/arXiv.1503.02531

Abstract. Introduces knowledge distillation, which trains a smaller 'student' model to match the softened output distribution of a larger 'teacher' model. The temperature parameter exposes rich information about relative class similarities, enabling effective model compression.

Tags: efficiency, distillation
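
To make the mechanism concrete, here is a minimal NumPy sketch of the distillation objective described in the abstract. The function names, the temperature of 4.0, and the 0.5 weighting are illustrative assumptions, not values taken from the paper; the T² scaling of the soft-target term and the mix of soft and hard targets do follow the paper's recipe.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature yields a softer
    # distribution that reveals the teacher's relative class similarities.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Weighted sum of (a) cross-entropy against the teacher's softened
    # outputs and (b) standard cross-entropy against the true labels.
    # The T**2 factor keeps the soft-target gradients comparable in
    # magnitude across temperatures, as suggested in the paper.
    # temperature=4.0 and alpha=0.5 are illustrative choices.
    eps = 1e-12
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = -np.sum(soft_teacher * np.log(soft_student + eps), axis=-1).mean()
    hard_probs = softmax(student_logits)  # hard term uses temperature 1
    hard_loss = -np.log(hard_probs[np.arange(len(labels)), labels] + eps).mean()
    return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss

# Usage: a 3-class batch of 2 examples with made-up logits.
teacher = np.array([[5.0, 2.0, -1.0], [0.5, 3.0, 0.1]])
student = np.array([[2.0, 1.0, 0.0], [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
print(distillation_loss(student, teacher, labels))
```

In practice the student is trained by backpropagating this combined loss; at higher temperatures the near-zero logits of wrong-but-similar classes contribute meaningfully to the gradient, which is the information the hard labels alone discard.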
