Alex Graves, Santiago Fernández, Faustino Gomez, & Jürgen Schmidhuber (2006)
International Conference on Machine Learning.
DOI: https://doi.org/10.1145/1143844.1143891
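Abstract. Introduces Connectionist Temporal Classification (CTC), a loss function that addresses the alignment problem in sequence labelling tasks where the target sequence is no longer than the input and no frame-level alignment between them is known in advance. CTC adds a special "blank" output to the label set and defines a many-to-one mapping that collapses a frame-level path into a label sequence by merging repeated labels and then removing blanks. The loss is the negative log of the total probability of all paths that collapse to the target. A forward-backward dynamic-programming algorithm computes this sum and its gradient in time proportional to the input length times the target length, instead of enumerating exponentially many paths. CTC let RNNs be trained on unsegmented speech without an HMM supplying the alignments, became a foundation for end-to-end neural speech recognition and for handwriting recognition, and remains widely used in production speech systems.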
Tags: speech sequence-models rnn
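The loss described in the abstract can be illustrated with a small sketch. It assumes per-frame label distributions are already given as a T x K list of probabilities (for example, softmax outputs), uses index 0 as the blank, and computes only the forward half of forward-backward, in log space. The function names `collapse`, `ctc_loss` and `ctc_loss_bruteforce` are hypothetical illustrations, not the paper's code or any library's API. The brute-force version enumerates every path and sums those that collapse to the target, which is the definition the forward recursion computes efficiently.

```python
import math
from itertools import product

NEG_INF = -math.inf

def collapse(path, blank=0):
    """Many-to-one map: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

def _log(p):
    return math.log(p) if p > 0 else NEG_INF

def _logsumexp(*xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_loss(probs, label, blank=0):
    """-ln p(label | x) via the CTC forward recursion (sketch)."""
    T = len(probs)
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S = len(ext)
    # A path may start with the leading blank or the first label.
    alpha = [NEG_INF] * S
    alpha[0] = _log(probs[0][ext[0]])
    if S > 1:
        alpha[1] = _log(probs[0][ext[1]])
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]              # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one symbol
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = _logsumexp(*terms) + _log(probs[t][ext[s]])
        alpha = new
    # A valid path ends on the last label or the trailing blank.
    total = alpha[0] if S == 1 else _logsumexp(alpha[S - 1], alpha[S - 2])
    return -total

def ctc_loss_bruteforce(probs, label, blank=0):
    """Same quantity by enumerating all K**T paths (tiny inputs only)."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(K), repeat=T):
        if collapse(path, blank) == list(label):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return -math.log(total)

if __name__ == "__main__":
    # 3 frames, labels {0: blank, 1: 'a', 2: 'b'}; target "ab"
    probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]
    print(round(ctc_loss(probs, [1, 2]), 3))             # 1.523
    print(round(ctc_loss_bruteforce(probs, [1, 2]), 3))  # 1.523
```

In this toy case five paths collapse to "ab" (a b b, a a b, - a b, a - b, a b -), with total probability 0.218, so both functions return -ln 0.218 ≈ 1.523. The skip rule reflects the collapse mapping: between two identical labels the recursion must pass through a blank, otherwise the repeats would merge into one label.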