Glossary

Sequence-to-Sequence

Also known as: encoder-decoder

The Sequence-to-Sequence (seq2seq) framework, introduced by Sutskever et al. and Cho et al. in 2014, provides an elegant architecture for mapping one variable-length sequence to another. At its core are two components: an encoder that reads the input sequence and produces a compressed representation, and a decoder that generates the output sequence one token at a time, conditioned on that representation. The framework addresses tasks like machine translation, summarisation, image captioning (CNN encoder, RNN decoder), and speech recognition.

In the original formulation, both encoder and decoder are RNNs (typically LSTMs). The encoder processes the input and produces a final hidden state that serves as a context vector—a compressed summary of the entire input. The decoder is initialised with this context and generates output autoregressively, taking its own previous output (or the ground-truth token during training, via teacher forcing) as input for the next step. Generation continues until a special end-of-sequence token is produced.
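The mechanics above can be sketched in a few lines of plain Python. This is an illustrative toy, not a trained model: the "RNN" step is a scalar update h' = tanh(W·x + U·h) with made-up weights, chosen only to make the encoder/decoder loop and the teacher-forcing toggle visible.

```python
import math

W_ENC, U_ENC = 0.5, 0.9   # hypothetical encoder weights
W_DEC, U_DEC = 0.7, 0.8   # hypothetical decoder weights
SOS = 0.0                 # start-of-sequence "token"

def encode(inputs):
    """Fold the entire input sequence into one fixed-size context."""
    h = 0.0
    for x in inputs:
        h = math.tanh(W_ENC * x + U_ENC * h)  # recurrent update
    return h                                   # the context vector

def decode(context, targets=None, max_len=5):
    """Autoregressive decoding; pass `targets` to use teacher forcing."""
    h, prev, outputs = context, SOS, []
    for t in range(max_len):
        h = math.tanh(W_DEC * prev + U_DEC * h)
        y = h                       # toy readout: output equals hidden state
        outputs.append(y)
        # teacher forcing: next step is conditioned on the ground truth,
        # otherwise on the model's own previous output
        prev = targets[t] if targets is not None else y
    return outputs

ctx = encode([0.1, 0.4, -0.2])
free_run = decode(ctx)                                       # inference mode
forced = decode(ctx, targets=[0.3, 0.1, -0.5, 0.2, 0.0])     # training mode
```

Note how the two decoding modes share the first step (both start from the same context and start token) but diverge afterwards, which is exactly the train/inference mismatch that teacher forcing introduces.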

Squeezing the entire input into a single fixed-length vector creates an information bottleneck that limits basic seq2seq on long sequences. This limitation motivated the attention mechanism, which allows the decoder to look back at all encoder hidden states at each step, computing a weighted average focused on the most relevant parts. Attention transformed seq2seq, enabling neural machine translation to surpass statistical systems. The transformer architecture dispensed with recurrence entirely, but the encoder-decoder paradigm itself remains the organising principle of modern models like T5 and BART.
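The weighted average the decoder computes can be sketched as dot-product attention in plain Python (names and the toy 2-dimensional states are illustrative assumptions): score each encoder state against the current decoder state, normalise the scores with a softmax, and average the encoder states under those weights.

```python
import math

def attend(decoder_state, encoder_states):
    """Dot-product attention over all encoder hidden states."""
    # alignment scores: dot product of the decoder state with each encoder state
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    # softmax (shifted by the max for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # context: weighted average of the encoder states
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

# three toy encoder hidden states; the decoder state is most similar to the first
enc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ctx, w = attend([1.0, 0.0], enc)
```

Because the context is recomputed at every decoding step from all encoder states, no single fixed-length vector has to carry the whole input.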

Related terms: Attention Mechanism, Recurrent Neural Network, Transformer

Also defined in: Textbook of AI