Glossary

NETtalk

NETtalk, developed by Terry Sejnowski and his graduate student Charles Rosenberg at Johns Hopkins University and published in 1987, was a feed-forward neural network trained by backpropagation that learned to pronounce English text. It became one of the most-cited and most-played demonstrations of the 1980s connectionist revival.

Architecture

The network had three layers:

  • An input layer of 203 units representing a sliding seven-character window (three letters of context on each side of the target letter), with each character one-hot encoded over 29 symbols (26 letters, space, comma, full stop)
  • A single hidden layer of 80 sigmoid units
  • An output layer of 26 units encoding the target phoneme as articulatory features (21 units) plus stress and syllable-boundary markers (5 units), which then drove a DECtalk speech synthesiser (see the sketch after this list)
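These dimensions are enough to sketch the encoding and forward pass in code. The following is a minimal reconstruction under stated assumptions, not the original implementation: the symbol ordering, random placeholder weights, and function names are illustrative.

```python
import numpy as np

# Hypothetical reconstruction of NETtalk's input encoding: a 7-character
# window, each character one-hot over 29 symbols (26 letters + space,
# comma, full stop), giving 7 * 29 = 203 input units.
SYMBOLS = "abcdefghijklmnopqrstuvwxyz ,."
INDEX = {ch: i for i, ch in enumerate(SYMBOLS)}

def encode_window(window: str) -> np.ndarray:
    assert len(window) == 7
    x = np.zeros(7 * len(SYMBOLS))
    for pos, ch in enumerate(window):
        x[pos * len(SYMBOLS) + INDEX[ch]] = 1.0
    return x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes from the text: 203 -> 80 -> 26. The weights here are random
# placeholders; the real network learned them by backpropagation.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(80, 203)), np.zeros(80)
W2, b2 = rng.normal(scale=0.1, size=(26, 80)), np.zeros(26)

def forward(window: str) -> np.ndarray:
    h = sigmoid(W1 @ encode_window(window) + b1)  # 80 hidden activations
    return sigmoid(W2 @ h + b2)                   # 26 output activations

print(forward(" hello ").shape)  # (26,)
```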

The task was to predict the phoneme corresponding to the centre letter of the seven-character window. Training used the standard Rumelhart–Hinton–Williams backpropagation algorithm published the previous year, on a corpus drawn from a 20,008-word phonetic dictionary (derived from Merriam-Webster's Pocket Dictionary) prepared by Rosenberg with a phonetician.
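To make the learning rule concrete, here is one backpropagation update for a sigmoid network of these dimensions. The squared-error loss, learning rate, and absence of momentum are simplifying assumptions; the paper's exact training settings are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One backpropagation update for a 203-80-26 sigmoid network, assuming a
# squared-error loss and plain gradient descent.
def train_step(x, target, W1, b1, W2, b2, lr=0.5):
    # Forward pass.
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    # Backward pass: delta rules for sigmoid units under squared error.
    delta_out = (y - target) * y * (1.0 - y)
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)
    # Gradient-descent weight updates.
    W2 -= lr * np.outer(delta_out, h)
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x)
    b1 -= lr * delta_hid
    return 0.5 * float(np.sum((y - target) ** 2))

# Example: one update on stand-in data of the right shapes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(80, 203)), np.zeros(80)
W2, b2 = rng.normal(scale=0.1, size=(26, 80)), np.zeros(26)
x = (rng.random(203) < 0.03).astype(float)  # stand-in for a one-hot window
t = rng.random(26)                          # stand-in for a phoneme code
print(train_step(x, t, W1, b1, W2, b2))     # squared error for this example
```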

Results

NETtalk achieved 95% phoneme accuracy on the training set and 78% on a held-out test set after roughly 50 training epochs. More important than the raw numbers, the network generalised to unseen words, reproducing systematic English spelling-to-sound patterns it had never been explicitly taught.

The famous audio

What made NETtalk famous was Sejnowski's recorded audio of the network's output at successive stages of training. The recordings began as undifferentiated babbling, progressed to recognisable consonant–vowel alternations, and eventually emerged as intelligible (if robotic) English speech. Played at conferences and in popular-science programmes throughout the late 1980s and early 1990s, the recording dramatised in a few minutes what learning curves on paper could not: a connectionist system audibly acquiring a complex language-related skill.

Significance

NETtalk was not the first neural network, nor the first application of backpropagation, nor a state-of-the-art speech-synthesis system. (Even at the time, dictionary-lookup with hand-crafted rules outperformed it on coverage.) Its importance was rhetorical and historical: it was a concrete demonstration that distributed neural networks could solve a non-trivial language-related task entirely from data, and it contributed substantially to the broader case that connectionism was a serious alternative to symbolic AI rather than a curiosity.

Legacy

The English text-to-phoneme task NETtalk addressed is now solved in production by lookup in pronunciation dictionaries (CMUdict, Unisyn) supplemented by sequence-to-sequence models for out-of-vocabulary words. Modern neural speech synthesis (Tacotron, WaveNet, FastSpeech, VALL-E) learns the mapping from text to audio largely end-to-end, though many of these systems still take phoneme sequences as input rather than bypassing them.
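As a sketch of that production pattern, the following uses NLTK's copy of CMUdict for the lookup step, with a stub standing in for the learned out-of-vocabulary model; the function names (g2p, oov_model) are illustrative, not a standard API.

```python
# Dictionary-first grapheme-to-phoneme lookup, with a stub where a trained
# sequence-to-sequence model would handle out-of-vocabulary words.
# Requires: pip install nltk
import nltk
nltk.download("cmudict", quiet=True)
from nltk.corpus import cmudict

PRONUNCIATIONS = cmudict.dict()  # word -> list of ARPAbet transcriptions

def oov_model(word: str) -> list[str]:
    # Placeholder for a learned grapheme-to-phoneme model.
    raise NotImplementedError(f"no dictionary entry for {word!r}")

def g2p(word: str) -> list[str]:
    entries = PRONUNCIATIONS.get(word.lower())
    if entries:
        return entries[0]   # dictionary hit: first listed pronunciation
    return oov_model(word)  # miss: defer to the fallback model

print(g2p("network"))  # prints an ARPAbet phoneme list
```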

NETtalk's place is now historical: an early proof-of-concept that connected an entire generation of researchers to the connectionist programme that would, twenty years later, mature into deep learning.

Related terms: Backpropagation, Terry Sejnowski, Parallel Distributed Processing
