Glossary

Hidden Markov Model

Also known as: HMM

A hidden Markov model (HMM) is a probabilistic model of sequence data with two components:

  • A latent state sequence $z_1, z_2, \ldots, z_T$ following a Markov chain over $K$ states with transition matrix $A$ ($A_{ij} = P(z_t = j | z_{t-1} = i)$) and initial distribution $\pi$.
  • An observation sequence $x_1, x_2, \ldots, x_T$, where each $x_t$ is generated conditionally independently of the rest of the sequence given $z_t$, via emission probabilities $b_i(x) = P(x_t = x | z_t = i)$ (a sampling sketch of this generative process follows the list).
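
As a concrete illustration of the generative process just described, here is a minimal NumPy sketch that samples a state path and an observation sequence from a small discrete-emission HMM. The two-state, three-symbol parameters are invented for this example, not taken from any particular dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state, 3-symbol HMM (parameters made up for this sketch).
pi = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.7, 0.3],                  # A[i, j] = P(z_t = j | z_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],             # B[i, x] = P(x_t = x | z_t = i)
              [0.1, 0.3, 0.6]])

def sample_hmm(T):
    """Draw a latent state path z and an observation sequence x of length T."""
    z = np.empty(T, dtype=int)
    x = np.empty(T, dtype=int)
    z[0] = rng.choice(len(pi), p=pi)                   # initial state
    x[0] = rng.choice(B.shape[1], p=B[z[0]])           # emission given z_1
    for t in range(1, T):
        z[t] = rng.choice(A.shape[1], p=A[z[t - 1]])   # Markov transition
        x[t] = rng.choice(B.shape[1], p=B[z[t]])       # emission given z_t
    return z, x

z, x = sample_hmm(10)
print("states:      ", z)
print("observations:", x)
```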

The joint probability is

$$P(x_{1:T}, z_{1:T}) = \pi_{z_1} b_{z_1}(x_1) \prod_{t=2}^T A_{z_{t-1}, z_t} b_{z_t}(x_t)$$
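
This factorisation can be evaluated directly, usually in log space for numerical stability. A minimal sketch, again with invented parameters:

```python
import numpy as np

# Illustrative parameters (same toy HMM as above; not from any real data).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

def log_joint(x, z):
    """log P(x_{1:T}, z_{1:T}) = log pi_{z_1} + log b_{z_1}(x_1)
       + sum_{t>=2} [log A_{z_{t-1}, z_t} + log b_{z_t}(x_t)]."""
    lp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])
    for t in range(1, len(x)):
        lp += np.log(A[z[t - 1], z[t]]) + np.log(B[z[t], x[t]])
    return lp

x = [0, 1, 2, 1]   # observed symbols
z = [0, 0, 1, 1]   # a candidate state path
print(log_joint(x, z))
```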

Three classical HMM problems:

(1) Likelihood: compute $P(x_{1:T})$. Solved by the forward algorithm (dynamic programming):

$$\alpha_t(i) = P(x_{1:t}, z_t = i) = b_i(x_t) \sum_j \alpha_{t-1}(j) A_{ji}$$

with base case $\alpha_1(i) = \pi_i b_i(x_1)$. Then $P(x_{1:T}) = \sum_i \alpha_T(i)$. Time complexity $O(T K^2)$.
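
The recursion translates almost line for line into code. A minimal NumPy sketch of the forward pass (toy parameters invented for illustration):

```python
import numpy as np

def forward(x, pi, A, B):
    """Forward algorithm: row t of alpha holds the forward probabilities after
    observing the first t+1 symbols; also returns P(x_{1:T}).  Cost O(T K^2)."""
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, x[0]]                      # base case
    for t in range(1, T):
        alpha[t] = B[:, x[t]] * (alpha[t - 1] @ A)  # sum over previous states
    return alpha, alpha[-1].sum()

# Illustrative parameters (invented for this sketch).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
alpha, likelihood = forward([0, 1, 2, 1], pi, A, B)
print(likelihood)
```

On long sequences the raw $\alpha$ values underflow, so practical implementations rescale each $\alpha_t$ (or work in log space) and accumulate the log-likelihood from the scaling factors.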

(2) Decoding: find the most likely state sequence $z^* = \arg\max_z P(z | x)$. Solved by the Viterbi algorithm:

$$\delta_t(i) = \max_j [\delta_{t-1}(j) A_{ji}] b_i(x_t)$$

with base case $\delta_1(i) = \pi_i b_i(x_1)$, and backtracking through the stored argmax choices to recover the path. Time complexity $O(T K^2)$.
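
A minimal log-space Viterbi sketch (same invented toy parameters as above); working in logs avoids underflow and turns the products into sums:

```python
import numpy as np

def viterbi(x, pi, A, B):
    """Most likely state path arg max_z P(z | x), computed in log space.  O(T K^2)."""
    T, K = len(x), len(pi)
    log_delta = np.zeros((T, K))
    backptr = np.zeros((T, K), dtype=int)
    log_delta[0] = np.log(pi) + np.log(B[:, x[0]])        # base case
    for t in range(1, T):
        scores = log_delta[t - 1][:, None] + np.log(A)    # scores[j, i]: come from state j into i
        backptr[t] = scores.argmax(axis=0)                 # best predecessor for each state i
        log_delta[t] = scores.max(axis=0) + np.log(B[:, x[t]])
    path = np.zeros(T, dtype=int)
    path[-1] = log_delta[-1].argmax()
    for t in range(T - 2, -1, -1):                         # backtrack
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Illustrative parameters (invented for this sketch).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 1], pi, A, B))
```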

(3) Learning: estimate $A$, $\pi$ and the emission probabilities $b_i$ from data. Solved by the Baum-Welch algorithm, which is expectation–maximisation (EM) applied to HMMs. The E-step computes posterior probabilities of states, $\gamma_t(i) = P(z_t = i | x)$, and of transitions, $\xi_t(i, j) = P(z_t = i, z_{t+1} = j | x)$, using the forward-backward algorithm. The M-step updates the parameters by weighted maximum likelihood:

$$\hat A_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad \hat b_i(x) = \frac{\sum_{t: x_t = x} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$$
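
A sketch of one Baum-Welch iteration for a single discrete observation sequence, using unscaled forward-backward passes for clarity (real implementations rescale or work in logs); all parameters are illustrative:

```python
import numpy as np

def baum_welch_step(x, pi, A, B):
    """One EM (Baum-Welch) iteration: E-step via forward-backward, then M-step."""
    T, K = len(x), len(pi)

    # E-step: forward and backward passes (unscaled, fine for short sequences).
    alpha = np.zeros((T, K)); beta = np.zeros((T, K))
    alpha[0] = pi * B[:, x[0]]
    for t in range(1, T):
        alpha[t] = B[:, x[t]] * (alpha[t - 1] @ A)
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])

    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood                 # gamma[t, i] = P(z_t = i | x)
    xi = np.zeros((T - 1, K, K))                      # xi[t, i, j] = P(z_t = i, z_{t+1} = j | x)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, x[t + 1]] * beta[t + 1])[None, :] / likelihood

    # M-step: weighted maximum-likelihood re-estimates.
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for symbol in range(B.shape[1]):
        B_new[:, symbol] = gamma[np.array(x) == symbol].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new, likelihood

# Illustrative starting point and data (invented for this sketch).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
x = [0, 1, 2, 1, 0, 2]
for _ in range(20):                  # iterate EM; the likelihood never decreases
    pi, A, B, ll = baum_welch_step(x, pi, A, B)
print(ll)
```

Each iteration is guaranteed not to decrease the likelihood, so in practice one monitors the log-likelihood across iterations and stops when it plateaus.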

HMMs were the dominant model for speech recognition from roughly 1980 to 2012 (when deep neural networks displaced them) and for many sequence-labelling tasks (POS tagging, NER) until CRFs and then neural sequence models took over. They remain useful in bioinformatics (gene-finding, profile HMMs for protein-family analysis), in time-series anomaly detection, and as a teaching example of probabilistic graphical models.

Modern relatives include conditional random fields (a discriminative counterpart of the HMM), input-output HMMs (whose transitions depend on an input sequence), and deep state-space models such as Mamba (continuous-state and neural-network-parameterised, but sharing the HMM's Markov structure over latent states).

Related terms: Markov Decision Process, Expectation–Maximisation, Mamba, Conditional Random Field
