Chapter Seventeen

Applications

Learning Objectives
  1. Survey real-world computer vision applications, including medical imaging, autonomous vehicles, and quality inspection
  2. Describe production uses of NLP: search, translation, chatbots, sentiment analysis, and summarisation
  3. Compare collaborative-filtering, content-based, and hybrid recommender system approaches
  4. Identify scientific discovery areas being transformed by AI (AlphaFold, weather, materials, drug design)
  5. Outline the MLOps practices — CI/CD, monitoring, versioning — required to deploy AI responsibly at scale

Every time you unlock your phone with your face, ask a voice helper for the weather, or get a film recommendation on Netflix, you are using AI. The methods from prior chapters are not just theory. They run in products that billions of people use daily. This chapter tours five areas where AI has had the deepest real-world impact: vision, language, recommendations, science, and the craft of getting models into production.

17.1   Computer Vision

Computer vision teaches machines to read images and video. The field changed in 2012, when AlexNet [Krizhevsky, 2012] won ImageNet by a wide margin. Since then, models have come to match or exceed human accuracy on some recognition benchmarks, create lifelike images from text, and rebuild 3D scenes from photos.

Medical Imaging

Deep learning now matches doctors on several tasks:

  • Skin cancer: classifying lesions from photos at dermatologist-level accuracy
  • Lung nodules: finding spots on chest X-rays at radiologist-level sensitivity
  • Diabetic retinopathy: reading retinal photos at eye-doctor-level accuracy

For classification (telling normal from abnormal), you typically fine-tune a pretrained model such as ResNet [He, 2016] or EfficientNet [Tan, 2019]. For segmentation (drawing tumour edges or measuring organs), U-Net [Ronneberger, 2015] is the standard; the nnU-Net framework automates architecture and training choices for you.

Clinical deployment raises hard issues. Models trained mostly on lighter skin may fail on darker skin. Approvals are slow. Liability when a model misses a case is unresolved. And fitting AI into a busy clinic is an engineering problem of its own.

Self-Driving Cars

A car must detect people, cyclists, other cars, signs, lane marks, and road edges — all at once, in rain, darkness, and glare.

Modern systems fuse cameras, LiDAR (laser ranging), and radar. BEVFormer uses transformers to build a bird's-eye map from multiple camera feeds. Tesla uses cameras only, betting that enough data can replace the depth sensor.

Object Detection

Finding and labelling things in an image has gone through several stages:

  • Two-stage: R-CNN [Girshick, 2014], Faster R-CNN [Ren, 2015]. Accurate but slow.
  • One-stage: YOLO [Redmon, 2016], SSD. Faster and simpler.
  • Transformer-based: DETR [Carion, 2020]. Fewer hand-built parts.

YOLO now runs in real time on phones and drones. The frontier has moved to harder tasks: instance masks (each object outlined), panoptic labels (every pixel tagged), and open-vocabulary detection (find objects described in plain text).
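Detection quality rests on intersection-over-union (IoU) between predicted and ground-truth boxes; mAP scoring and non-maximum suppression both build on it. A minimal sketch, with boxes as (x1, y1, x2, y2) corners:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7 → 0.14285714285714285
```

A detection typically counts as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.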

Generative Vision

DALL·E, Stable Diffusion [Rombach, 2022], and Midjourney create images from text prompts, built on diffusion models [Ho, 2020] and transformers. Designers use them for concept art and ads. Video generation is catching up fast.

These tools raise hard questions about IP, deepfakes, and the impact on working artists.

Industry and Farming

Less flashy than self-driving cars, but huge in commercial value. Factories use anomaly detection to spot defects on assembly lines. Drones and satellites assess crop health and detect pests. Stores use vision for automated checkout.

In all these fields, the hard problems are not about beating benchmarks. They are about working reliably in the real world, fitting into existing systems, and running without human oversight.

17.2   Natural Language Processing

Large language models and the transformer have reshaped NLP. Tasks that once needed separate systems — translation, summary, sentiment, Q&A — can now be handled by a single pretrained model. NLP powers search engines, chatbots, content filters, legal review, and much more.

Translation

Until the mid-2010s, translation used complex pipelines of phrase tables and language models. Neural translation replaced it all with a single end-to-end model [Sutskever, 2014; Bahdanau, 2014]. The transformer [Vaswani, 2017] made it simpler still.

Google Translate and DeepL now approach human fluency for many language pairs. Low-resource languages remain hard, but multilingual models and few-shot prompting are closing the gap.

Pulling Structure from Text

NLP can extract structured data from messy text. Named entity recognition (NER) finds people, places, and dates. Relation extraction maps how they connect.

Real uses:

  • Finance: spotting mergers and executive changes in news
  • Medicine: extracting drug–gene links from papers
  • Law: finding clauses and risks in contracts

Chatbots and Assistants

Modern chatbots combine large language models with:

  • RLHF (reinforcement learning from human feedback) [Ouyang, 2022] to make replies helpful and safe
  • RAG (retrieval-augmented generation) [Lewis, 2020] to ground answers in real documents, cutting hallucination
  • Tool use to call APIs — search, calculators, code runners — and chain multi-step workflows

RAG matters most. Instead of relying on what the model memorised, a RAG system fetches relevant documents and bases its answer on them. This makes replies checkable and current.
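A toy illustration of this retrieve-then-answer pattern, using a bag-of-words retriever in place of a real vector store and stubbing out the generation step (the document texts are invented):

```python
import math
from collections import Counter

docs = {
    "leave": "Employees accrue 2 days of annual leave per month worked.",
    "expenses": "Travel expenses are reimbursed within 30 days of filing.",
}

def bow(text):
    """Bag-of-words vector: token → count."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query):
    """Return the id of the most similar document."""
    q = bow(query)
    return max(docs, key=lambda k: cosine(q, bow(docs[k])))

def answer(query):
    # In a real system this prompt would go to an LLM; here we just return it.
    return f"Context: {docs[retrieve(query)]}\nQuestion: {query}"

print(retrieve("how many days of annual leave do I get"))  # → leave
```

Production systems swap the bag-of-words similarity for learned embeddings and an approximate nearest-neighbour index, but the retrieve-then-ground structure is the same.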

Content Moderation

Social platforms process billions of posts per day in dozens of languages. They use layered systems: rule-based filters catch clear violations, classifiers flag likely problems, and humans handle the grey areas. Adversarial users constantly adapt, and moderation standards vary across cultures.

Professional Work

Clinical NLP extracts structured data from doctors' notes. Legal NLP automates contract review. Educational NLP powers tutors and essay scoring. Each domain has its own requirements for accuracy and trust.

17.3   Recommendation Systems

Netflix, Spotify, and Amazon all run on recommendation engines. Netflix estimates its system saves over a billion dollars a year in reduced churn. The entire digital ad industry is built on matching ads to the right users.

Two Classical Approaches

Collaborative filtering uses user behaviour. If you and someone else rate many films the same way, films they liked but you have not seen are good picks. The core technique is matrix factorisation: decompose the sparse user–item matrix into two smaller matrices of latent user tastes and item traits.
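The factorisation can be sketched in a few lines of NumPy: alternate gradient steps on the user and item factors, fitting only the observed entries of a synthetic low-rank rating matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 20, 15, 3

# A low-rank "true" rating matrix; we observe roughly 40% of its entries.
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(k, n_items))
mask = rng.random(R.shape) < 0.4

P = 0.1 * rng.normal(size=(n_users, k))   # latent user tastes
Q = 0.1 * rng.normal(size=(n_items, k))   # latent item traits

def rmse():
    return np.sqrt(((mask * (R - P @ Q.T)) ** 2).sum() / mask.sum())

before = rmse()
lr, reg = 0.05, 0.01
for _ in range(300):
    err = mask * (R - P @ Q.T)            # error on observed entries only
    P += lr * (err @ Q - reg * P)
    Q += lr * (err.T @ P - reg * Q)
after = rmse()                            # much smaller than `before`
```

The filled-in matrix `P @ Q.T` then predicts ratings for the films a user has never seen.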

Content-based filtering uses item features. If you liked sci-fi novels, the system suggests more sci-fi. It draws on genre, author, keywords, and — more and more — deep learning embeddings.

Modern Deep Approaches

  • Wide & Deep: a linear model (for memorising patterns) plus a deep network (for generalising)
  • Two-tower models: separate networks encode users and items into a shared space, enabling fast search across millions of items
  • Sequential models: transformers like SASRec model how your tastes change over time
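A minimal two-tower sketch in NumPy, with each "tower" reduced to a single linear map (real systems use deep networks and a library such as FAISS for the lookup):

```python
import numpy as np

rng = np.random.default_rng(1)
d_user, d_item, d_emb = 8, 6, 4

W_user = rng.normal(size=(d_user, d_emb))   # stand-in for the user tower
W_item = rng.normal(size=(d_item, d_emb))   # stand-in for the item tower

# Pre-compute unit-norm embeddings for every item in the catalogue.
item_features = rng.normal(size=(1000, d_item))
item_emb = item_features @ W_item
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def top_k(user_features, k=5):
    u = user_features @ W_user
    u /= np.linalg.norm(u)
    scores = item_emb @ u                   # cosine similarity to every item
    return np.argsort(scores)[-k:][::-1]    # indices of the k best items

recs = top_k(rng.normal(size=d_user))
```

Because users and items live in the same space, the retrieval step is a single matrix–vector product, which is what makes sub-millisecond candidate generation possible.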

How a Real System Runs

A live system has three stages:

  1. Candidate generation: quickly pull a few hundred relevant items from millions. Speed is critical — often under a millisecond. FAISS and ScaNN do fast nearest-neighbour search.
  2. Ranking: a richer model scores each candidate using hundreds of features — your profile, the item's metadata, your recent actions, time, device.
  3. Re-ranking: business rules, diversity, and freshness produce the final list.
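Stage 3 can be illustrated with a greedy re-rank that trades relevance against category diversity (a simplified, MMR-flavoured sketch; the items and scores are invented):

```python
def rerank(candidates, k=3, penalty=0.3):
    """Greedy re-rank: pick the item with the best score minus a penalty
    for repeating an already-shown category.
    candidates: list of (item_id, score, category) from the ranking stage."""
    chosen, seen = [], set()
    pool = sorted(candidates, key=lambda c: -c[1])
    while pool and len(chosen) < k:
        best = max(pool, key=lambda c: c[1] - (penalty if c[2] in seen else 0))
        chosen.append(best[0])
        seen.add(best[2])
        pool.remove(best)
    return chosen

ranked = [("a", 0.90, "news"), ("b", 0.88, "news"),
          ("c", 0.80, "sport"), ("d", 0.50, "music")]
print(rerank(ranked))  # → ['a', 'c', 'b']: 'c' jumps ahead of 'b' for variety
```

Business rules (age limits, sponsored slots, freshness boosts) slot into the same final pass.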

Evaluation

Offline metrics (precision, recall, nDCG) check how predictions match held-out data. But what counts in practice is whether users click, buy, stay, and return. A/B testing is essential — and running sound online experiments at scale is itself a major investment.
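nDCG, for instance, rewards placing highly relevant items near the top, discounting lower positions logarithmically. A minimal implementation:

```python
import math

def dcg(rels):
    """Discounted cumulative gain: relevance discounted by log2 of position."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0

print(ndcg([3, 2, 0]))   # perfect ordering → 1.0
print(ndcg([0, 2, 3]))   # relevant items ranked low → roughly 0.65
```

Offline gains in nDCG do not always translate into online gains in clicks or retention, which is why the A/B test remains the final arbiter.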

Societal Impact

These systems shape what billions of people read, watch, and believe. Filter bubbles form when the algorithm only shows views you already hold. Engagement-driven systems can push sensational or conspiratorial content because it gets more clicks.

The EU's Digital Services Act requires large platforms to explain their recommendation logic and offer at least one non-profiling option. Balancing business goals with user wellbeing is a defining challenge.

17.4   AI for Science

AI is not just making existing work faster — it is creating new knowledge. ML is now essential across biology, chemistry, physics, and climate science.

Biology and Drug Design

AlphaFold 2 [Jumper, 2021], developed at DeepMind, predicts protein 3D structures with accuracy that rivals lab methods like X-ray crystallography. Its database covers over 200 million proteins and has sped up work in enzyme design, vaccines, and disease biology.

In drug discovery, ML appears at every stage:

  • Virtual screening: will this molecule bind to the target?
  • Generative chemistry: design new molecules with desired properties
  • ADMET: estimate absorption, distribution, metabolism, excretion, and toxicity
  • Trial design: optimise study protocols

Graph neural networks, which treat molecules as graphs of atoms and bonds, are especially effective here.
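One message-passing layer can be sketched in NumPy: each atom averages its neighbours' features (plus a self-loop), projects them, and applies a nonlinearity. The 4-atom "molecule" here is an invented chain:

```python
import numpy as np

# Bonds as an adjacency matrix for a 4-atom chain: 0-1, 1-2, 2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.eye(4)                                    # one-hot "atom type" features
W = np.random.default_rng(2).normal(size=(4, 8)) # learned projection (random here)

A_hat = A + np.eye(4)                            # add self-loops
D_inv = np.diag(1 / A_hat.sum(axis=1))           # normalise by node degree
H_next = np.maximum(0, D_inv @ A_hat @ H @ W)    # GCN-style layer: aggregate, project, ReLU
```

Stacking a few such layers lets information flow along bonds, so each atom's final embedding reflects its chemical neighbourhood.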

Climate and Weather

Deep learning has changed weather forecasting. GraphCast [Lam, 2023] makes global forecasts in seconds that match traditional models needing hours of supercomputer time.

For longer-term climate work, ML replaces expensive physics simulations with fast learned models. Satellite vision — powered by computer vision — tracks deforestation, ice sheets, and urban growth at a scale humans cannot match.

Physics and Materials

At the Large Hadron Collider, deep learning sifts billions of collision events to find rare signals like Higgs boson production. In materials science, neural network potentials simulate atomic interactions with near-quantum accuracy at a fraction of the cost.

Maths

AI is even helping mathematicians. Language models generate conjectures and help with formal proofs. DeepMind's work with mathematicians produced new results in knot theory, published in Nature in 2021. Combining language models with proof tools like Lean is a promising direction — AI that can both suggest and verify proofs.
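As a taste of what such tools check, a one-line theorem in Lean 4 (illustrative only; `Nat.add_comm` is a standard library lemma):

```lean
-- A statement Lean verifies mechanically: addition on naturals commutes.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

A model that proposes a proof term like this gets an immediate, trustworthy verdict from the proof checker, which is exactly the feedback loop that language models lack on their own.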

Science-Specific Challenges

Scientific uses demand more than accuracy:

  • Uncertainty: a structure prediction is useless if you cannot say how confident you are.
  • Interpretability: science wants to understand why, not just predict what.
  • Reproducibility: careful train–test splitting, leak prevention, and validation against lab results.

17.5   Deployment & MLOps

You have a model that works on your test set. Now what? Getting it into production — serving real users, at scale, without breaking — is one of the most underrated challenges in applied AI. Most ML projects never reach production. Among those that do, many degrade, break, or run up spiralling costs.

MLOps — machine learning operations — is the discipline that bridges this gap.

Model Serving

Serving means accepting requests, running them through the model, and returning predictions. Frameworks: TensorFlow Serving, TorchServe, Triton.

Latency varies hugely. A recommendation engine must respond in tens of milliseconds. A batch job scoring data overnight can take minutes per item. You pick hardware (CPUs, GPUs, TPUs) to match the model and the time budget.

To cut cost:

  • Quantisation: shrink weights from 32-bit to 8-bit or 4-bit
  • Pruning: remove redundant parameters
  • Distillation: train a small "student" to copy a large "teacher"
  • Operator fusion: merge sequential compute steps
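Quantisation, the first of these, can be sketched as a symmetric int8 scheme: store one scale per tensor, round weights to integers, and accept a small, bounded rounding error:

```python
import numpy as np

def quantise_int8(w):
    """Symmetric per-tensor int8 quantisation (a simplified sketch)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(3)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantise_int8(w)

w_hat = q.astype(np.float32) * scale     # dequantise to compare
max_err = np.abs(w - w_hat).max()        # bounded by scale / 2
print(q.nbytes / w.nbytes)               # → 0.25, i.e. 4x smaller
```

Real toolchains add per-channel scales, calibration data, and quantisation-aware training, but the memory arithmetic is the same.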

CI/CD for ML

Software CI/CD extends to datasets, feature pipelines, model weights, and evaluation metrics. A typical ML CI/CD pipeline:

  1. Automated data checks (schema, missing values, drift)
  2. Automated retraining on fresh data
  3. Automated evaluation against test sets and fairness benchmarks
  4. Canary or blue-green deployment
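Step 1's data checks can be sketched as plain assertions over an incoming batch (the column names and thresholds here are hypothetical):

```python
EXPECTED = {"user_id": int, "amount": float, "country": str}

def check_batch(rows, max_null_rate=0.05):
    """Return a list of human-readable problems; empty means the batch passes."""
    problems = []
    for col, typ in EXPECTED.items():
        vals = [r.get(col) for r in rows]
        nulls = sum(v is None for v in vals) / len(vals)
        if nulls > max_null_rate:
            problems.append(f"{col}: {nulls:.0%} nulls")
        if any(v is not None and not isinstance(v, typ) for v in vals):
            problems.append(f"{col}: wrong type")
    return problems

rows = [{"user_id": 1, "amount": 9.5, "country": "DE"},
        {"user_id": 2, "amount": None, "country": "FR"}]
print(check_batch(rows))  # → ['amount: 50% nulls']
```

A failing check halts the pipeline before a bad batch can poison retraining.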

Tools: MLflow, Weights & Biases, DVC.

Feature Stores

A feature store manages and serves the features your model consumes. It solves a key problem: making sure training-time features match serving-time features. This gap — training–serving skew — is a common and sneaky source of bugs. Options: Feast, Tecton, Hopsworks.
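The standard defence against skew is to route both paths through one shared feature function, which is the pattern a feature store institutionalises. A sketch with invented field names:

```python
def user_features(raw):
    """One definition, called by both the training pipeline and the server.
    `raw` holds Unix timestamps; field names are hypothetical."""
    return {
        "days_since_signup": (raw["now"] - raw["signup"]) // 86400,
        "is_weekend": raw["now"] // 86400 % 7 >= 5,
    }

# Training and serving call the same code, so the definitions cannot diverge.
train_row = user_features({"now": 1_700_000_000, "signup": 1_690_000_000})
serve_row = user_features({"now": 1_700_000_000, "signup": 1_690_000_000})
assert train_row == serve_row
```

Skew typically creeps in when a feature is reimplemented (say, in SQL for training and in application code for serving) and the two copies drift apart.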

Monitoring and Drift

The world changes. A fraud model trained on last year's patterns may fail as criminals adapt. A demand model may break during a pandemic.

Two kinds of shift can silently degrade your model:

  • Data drift: the input distribution changes over time.
  • Concept drift: the link between inputs and the target changes.

Good monitoring tracks input and output statistics, compares them to training baselines, and triggers alerts or retraining when drift is found.
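A minimal drift monitor compares live input statistics against a training-time baseline and alerts on a large deviation (a sketch; real monitors also use PSI or Kolmogorov–Smirnov tests):

```python
import numpy as np

rng = np.random.default_rng(4)
baseline = rng.normal(0.0, 1.0, size=10_000)   # feature values seen at training time

def drift_alert(live, baseline, z_thresh=3.0):
    """Flag drift when the live mean sits far from the baseline mean,
    measured in standard errors."""
    se = baseline.std() / np.sqrt(len(live))
    z = abs(live.mean() - baseline.mean()) / se
    return bool(z > z_thresh)

print(drift_alert(baseline, baseline))                        # no shift → False
print(drift_alert(rng.normal(0.8, 1.0, size=500), baseline))  # shifted inputs → True
```

In production such a check runs per feature on a schedule, and a sustained alert triggers investigation or automatic retraining.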

The Human Side

MLOps is not just tools. It requires teamwork between data scientists, ML engineers, software engineers, and site reliability staff. Clear ownership, good docs, and a culture of blameless post-mortems matter as much as any framework.

A landmark paper — "Hidden Technical Debt in ML Systems" [Sculley, 2015] — warns that the model is often a small fraction of the total system. Around it sits a vast amount of data plumbing, feature code, config, monitoring, and testing. Each piece can build up debt that makes the system fragile and costly. Keeping that debt in check is the real work of MLOps.