Synthetic content detection is the family of techniques aimed at deciding, after the fact, whether a piece of content (text, image, audio, video) was generated by an AI system. It is the defensive complement to watermarking and provenance: where watermarking embeds a signal at production time with the producer's cooperation, detection attempts to recover a signal without producer cooperation, after the content is in the wild.
Modalities and methods
Text:
- DetectGPT (Mitchell et al., 2023): exploits the observation that LLM outputs tend to sit at local maxima of the model's probability landscape, whereas human text does not; random perturbations therefore decrease likelihood more for AI text than for human text.
- GPTZero, Originality.AI, Turnitin AI: commercial classifiers used in education; widely criticised for false positives, especially against non-native English writers.
- Watermark detection: if the producing model embedded a Kirchenbauer-style watermark, downstream detection is reliable.
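The DetectGPT curvature test can be sketched in a few lines. Here `log_prob` and `perturb` are stand-ins (our assumptions, not the paper's API) for a scoring model and a mask-and-fill perturbation function (DetectGPT uses T5 for the latter):

```python
def perturbation_discrepancy(text, log_prob, perturb, n_perturbations=20):
    """Simplified DetectGPT-style curvature test.

    log_prob(text) -> log-likelihood under a scoring model (stand-in here)
    perturb(text)  -> a lightly rewritten variant of the text

    Returns log_prob(original) minus the mean log_prob of perturbed
    variants. Large positive values indicate the text sits at a local
    probability maximum, i.e. is more likely machine-generated.
    """
    base = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return base - sum(perturbed) / len(perturbed)
```

In practice the discrepancy is compared against a threshold calibrated on known human text; the original paper additionally normalises by the standard deviation of the perturbed log-probabilities.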
Images:
- Frequency-domain analysis: diffusion and GAN outputs leave characteristic fingerprints in the Fourier spectrum.
- Patch-based classifiers: networks trained on (real, fake) pairs achieve high accuracy in-distribution but generalise poorly to unseen generators.
- CNNDetect, FakeSpotter, GenImage: academic benchmarks and tools.
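A crude version of frequency-domain analysis can be sketched with NumPy: measure how much spectral energy lies far from the spectrum's centre, where generator artefacts often show up as excess energy or periodic peaks. The function name and cutoff are illustrative, not a standard method:

```python
import numpy as np

def high_freq_energy_ratio(img, cutoff=0.25):
    """Fraction of spectral energy beyond a radial frequency cutoff.

    `img` is a 2-D greyscale array; `cutoff` is a normalised radius
    (an assumption for illustration, not a calibrated value).
    Higher ratios than typical camera images exhibit can hint at
    generator artefacts; this is a heuristic, not a detector.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(spectrum) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
    return power[radius > cutoff].sum() / power.sum()
```

A real pipeline would compare azimuthally averaged spectra against a reference distribution of genuine photographs rather than thresholding a single ratio.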
Audio and video:
- Spectral artefacts in voice-cloned audio (vocoder fingerprints).
- Lip-sync inconsistency between generated audio and video.
- Physiological signals (e.g. the rPPG pulse visible in skin) are often absent in generated faces.
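As a toy illustration of looking for vocoder fingerprints, one can compare coarse spectral energy profiles between a clip and genuine recordings; real detectors model much finer structure. Everything here (function name, band count) is illustrative:

```python
import numpy as np

def band_energy_profile(signal, n_bands=8):
    """Normalised energy per coarse frequency band of an audio signal.

    Voice-cloning vocoders often leave unnatural roll-off or notches
    in the upper bands; comparing a clip's profile against profiles
    of genuine recordings is a very rough heuristic, not a production
    detector.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(power, n_bands)
    energy = np.array([band.sum() for band in bands])
    return energy / energy.sum()
```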
The cat-and-mouse problem
Detection is fundamentally adversarial. In each cycle, a new detector achieves >95% accuracy on contemporary generators; the next generation of generators is then trained or filtered to evade it, and the detector's accuracy collapses. Sadasivan et al. (2023) argued, both empirically and via an information-theoretic bound, that reliable text detection is impossible against sufficiently capable generators unless some cooperative signal (watermark, provenance) is present.
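Sadasivan et al.'s bound can be stated concretely: for any detector, the achievable AUROC is limited by the total variation distance TV between the human and machine text distributions. A small helper (the formula is from the paper; the function name is ours):

```python
def auroc_upper_bound(tv):
    """Upper bound on any detector's AUROC given total variation
    distance `tv` between machine and human text distributions
    (Sadasivan et al., 2023): AUROC <= 1/2 + tv - tv**2 / 2.

    As tv -> 0 (generators mimic human text perfectly), the bound
    falls to 0.5, i.e. no better than random guessing.
    """
    return 0.5 + tv - tv ** 2 / 2
```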
Practical implications
For high-stakes applications (academic integrity, journalism, courts), the consensus by 2026 is that detection alone is insufficient. Best practice is to:
- Prefer provenance (C2PA credentials, watermarks) over post-hoc detection.
- Use detection only as a trigger for human review, never as final adjudication.
- Be especially cautious about false positives against non-native speakers and unconventional but human writing.
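The three recommendations above amount to a simple triage policy. A sketch, with illustrative names and an illustrative threshold (nothing here is a standardised API):

```python
def triage(has_valid_provenance, detector_score, review_threshold=0.9):
    """Sketch of the recommended workflow: provenance is checked
    first, and a detector score only ever escalates content to
    human review, never to an automatic rejection.
    """
    if has_valid_provenance:
        return "trust-provenance"
    if detector_score >= review_threshold:
        return "flag-for-human-review"  # a human makes the final call
    return "no-action"
```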
References
Mitchell et al. (2023). DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature.
Sadasivan et al. (2023). Can AI-Generated Text be Reliably Detected?
Wang et al. (2020). CNN-generated images are surprisingly easy to spot... for now.
Related terms: Watermarking AI Content, C2PA / Content Provenance, Deepfakes
Discussed in:
- Chapter 14: Generative Models, Detection of synthetic content