17.11 Creative work
If medicine asks AI to be careful and finance asks it to be fast, creative work asks it to be surprising. Image generators, music generators, video generators and the steady undertow of language models in the writer's room have, in a few short years, gone from research curiosity to daily tool. A jobbing illustrator now opens Photoshop and Midjourney in the same morning. A demo producer drafts a backing track in Suno before the band arrives. A novelist runs a chapter through Sudowrite to test whether the dialogue lands. A video editor stitches a Sora clip into a corporate explainer for a shot that would once have demanded a film crew. None of this resembles the older fear that machines would make art instead of humans. Most working creators describe the experience as collaboration with a strange, often frustrating partner, one that produces ten ideas an hour and forces them to choose which is theirs.
Section 17.10 looked at education, where the central question is what the student should still be expected to do for themselves. Creative work has the same question with the stakes inverted. Education is mainly worried about learning; the creative industries are mainly worried about livelihoods. The technology, the legal regime around training data, and the bargaining power of working artists are all moving at different speeds. The picture below is therefore unstable. What follows is a snapshot of where each medium sits in early 2026, who the practitioners are, and which disputes remain open.
Image generation
The image-generation stack in early 2026 has settled into roughly five families. Midjourney, founded by David Holz and run as a small independent studio out of Discord, is the consumer-quality leader for stylised art. Versions 5, 6 and 7 each tightened prompt adherence and pushed the default aesthetic further from the "AI look" towards something closer to a competent illustrator's pencil. Many graphic designers now use Midjourney as a moodboard generator, producing twenty plausible directions for a brief in the time it once took to brief a stock-image search. Stable Diffusion (1.5 in 2022, SDXL in 2023, Stable Diffusion 3 in 2024) anchors the open-weights ecosystem. It is the model behind Automatic1111, ComfyUI, the LoRA fine-tuning culture and the ControlNet conditioning workflows that let an illustrator sketch a pose and have the model fill in the rendering. DALL-E 3 ships inside ChatGPT, where the surrounding language model's rewriting of a casual prompt often matters as much as the image model itself. Adobe Firefly is the commercial-safety play: trained, Adobe says, only on Adobe Stock and licensed material, integrated directly into Photoshop and Illustrator, and paired with an indemnification policy for enterprise customers. Flux, from Black Forest Labs (founded by several of the original Stable Diffusion authors), became the leading open-weights model in 2024–2025; its Pro, Dev and Schnell variants serve commercial production, local hobbyists and real-time use cases respectively. Google's Imagen sits inside the Gemini suite and Google Cloud.
The practitioners are not, on the whole, replacing themselves. Illustrators use these tools for thumbnailing and for backgrounds. Concept artists use them for variations on an established design. Advertising studios use them for compositional studies before a photo shoot. Stock-photo libraries are the most directly disrupted: customers who previously paid for a serviceable image of "two colleagues laughing in a meeting room" can now generate one in seconds. Casual users generate birthday cards and social-media memes. The technical recipe across all of these models remains broadly the latent-diffusion stack of Rombach and colleagues' 2022 paper, with progressively better text encoders, larger DiT-style backbones in place of the earlier U-Nets, and steadily improved curation of training data.
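The latent-diffusion recipe mentioned above can be sketched in miniature. Everything in the sketch below is a toy stand-in, not any real model's code: the "text encoder" is a hash in place of a CLIP/T5-style encoder, the "denoiser" is a made-up function in place of the learned DiT or U-Net backbone, and the VAE decoder is omitted. Only the shape of the sampling loop, pure noise in, iterative denoising conditioned on a text embedding, matches the real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def text_encoder(prompt: str) -> np.ndarray:
    """Toy stand-in for a CLIP/T5-style text encoder: hash the prompt to a vector."""
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(8)

def denoiser(z: np.ndarray, t: float, cond: np.ndarray) -> np.ndarray:
    """Toy stand-in for the learned backbone: 'predict' the noise present in z."""
    # A real model is a trained network; here we fake a prediction that pulls
    # the latent toward a conditioning-dependent target as t shrinks.
    target = np.tanh(cond[: z.size].reshape(z.shape))
    return (z - target) * t

def sample(prompt: str, steps: int = 50, shape=(2, 4)) -> np.ndarray:
    cond = text_encoder(prompt)
    z = rng.standard_normal(shape)      # start from pure noise in latent space
    for i in range(steps, 0, -1):
        t = i / steps                   # timestep runs from 1.0 down toward 0
        eps_hat = denoiser(z, t, cond)  # predict the noise at this timestep
        z = z - eps_hat / steps         # take one denoising step
    return z  # a real pipeline would now run z through the VAE decoder

latent = sample("a watercolour fox")
print(latent.shape)  # (2, 4)
```

The point of the sketch is the division of labour: the diffusion loop never touches pixels, only a small latent array, which is why swapping in better text encoders or larger backbones improves quality without changing the overall recipe.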
Music
Music generation crossed a similar threshold in 2024. Suno and Udio, both founded in 2023, generate two-to-four-minute songs (vocals, lyrics and accompaniment) from a short text prompt with stylistic descriptors ("acoustic folk ballad in the manner of early Nick Drake"). The output is recognisably a song, and on a first listen it is increasingly hard to distinguish from a competent demo recording. Stable Audio (Stability AI) generates instrumental music and sound design. MusicGen (Meta, 2023) is the leading open-weights model. ElevenLabs dominates the voice-cloning market: a thirty-second sample of a voice produces a clone that can deliver any text, in any language, with controllable emotional inflection. Soundraw and similar services target podcasters and video creators who need royalty-free background music on demand.
The architectures are typically transformers or diffusion models operating on neural audio codec tokens (EnCodec or SoundStream) rather than raw waveforms, with the codec doing the heavy lifting of compressing audio into discrete units that a language-model-style backbone can predict. Commercial deployment is moving faster than the legal position. In June 2024 the major record labels (Sony, Warner and Universal) sued Suno and Udio in coordinated actions, arguing that the models could only have learnt to reproduce the stylistic fingerprints of named artists by being trained on those artists' recordings. The cases remain unresolved at the time of writing and will be among the first major appellate rulings on whether training a generative audio model on copyrighted recordings constitutes fair use. The voice-acting industry has been hit harder and faster than musicians: video games, audiobooks and explainer videos increasingly use synthetic voices, with a corresponding contraction in session work.
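The codec-token recipe can be illustrated with a toy generation loop: a stand-in "transformer" emits logits over a discrete codebook, and tokens are sampled one codec frame at a time. The codebook size of 1024 and the 75 frames-per-second rate are borrowed from EnCodec's 24 kHz configuration; everything else, including the logits function, is invented for this sketch (real systems also condition on text or style embeddings and use several codebooks in parallel).

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 1024    # codebook size, as in EnCodec's quantizer
FRAME_HZ = 75   # codec frames per second (EnCodec at 24 kHz)

def next_token_logits(context: list[int]) -> np.ndarray:
    """Toy stand-in for a transformer backbone over codec tokens."""
    seed = (sum(context) + len(context)) % (2**32)
    return np.random.default_rng(seed).standard_normal(VOCAB)

def generate(seconds: float, temperature: float = 1.0) -> list[int]:
    n_frames = int(seconds * FRAME_HZ)
    tokens: list[int] = []
    for _ in range(n_frames):
        logits = next_token_logits(tokens) / temperature
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(VOCAB, p=probs)))
    return tokens  # a codec decoder would turn these tokens into a waveform

clip = generate(seconds=2.0)
print(len(clip))  # 150 tokens for two seconds at 75 frames per second
```

The compression step is what makes the approach tractable: two seconds of audio becomes 150 discrete tokens rather than 48,000 raw samples, a sequence length an ordinary language-model backbone can handle.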
Writing
Writing is where AI has been embedded the longest, the most quietly, and the most universally. The dominant pattern is not "machine writes the novel" but "machine drafts, human edits". Working journalists use ChatGPT and Claude to summarise documents, generate first drafts of routine pieces (earnings reports, weather, sports recaps), and rewrite paragraphs that aren't landing. Grammarly, long predating the LLM era, has folded large language models into its copy-editing pipeline. Sudowrite targets fiction writers explicitly, with a "describe" command for sensory detail, a "brainstorm" command for plot directions, and a continuation engine. Jasper and similar products serve the marketing-copy end of the market. Notion AI, Microsoft Copilot for Word and Google Docs' Help me write have made AI drafting a default feature of office software.
A useful empirical pattern has emerged. The reflective uses of language models (summarising a long document, extracting key points from a meeting transcript, comparing two policies, suggesting a title for an article that has already been written) gain more reliable user acceptance than the generative uses. Asking a model to produce an original three-thousand-word essay on a specified topic still tends to yield bland, structurally repetitive prose; asking it to digest an existing three-thousand-word essay tends to yield something genuinely useful. Writers who have integrated these tools well typically describe them as a fast, slightly unreliable copy editor and research assistant rather than as a co-author. The 2023 Writers Guild of America strike, discussed below, settled the most contested professional question (whether AI-generated drafts could be put in front of writers as starting points) firmly in favour of the writers.
Video
Video is the medium where the technology has moved fastest in the shortest time. Sora (OpenAI, announced February 2024, public release December 2024) generates up to one minute of 1080p video at thirty frames per second from a text prompt, using a diffusion transformer that operates on spacetime patch tokens. Veo is Google's comparable offering; Veo 3, released in 2025, dramatically improved physical fidelity: water behaves like water, cloth folds like cloth. Runway Gen-3 Alpha (June 2024), Pika, Luma Dream Machine and Kling AI (Kuaishou) round out the commercial market. Clip lengths have grown from the four-second demos of 2023 to thirty- and sixty-second clips in early 2026.
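The spacetime-patch idea is, at bottom, a reshape: a clip is cut into small blocks that span a few frames in time and a small square in space, and each block is flattened into one token for the diffusion transformer. A minimal sketch follows; the patch sizes here are arbitrary illustrations, not Sora's, which are unpublished.

```python
import numpy as np

def spacetime_patches(video: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Split a (T, H, W, C) clip into flattened spacetime patch tokens."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Factor each axis into (number of patches, patch size), then group the
    # three patch-size axes together so each patch flattens to one row.
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)
    return v.reshape(-1, pt * ph * pw * C)  # one row per spacetime token

clip = np.zeros((8, 64, 64, 3))             # 8 frames of 64x64 RGB video
tokens = spacetime_patches(clip, pt=2, ph=16, pw=16)
print(tokens.shape)  # (64, 1536): 4 temporal x 4 x 4 spatial patches
```

Because the token grid scales with duration and resolution alike, the same transformer can in principle train on clips of different lengths and aspect ratios, which is part of why clip lengths have grown so quickly.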
Production-grade video output still requires a human pipeline. Multi-shot sequences with a consistent character (the same actor in scene one and scene three) remain an active research problem. Physical interactions ("the glass falls and shatters") often produce uncanny artefacts under close inspection. The current sweet spot is short establishing shots, abstract or stylised sequences, and B-roll: not a feature film, but a useful injection into an editing timeline. Advertising agencies, music-video producers and corporate communications teams are the most active early adopters. Independent filmmakers are using these tools to prototype shots that they cannot afford to film conventionally. The visual-effects industry has been more nuanced than the displacement narrative suggests: generative tools augment compositors and matte painters rather than replace them, but the workforce has contracted, and the entry-level rungs of the VFX career ladder have thinned.
Labour and copyright disputes
The 2023 Hollywood strikes were the watershed event. The Writers Guild of America struck for 148 days; the resulting Minimum Basic Agreement explicitly restricts the use of AI to write or rewrite literary material, prohibits AI-generated material from being treated as source material that writers must adapt, and requires studios to disclose any AI use to writers. SAG-AFTRA, the actors' union, struck for 118 days; the resulting agreement requires informed consent and compensation for any AI-generated digital replica of an actor. The agreements did not ban AI from production; they put consent and compensation gates around it.
Visual artists have had no comparable bargaining table. The major lawsuits (Getty v. Stability AI in the United Kingdom and the United States, the artist class action against Stability and Midjourney, and the various authors' suits against OpenAI and Anthropic) are testing the boundary between training on copyrighted material (which the Bartz v. Anthropic ruling in June 2025 found likely fair use) and storing pirated material in a corpus (which the same ruling said was not). The US Copyright Office's 2023 guidance, which held that purely AI-generated material is not copyrightable, left a wide grey zone for human–AI collaboration that is still being litigated case by case. The picture is jurisdictionally inconsistent and will remain so for years.
What you should take away
- Generative tools are now a working part of illustration, music demo production, copywriting and video pre-visualisation; the central skill is editorial direction rather than originating prose, pixels or sound.
- The image stack is settled around Midjourney, the Stable Diffusion / Flux open-weights ecosystem, DALL-E 3, Firefly and Imagen; the music stack around Suno, Udio, Stable Audio, MusicGen and ElevenLabs; the video stack around Sora, Veo, Runway, Pika, Luma and Kling.
- The reflective uses of language models (summarisation, comparison, extraction) gain more reliable user acceptance than the original-generation uses, and writing workflows have settled accordingly.
- The 2023 WGA and SAG-AFTRA strikes converted AI from an unspoken threat into a contract clause; voice actors and stock-photo contributors, who had no comparable bargaining table, have absorbed more of the displacement.
- The legal status of training data is unresolved; the Bartz v. Anthropic split (training plausibly fair use, pirated corpora not) is the current centre of gravity, but appellate rulings over the next two to three years will redraw the map.