Until about now, most of the text online was written by humans. But this text has been used to train GPT3(.5) and GPT4, and these have popped up as writing assistants in our editing tools. So more and more of the text will be written by large language models (LLMs). Where does it all lead? What will happen to GPT-{n} once LLMs contribute most of the language found online?
And it’s not just text. If you train a music model on Mozart, you can expect output that’s a bit like Mozart but without the sparkle – let’s call it ‘Salieri’. And if Salieri now trains the next generation, and so on, what will the fifth or sixth generation sound like?
In our latest paper, we show that using model-generated content in training causes irreversible defects. The tails of the original content distribution disappear. Within a few generations, text becomes garbage, as Gaussian distributions converge and may even become delta functions. We call this effect model collapse.
Ross Anderson – Will GPT models choke on their own exhaust?
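A toy way to see the effect Anderson describes, as a minimal sketch rather than the paper's actual experiment: fit a Gaussian to some data, generate the next "generation" of data from the fitted model, refit, and repeat. The function name and parameters below are purely illustrative; the one-dimensional Gaussian stands in for the content distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_collapse(n_samples=100, n_generations=30, mu0=0.0, sigma0=1.0):
    """Repeatedly fit a Gaussian to samples drawn from the previous
    generation's fitted Gaussian and track how the fitted spread shrinks."""
    mu, sigma = mu0, sigma0
    history = [(mu, sigma)]
    for _ in range(n_generations):
        # "Train" the next model on data produced by the current one.
        data = rng.normal(mu, sigma, size=n_samples)
        # Maximum-likelihood refit: sample mean and standard deviation.
        mu, sigma = data.mean(), data.std()
        history.append((mu, sigma))
    return history

if __name__ == "__main__":
    for gen, (mu, sigma) in enumerate(simulate_collapse()):
        if gen % 5 == 0:
            print(f"generation {gen:2d}: mean={mu:+.3f}  std={sigma:.3f}")
```

Each refit sees only a finite sample, so a little of the tails is lost every round; the estimation noise compounds across generations and the fitted spread drifts towards zero, which is the delta-function endpoint the quote mentions.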
DriverlessCroc AI fun:
Eye on AI: ChatGPT and Me
The AI menace that no-one talks about
dr.ai.verless crocod.ai.l // Hype-Text Transfer Protocol
Unreal City: T. S. Eliot’s Wasteland Jukebox feat. Dall-E [known to be the wisest woman in Europe] (underrated)
In which we meet an AI
Related Issues
Intelligences (I like this one)
Technology (4): General Purpose Technologies
Learning environments: kind, wicked and… fiendish?
Deep literacy: what it takes (language models for humans)
WTF? Technology and you
Darwin among the machines: Samuel Butler (1863) on the mechanical master race
Other AI demos and opinion
Marc Andreessen on possibilities for AI
OpenAI Codex; or, why you might not want to go all in on becoming a full-stack developer
OpenAI’s DALL-E 2
Sam Altman on Public Sector AI, ownership and incentives
Kate Crawford and Azeem Azhar on AI’s societal impact: positioning technology as servant