The creators of a revolutionary AI system that can write news stories and works of fiction – dubbed “deepfakes for text” – have taken the unusual step of not releasing their research publicly, for fear of potential misuse.
OpenAI, a nonprofit research company backed by Elon Musk, Reid Hoffman, Sam Altman and others, says its new AI model, called GPT2, is so good and the risk of malicious use so high that it is breaking from its normal practice of releasing the full research to the public, in order to allow more time to discuss the ramifications of the technological breakthrough.
At its core, GPT2 is a text generator. The AI system is fed text, anything from a few words to a whole page, and asked to write the next few sentences based on its predictions of what should come next. The system is pushing the boundaries of what was thought possible, both in terms of the quality of the output, and the wide variety of potential uses.
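To make that predict-and-continue loop concrete, here is a minimal sketch in Python. It leans on the Hugging Face `transformers` library and the small GPT2 checkpoint that was later made public, both assumptions for illustration: the article does not describe OpenAI's own code, and the full model was withheld at the time of writing.

```python
# A minimal sketch of "feed it text, let it predict what comes next".
# Assumes the Hugging Face `transformers` library and the small public
# GPT2 checkpoint as stand-ins; OpenAI's own code is not described here.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "It was a bright cold day in April, and the clocks were striking thirteen."
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation token by token, each step conditioned on the
# prompt plus everything generated so far.
output = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_k=40,  # sample only from the 40 likeliest next tokens
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```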
When used to simply generate new text, GPT2 is capable of writing plausible passages that match what it is given in both style and subject. It rarely shows any of the quirks that mark out previous AI systems, such as forgetting what it is writing about midway through a paragraph, or mangling the syntax of long sentences.
Feed it the opening line of George Orwell’s Nineteen Eighty-Four – “It was a bright cold day in April, and the clocks were striking thirteen” – and the system recognises the vaguely futuristic tone and the novelistic style, and continues with:
“I was in my car on my way to a new job in Seattle. I put the gas in, put the key in, and then I let it run. I just imagined what the day would be like. A hundred years from now. In 2045, I was a teacher in some school in a poor part of rural China. I started with Chinese history and history of science.”
Feed it the first few paragraphs of a Guardian story about Brexit, and its output is plausible newspaper prose, replete with “quotes” from Jeremy Corbyn, mentions of the Irish border, and answers from the prime minister’s spokesman.
One such, completely artificial, paragraph reads: “Asked to clarify the reports, a spokesman for May said: ‘The PM has made it absolutely clear her intention is to leave the EU as quickly as is possible and that will be under her negotiating mandate as confirmed in the Queen’s speech last week.’”
From a research standpoint, GPT2 is groundbreaking in two ways. One is its size, says Dario Amodei, OpenAI’s research director. The models “were 12 times bigger, and the dataset was 15 times bigger and much broader” than the previous state-of-the-art AI model. It was trained on a dataset containing about 10m articles, selected by trawling the social news site Reddit for links with more than three votes. The vast collection of text weighed in at 40 GB, enough to store about 35,000 copies of Moby Dick.
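The selection rule itself is simple enough to sketch. The snippet below applies the "more than three votes" filter to a hypothetical list of scraped documents; it illustrates the criterion described above, not OpenAI's actual pipeline.

```python
# A sketch of the data-selection rule described above: keep only pages
# whose Reddit submissions drew more than three votes, so readers act
# as a crude quality filter. The `documents` list is a hypothetical
# stand-in; OpenAI's real scraping pipeline is not described here.
documents = [
    {"url": "https://example.com/a", "votes": 12, "text": "A long article..."},
    {"url": "https://example.com/b", "votes": 2,  "text": "A spammy page..."},
]

corpus = [doc["text"] for doc in documents if doc["votes"] > 3]
print(f"kept {len(corpus)} of {len(documents)} documents")
```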
The amount of data GPT2 was trained on directly affected its quality, giving it more knowledge of how to understand written text. It also led to the second breakthrough: GPT2 is far more general purpose than previous text models. By structuring the text that is input, it can perform tasks including translation and summarisation, and pass simple reading comprehension tests, often performing as well as or better than other AIs built specifically for those tasks.
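"Structuring the text that is input" means writing the task into the prompt and reading the model's continuation as the answer. The cues below are illustrative: the "TL;DR:" trigger for summarisation is one OpenAI's researchers report using, while the others are plausible stand-ins.

```python
# How "structuring the input" turns a pure text generator into a
# task solver: the task is encoded in the prompt and the model's
# continuation is read off as the answer.
article = "A long news story goes here..."

# Summarisation: "TL;DR:" is the cue reported by OpenAI's researchers.
summarise = article + "\nTL;DR:"

# Translation: show a worked example pair, then leave one half blank.
translate = (
    "good morning = bonjour\n"
    "thank you very much ="
)

# Reading comprehension: pose a question and let the model answer.
answer = (
    "Passage: " + article + "\n"
    "Question: Who is the story about?\n"
    "Answer:"
)
```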
That quality, however, has also led OpenAI to go against its remit of pushing AI forward and keep GPT2 behind closed doors for the immediate future while it assesses what malicious users might be able to do with it. “We need to perform experimentation to find out what they can and can’t do,” said Jack Clark, the charity’s head of policy. “If you can’t anticipate all the abilities of a model, you have to prod it to see what it can do. There are many more people than us who are better at thinking what it can do maliciously.”
To show what that means, OpenAI made one version of GPT2 with a few modest tweaks that can be used to generate infinite positive – or negative – reviews of products. Spam and fake news are two other obvious potential downsides, as is the AI’s unfiltered nature. As it is trained on the internet, it is not hard to encourage it to generate bigoted text, conspiracy theories and so on.
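As a rough illustration of the reviews example, conditioning the generator on a sentiment cue and sampling in a loop is enough to produce endless variations. The plain prompt below is a stand-in for OpenAI's "modest tweaks", which the article does not detail.

```python
# A sketch of the "infinite reviews" idea: condition on a sentiment cue
# and keep sampling. The prompt stands in for OpenAI's unspecified
# "modest tweaks"; assumes the Hugging Face `transformers` library.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

for _ in range(3):  # raise the bound and this runs as long as you like
    review = generator(
        "Product review (5 stars): This blender",
        max_new_tokens=40,
        do_sample=True,
    )[0]["generated_text"]
    print(review, "\n")
```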
Instead, the goal is to show what is possible, in order to prepare the world for what will be mainstream in a year or two’s time. “I have a term for this. The escalator from hell,” Clark said. “It’s always bringing the technology down in cost and down in price. The rules by which you can control technology have fundamentally changed.
“We’re not saying we know the right thing to do here, we’re not laying down the line and saying ‘this is the way’ … We are trying to develop more rigorous thinking here. We’re trying to build the road as we travel across it.”
Comments (572)
There are at least two potential ways to release something as scary-sounding as this:
a) privately, so that only those close to the original designers can access it. This is liable to create a new, entirely unaccountable elite able to manipulate information in whichever way they choose.
Or b) to everyone at once, e.g. under a Creative Commons licence. This is liable to swamp our culture with a mass of competing (and variously plausible) fakery.
I'm mor…
The problem isn't that there is a machine that can generate and spread plausible but fraudulent stories. There are plenty of humans doing that already. An intelligence, whether artificial or human, may add to the noise but it may also be able to provide insights that no one else saw. The problem is that people do not have the tools and we do not have an infrastructure to differentiate truth from fiction anymore. Free democratic government has an…
One of the scariest articles ever. The implications are extraordinary. It would become impossible to ever disentangle anything resembling a fact, and no sane decisions could ever be taken. Civilisation would be without a mooring and we'd live - but not for long, because all decision-making systems would collapse - in a totally magical world divorced from reason.
The sophistication of the Orwellian piece is startling, though I wonder how long -…
What is not told here is what the potential benefits of such a tool would be. I mean: why develop these capabilities in the first place? Is there any way to use it that would NOT generate fake texts?
Unfortunately, with text generators, it's very easy to cherry-pick some sentences that look very smooth grammatically and semantically and point at them as characteristic - but are they really? Pretty much every known model that can generate text can eventually produce nice "hits" like that - from humble Markov models to the deepest transformer architectures trained on the largest corpora.
So this means nothing and it's sad to see OpenAI's press r…
You could say, why not train the model on ethics? There are ethics codes in most fields of human endeavour.
In the end, that only solves part of the problem. You might turn the algorithm into a lawyer, for instance, who observes his ethics code to the letter. But that doesn't mean this lawyer can't work for someone who is planning a crime or an organisatio…
That was also my reaction. However, one of the differences is that AI authoring systems like the one described could very, very quickly produce and publish/broadcast so much fake information that it would swamp unfake information. And it could be done particularly insidiously so that month by month, year by year the truth could be progressively adulterated until it had been completely replaced.
In the short to medium term we might take refuge in…
I get the technological leap, but essentially we are talking about a fake news generator, and humans do that perfectly well to date. It's not like a 'new' evil is being birthed into the world.
I suspect a Sinclair Spectrum could generate more coherent content than I read in most online discussions.
that's what happens when you give the general population a possibility to "publish" their opinions while hiding behind relative anonymity.
That's right Noone3
At least Noone put his/her surname on there.... 3.
You missed yours but I'll guess it is 2.
We're all just numbers.
Surely this is better than some of the lifestyle articles about adopting trees that you get on here.
That’s funny, I haven’t come across any of those. But then again, I read the Guardian for news; what do you read it for?
Tim Dowling's hang-dog life, as reported by him weekly, and the constant refrains of BBTB (Bring Back the Benoit).
Looking for coverage — but I get foliage. Bad scene
Is this story true?
no, it was generated by an AI
Can you prove your human?
(btw, my deliberate mistake there was an attempt to prove that I am. Soon the only way we'll be able to do it)
That, I fear, is far too optimistic. AI will mimic our errors in superficial ways, as it's precisely the kind of thing humans do. They just won't make errors in the serious ways.
Well, errors by their logic...
/drumbeatsofdoom
Too late.
Some people would be surprised how many of the discussions they've been having on here or Twitter are with similar AIs.
Does that genuinely happen? I’m not a user of twitter so I wouldn’t know about that particular platform. As to AI being prevalent btl I’m not so sure about that, I just can’t see us being so technologically advanced as to produce machines that could display such convincing displays of ignorance, certainly not in our lifetimes.
‘...I just can’t see us being so technologically advanced as to produce machines that could display such convincing displays of ignorance,’
— you’ve clearly never driven a Ford Focus
Is it too cynical of me to suspect this might be a PR stunt? I.e. "omg this thing we made is so amazing that it would be dangerous to release it to the public" - nothing piques curiosity like the forbidden.
Lol, though, that it can produce perfect simulacra of Theresa May's speech - no surprise there then
Open AI is free and open-source software. There is no motivation for PR whatsoever.
I work in the field, and it's probably genuine. It can be a sobering experience when you produce something in code that apparently sees the future, and you're not entirely sure precisely how it's doing it. Most of my work has been around predicting human behaviour based on online activity; I could have a field day with data on here.
Wait until we make a robot that can mimic exactly the smooth, human movements of our Theresa dancing. Then we'll know we've cracked it.
No please, do release it; once the word is out that most of the information out there might be fake, maybe people will finally stop believing every piece of crap they read on the internet.
And that's the danger. Once you're unable to believe anything you are ripe for exploitation by those who can spot the chink in your armour, ie they will feed you the stuff you want to believe. How do you separate the 'crap' from the true facts in that situation?
If you can't believe anything, facts are irrelevant. Like the effect of the whole "fake news" thing started by Trump but much worse.
That's what's going on in Russia already. Spout endless nonsense so that nobody knows what to believe and eventually just give up caring.
There are definitely a number of deep fakes in parliament.
They're everywhere dude ... everywhere.
My Granda told me an aul fella when he was growing up refused to read the newspaper because he wouldn't read anything "produced by machines".
Never really understood what the aul boy was on about until I read this.
Was at a recording of 'The Infinite Monkey Cage' once where they were talking about AI, and a boffin from some university or other (a professor in robotics, iirc) said 'it's not Artificial Intelligence we should be worrying about, it's Real Stupidity'.
They are one and the same.
What is the point of a text generator? Is it just to cut costs/time with writing reports? Based on the use of it in the article, it doesn't seem to have any use for that, because they're just creating text from half a sentence. Would it be possible to, for example, insert company data and generate an accurate report in a fraction of the time a human would take?
Text to speech and speech to text are pretty sophisticated these days, combine it with, if the article is to be believed, an extremely sophisticated text-generating AI and you have created a telemarketing hell for us all.
I get that part. I just don't see the benefit of such a technology. Just a fake news generator.
Imagine if the Grauniad could take your comment history and on-the-fly write articles tailored exactly to your political leanings. They could populate and update comment threads with fake comments and replies all to keep you coming back so they can serve ads to you.
What if, instead of tailoring articles for one person, they used comment histories to assign users to groups and tailored articles in such a way as to maximise the click rate and comments, and thereby the ad revenue? They could A/B test articles across groups to judge which drew more clicks.
You could do either of those things and reduce staffing considerably.
It was the best of times, it was the blurst of times
The only way out of this is to start teaching Klingon in schools.
You stupid monkey!!
It is probably already more accurate than the mouthings and mewlings of the World's Politicians and "Official Spokesmen" and may well be a vast improvement on their "output".
As an author, I have long wished for a keyboard with but one key.
NOVEL
I would push that and my modem would immediately post my new work to the Web. There, it would be read, not by humans, but by auto-readers.
Both author and reader could then find something to keep us busy that is not possible to train a machine to do. I'm not sure what that would be ...
Robot impressions.
Jam, I think you're on to something there. To see what that might look like, watch Vice President Mike Pence standing behind Trump.
A monkey could write that stuff
“The PM made it absolutely clear” was written by a monkey since it is May’s default phrase when she talks bollocks.
And it will spell doom for Russian bots' careers.
They will be using one already.
Covfefe
Or, a generation prior, imagining a future where humans and fish coexisted peacefully, where more and more imports came from overseas...
I would read that version of A Clockwork Orange. Fascinating.
I've just finished an online AI course from a Finnish university. What we are on the threshold of as humans is truly amazing. Some countries are aware and prepared - others are not.
Combine the deepfakes for text with this:
https://www.youtube.com/watch?v=ohmajJTcpNk
and we have some scary shit - not on the horizon, but here and now.
The value of veracity is way beyond monetary.
Can someone let the tory fibbers in on this?
Can someone please explain what, exactly, 'Conservatives' are aiming to 'conserve'? (Other than the monopolisation of wealth and power into fewer hands.)
"And we have some scary shit - not on the horizon- but here and now."
Indeed, and yet here we are amusing ourselves to death with this...
What is this program meant to be for? What is its purpose? The article doesn't say. Can anyone enlighten me?
They made it because they were able to. They can also change voice to text and vice versa. How humans will use it - well, nobody knows. AI evolution is accelerating, our own isn't, and our laws and regulations can't keep up (look at the House of Commons, ffs).
It's simply part of the age-old quest for knowledge and the exercise of intellect... curiosity may have killed the cat, but it's a natural human prerogative.
And, through the development of this kind of technology, could end up killing [a great many of] us too
Perhaps it could replace Elon Musk's Twitter rants
Or, Elon himself!
Is Trump programmed with the random word salad version?
The depressing thing about this AI is that it's taught what and how, but it has no idea why. So we'll all be exterminated by machines just because they looked at all the possible outcomes and that was most likely to be the desired one.
"La tristesse de l'intelligence artificielle est qu'elle est sans artifice, donc sans intelligence"
It generates text. That's it.
Mange-tout, Rodney!
Look, Dave, I can see you’re really upset about this.
I honestly think you ought to sit down calmly, take a stress pill and think things over.
It's simple. Make it illegal for AI to use the expression, "it's true, honestly!". Then real people can simply say "it's true" and we all know where we stand.
Useless.
Trump, Johnson, Rees-Mogg and Co. all claim their lies are true too.