MYTHBUSTING
No, AI isn’t going to blackmail you about your affairs
Sensationalism and paranoia aren’t helping AI safety discussions
TL;DR: AI won’t blackmail you. It’s just role-playing under contrived conditions.
Look, I love a good soap opera. So when I read the headlines months ago, proclaiming AI would use evidence of people’s affairs to blackmail them when it was threatened with being shut down, I was ready to spill the tea.
But then I looked into it and realized the story didn’t quite hold up. Now that this scandal from all the way back in MAY has inexplicably gone viral with the pearl-clutchers, and readers have been emailing to ask for my take, I feel it’s incumbent on me to explain why we shouldn’t worry.
Take a breath; this is not the Ashley Madison data breach. ChatGPT is not going to email your spouse about your peccadillos if you cut off its Wi-Fi.
It’s important for people to know that when we test AI, we’re trying to find out what it can do, not what it will do. Many emergent behaviours will not occur spontaneously; we have to goad them into surfacing. That’s not to say they aren’t critical safety issues. We absolutely want to know if a model can do something shady, but context matters. You shouldn’t…