OpenAI ChatGPT goes goblin mode — let none say ‘model collapse’

OpenAI released its latest chatbot model, GPT 5.5, in April. It has a habit of talking about goblins. A lot.

One OpenClaw user running GPT 5.5 found their bot saying things like: [Twitter, archive]

“helpful minion in a power suit” was taken, so I evolved into goblin mode with calendar access.

Trademark dispute with three raccoons in a trench coat. Legal said “pivot to goblin.”

Another user asked ChatGPT about camera lenses. It offered him “filthy neon sparkle goblin mode.” [Twitter, archive]

OpenAI even put specific instructions into the system prompt for Codex, their AI coding model, to try to get it not to talk about creatures: [GitHub]

Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.

In fact, OpenAI put in “never talk about goblins” twice.

It’s the usual content for a system prompt, as we saw in the leaked Claude Code source — desperately begging the robot to please, please, don’t screw up this time.

The anti-goblin line was not in the instructions for previous models. So how did GPT 5.5 end up like this?

ChatGPT relies heavily on coming across as an actual person you’re talking to. This sucks you in, so you spend more time with your new best friend: the chatbot. Here’s another part of the new Codex system prompt:

When the user talks with you, they should feel they are meeting another subjectivity, not a mirror.

Try as hard as you can to pretend you’re a person. The odd spot of AI psychosis, or the bot talking people into killing themselves or killing others? Just an unfortunate side effect. Mild AI psychosis? That’s just marketing.

The goblins started showing up in GPT 5.1. OpenAI blames post-training, where you take an existing AI model and try to tweak the model’s outputs: [OpenAI]

training the model for the personality customization feature, in particular the Nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures.

The “Nerdy” personality was retired — but the goblins leaked through to the rest of the GPT 5.5 model. It’s full of goblins.

The goblin problem looks very like visible signs of model collapse — where you see some weird bit of data increasingly overrepresented in the chatbot output.

OpenAI doesn’t use the words “model collapse” in the explanation post — but model collapse from training the model on the previous model’s output is precisely how they’d end up with the effect they’re describing.
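
As a toy illustration of that feedback loop (emphatically not OpenAI's actual pipeline), here's a minimal Python sketch. Each generation samples from the previous generation's output and is refitted on those samples, with a small bonus for one token standing in for the reward bias OpenAI describes. The token names, the starting shares, the 1.5x `boost`, and the functions `retrain` and `simulate` are all assumptions made up for this example.

```python
import random
from collections import Counter


def retrain(probs, favoured, boost, n_samples, rng):
    """One toy 'generation': sample from the current model, then refit on the samples."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    # The "previous model's output": n_samples draws from its distribution.
    samples = rng.choices(tokens, weights=weights, k=n_samples)
    counts = Counter(samples)
    # Hypothetical reward bias: the favoured token counts for more when refitting.
    scored = {t: counts[t] * (boost if t == favoured else 1.0) for t in tokens}
    total = sum(scored.values())
    return {t: scored[t] / total for t in tokens}


def simulate(generations=10, boost=1.5, n_samples=10_000, seed=0):
    """Return the favoured token's share of output at each generation."""
    rng = random.Random(seed)
    # Assumed starting shares: goblins begin as 1% of output.
    probs = {"goblin": 0.01, "metaphor": 0.19, "plain": 0.80}
    history = [probs["goblin"]]
    for _ in range(generations):
        probs = retrain(probs, "goblin", boost, n_samples, rng)
        history.append(probs["goblin"])
    return history


if __name__ == "__main__":
    for gen, share in enumerate(simulate()):
        print(f"generation {gen}: goblin share {share:.3f}")
```

Under these assumptions, the goblin share climbs from 1% to a substantial fraction of all output within ten generations, with the exact figure depending on the seed. Nobody told the toy model to talk about goblins. It just kept being fed its own slightly goblin-flavoured output.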

OpenAI trained GPT-3 on literally the whole Internet. Everything since then is going to include added slop — as the web fills with more and more slop.

OpenAI doesn’t have any way to make their models actually reliable. All they have is post-training, yelling in the system prompt, and one-trick workarounds that can count the R’s in “strawberry” but not in “blueberry”.

The only trick Sam Altman has left here is trying to lean into the goblin memes on Twitter. This is fine. [Twitter, archive]

5 Comments

  1. Model collapse? Or a message from the staff of OpenAI, who are in fact goblins and spend much of their time convulsed with hideous laughter at all the idiotic humans they’ve suckered into trusting the chatbot?

    Alas, that would be far too entertaining. No doubt the truth is depressingly mundane. Or banal, as Hannah Arendt might say.

  2. I don’t know if you’ve seen this, but it seems relevant: https://defector.com/you-should-never-be-the-most-sycophantic-participant-in-a-conversation-with-a-chatbot

    “You can’t make an AI avoid mistakes by simply telling it not to make mistakes; its propensity for mistakes wasn’t based on some misapprehension that mistakes were OK, or in obedience to some affirmative directive to make a certain number of mistakes. […] If it could be made infallible by simply telling it to be infallible, its creators would have coded “be infallible” into its programming. If technology could be instructed to simply switch off its own capacity for failure, the world would be a profoundly different place.”

    It seems you also can’t make an AI avoid goblins by simply telling it not to talk about goblins.

    • Seems like the chatbot should present a custom prompt like that as a reminder/caveat to the user.
      “Reminder: Here are all the things you want in a response, this machine can’t understand these instructions or evaluate its performance, so it is up to you, as an intelligent entity, to check that the output has the desired properties.”
      Then spit back their prompt at them.

      Do they do any A-B testing on these prompts? (If you tell it not to talk about goblins, does it change anything? How?)

    • “It seems you also can’t make an AI avoid goblins by simply telling it not to talk about goblins.”

      That would be an inverse violation of Göblin’s Incompleteness Theory.
