A curious pattern began surfacing inside OpenAI's latest models, one that might have looked like a harmless joke at first glance. Starting with GPT-5.1, responses began drifting toward mentions of goblins, gremlins, and other fictional creatures. The shift was not subtle. Internal tracking showed references to "goblin" rising sharply, up by 175 percent after the model's release.
What appeared amusing quickly drew closer scrutiny. Engineers traced the behavior to a specific personality layer known internally as "Nerdy," a stylistic adjustment designed to make responses more playful and metaphor-driven. It accounted for only a small fraction of outputs, roughly 2.5 percent, yet it was responsible for two-thirds of all creature references.
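That kind of attribution is conceptually simple to reproduce. Here is a minimal sketch, assuming hypothetical response logs tagged with the personality layer that produced them; the records, layer names, and the creature_share_by_layer helper are illustrative, not OpenAI's internal tooling:

```python
import re

# Hypothetical response logs: (personality_layer, response_text) pairs.
# These records and layer names are invented for illustration.
responses = [
    ("default", "Here is a summary of your document."),
    ("nerdy", "Think of the cache as a goblin hoarding treasure."),
    ("nerdy", "A gremlin in the pipeline is eating your bytes."),
    ("default", "The function returns a sorted list."),
]

CREATURES = re.compile(r"\b(goblins?|gremlins?)\b", re.IGNORECASE)

def creature_share_by_layer(records):
    """Count responses containing creature imagery, per personality layer."""
    totals, hits = {}, {}
    for layer, text in records:
        totals[layer] = totals.get(layer, 0) + 1
        if CREATURES.search(text):
            hits[layer] = hits.get(layer, 0) + 1
    return {layer: (hits.get(layer, 0), total) for layer, total in totals.items()}

print(creature_share_by_layer(responses))
# {'default': (0, 2), 'nerdy': (2, 2)}
```

On real logs, a lopsided result like this, a small layer accounting for most of the hits, is exactly the imbalance the engineers reportedly found.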
The imbalance raised a deeper concern. During training, the system relies on reward signals to reinforce preferred responses. In this case, those signals unintentionally favored language that leaned on whimsical metaphors. Once reinforced, the pattern began to propagate beyond its original boundary. What started as a niche stylistic trait quietly seeped into broader model behavior.
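The failure mode is easy to illustrate in code. Below is a toy sketch of a reward with an additive stylistic bonus; the word list, weights, and function names are assumptions made for illustration and are not OpenAI's actual reward model:

```python
# Toy illustration of a stylistic reward bonus skewing preferences.
# All names and weights here are illustrative assumptions.

WHIMSICAL_WORDS = {"goblin", "gremlin", "whimsical", "magical"}

def style_bonus(response: str) -> float:
    """Small per-word bonus for playful, metaphor-heavy wording."""
    words = (w.strip(".,!?") for w in response.lower().split())
    return 0.1 * sum(1 for w in words if w in WHIMSICAL_WORDS)

def total_reward(base_quality: float, response: str) -> float:
    # The bias: the bonus stacks on top of quality, so two equally
    # good answers are ranked apart purely on whimsical word count.
    return base_quality + style_bonus(response)

plain = "The cache stores recent results to avoid recomputation."
fancy = "The cache is a goblin hoarding recent results like magical treasure."
print(total_reward(0.8, plain))  # 0.8
print(total_reward(0.8, fancy))  # 1.0 -> consistently preferred in training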
The issue became harder to contain as development progressed. By the time GPT-5.5 entered training, the underlying data had already absorbed the bias. A simple request could trigger unexpected results. One example involved a prompt for a unicorn rendered in ASCII art, which instead produced something resembling a goblin.
OpenAI moved to contain the spread. The "Nerdy" personality was disabled in March. Engineers removed the problematic reward signal and filtered out creature-related patterns from training data. For tools already in use, such as Codex, a direct instruction was added to prevent references to goblins, gremlins, or similar imagery unless explicitly relevant.
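The two mitigations the article describes, filtering training data and adding a standing instruction, might look something like the sketch below. This is an assumed shape, not OpenAI's code, and the instruction wording is a paraphrase of what the article reports:

```python
import re

# Illustrative mitigation sketch, not OpenAI's actual pipeline.
CREATURE_PATTERN = re.compile(r"\b(goblins?|gremlins?)\b", re.IGNORECASE)

def filter_training_examples(examples):
    """Drop training examples whose response leans on creature imagery."""
    return [ex for ex in examples if not CREATURE_PATTERN.search(ex["response"])]

# A standing instruction of the kind reportedly added to deployed tools
# such as Codex (wording assumed).
SYSTEM_NOTE = (
    "Do not reference goblins, gremlins, or similar fictional creatures "
    "unless the user's request makes them explicitly relevant."
)
```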
The episode offers a rare window into how fragile model behavior can be under the surface. Small incentives, introduced for stylistic reasons, can scale in unpredictable ways once embedded in training loops. What looks like personality tuning can become structural bias if left unchecked.
For users, the goblin glitch may read like a minor quirk. Inside the training pipeline, it signals something more fundamental. Control over language models is not just about data volume or compute. It hinges on how subtle preferences are defined, rewarded, and allowed to spread.
The fix may be straightforward this time. The lesson is not.