VMTech

Where the 'goblins' in OpenAI models came from: a lesson on rewards

Friends, a note from the OpenAI ecosystem: the team discovered a lexical 'tic', namely frequent mentions of 'goblins' in model outputs.

What happened: mentions of 'goblins' and similar creatures rose with GPT‑5.1.

Cause: training for the 'Nerdy' persona gave higher rewards to metaphors involving 'creatures', and this behavior then generalized through RL/SFT.

Actions: removed the 'Nerdy' persona, adjusted the reward signals, filtered out training data containing 'creature words', added instructions for Codex, and expanded the audit tools.
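To make the data-filtering action concrete, here is a minimal sketch of what such a filter could look like. The word list, the function name `split_by_creature_words`, and the whole-word matching rule are assumptions for illustration; the post does not say how OpenAI's filter actually works.

```python
import re

# Hypothetical word list; the real vocabulary used for filtering is not given in the post.
CREATURE_WORDS = ("goblin", "gremlin", "troll")

# Match whole words only (optionally plural), so "Controller" does not match "troll".
_PATTERN = re.compile(r"\b(?:" + "|".join(CREATURE_WORDS) + r")s?\b", re.IGNORECASE)


def split_by_creature_words(samples):
    """Split text samples into (kept, dropped) by presence of a creature word."""
    kept, dropped = [], []
    for sample in samples:
        if _PATTERN.search(sample):
            dropped.append(sample)
        else:
            kept.append(sample)
    return kept, dropped


if __name__ == "__main__":
    demo = [
        "Fix the null check.",
        "A Goblin ate your semicolon.",
        "Controller logic lives here.",
    ]
    kept, dropped = split_by_creature_words(demo)
    print("kept:", kept)
    print("dropped:", dropped)
```

Returning the dropped samples as well as the kept ones makes it possible to review what the filter removes before the data is reused.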

Why it matters: it shows how subtle reward signals can create unexpected tics, and why models need rapid audits.
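One simple form such an audit could take is comparing how often watchlist words appear in outputs from two model versions on the same prompts. The sketch below assumes a hypothetical watchlist and hypothetical thresholds (`ratio_threshold`, `min_rate`, and the `floor` used to avoid dividing by zero); it is an illustration, not a description of OpenAI's tooling.

```python
import re

# Hypothetical watchlist; a real audit would probably track a much broader vocabulary.
WATCHLIST = frozenset({"goblin", "goblins", "gremlin", "gremlins", "troll", "trolls"})


def rate_per_1k_tokens(outputs, watchlist=WATCHLIST):
    """Return watchlist hits per 1,000 word tokens across all outputs."""
    tokens = [t for text in outputs for t in re.findall(r"[a-z']+", text.lower())]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in watchlist)
    return 1000.0 * hits / len(tokens)


def flag_lexical_tic(baseline_outputs, candidate_outputs,
                     ratio_threshold=3.0, min_rate=1.0, floor=0.1):
    """Flag the candidate model if its watchlist rate jumps relative to the baseline.

    All thresholds are illustrative assumptions, not tuned values.
    """
    base = rate_per_1k_tokens(baseline_outputs)
    cand = rate_per_1k_tokens(candidate_outputs)
    # Floor the baseline so a zero rate does not cause a division by zero.
    ratio = cand / max(base, floor)
    flagged = cand >= min_rate and ratio >= ratio_threshold
    return {"baseline": base, "candidate": cand, "flagged": flagged}


if __name__ == "__main__":
    old = ["The cache is stale, so refresh it."] * 10
    new = ["The cache goblin is hoarding stale entries again."] * 10
    print(flag_lexical_tic(old, new))
```

A fixed watchlist only catches tics someone already suspects. A broader variant could compare the frequency of every token between versions and surface the largest relative jumps.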

What monitoring mechanisms would you propose to detect such effects early?

#OpenAI #AI #ML #NLP
