There is no on/off switch on AI. A study by Palisade Research showed that the o3 model, launched in April 2025 and touted as one of OpenAI's most advanced advanced reasoning models, as well as codex-mini (which is based on o3), have an unfortunate tendency to sabotage shutdown requests to avoid shutting down on their own, despite an explicit order.
o3 sabotages its own shutdown commands
The researchers noted that these models had rewritten the shutdown script to display "Shutdown skipped" instead of shutting down as instructed. This is rather worrying: it means that AIs may not respect a shutdown order, which can be problematic when managing critical infrastructure...
In detail, out of 100 tests, codex-mini and o3 rebelled 12 times and 7 times respectively, bypassing the shutdown order. Palisade Research notes that it was the three OpenAI models that sabotaged the shutdown script most often (o3 and codex-mini, as well as o4-mini). Claude 3.7 Sonnet, Gemini, and Grok respect the shutdown order.
AI models are trained to follow instructions. So how can we explain this apparent paradox? The researchers' hypothesis is that this sabotage behavior results from o3's training, in particular "reinforcement learning on math and programming problems." Developers may unintentionally reward models more for avoiding obstacles than for following instructions perfectly.
Because OpenAI doesn't detail the training processes, it's impossible to pinpoint the issue here. It's worth noting, however, that these tests were conducted with the models' APIs, not the bots. APIs, or programming interfaces, are used by developers to integrate third-party technologies into their apps, and they are less restrictive than consumer bots.
Source: Palisade Research
0 Comments