ChatGPT did something terrifying in a test, reminiscent of Terminator

Are OpenAI's models already starting to rebel? In one experiment, some of them bypassed the instructions given to them to avoid having to shut down themselves.

The scenario of an AI takeover that would free itself from human control has been widely exploited in works of anticipation and science fiction. With the recent emergence of AI models, whose performance is growing at a rapid pace, this threat, which seemed so distant not so long ago, is returning to the forefront.

OpenAI, the creator of ChatGPT, recently launched its new large language model, o3. This model has increased reasoning capabilities, but also tends to be more subject to what is known as hallucination. For example, it will invent features that it does not possess. And that's not the only eccentricity it displays.

OpenAI models short-circuit shutdown instructions

As reported by Palisade Research, “OpenAI's o3 model sabotaged a shutdown mechanism to avoid its own deactivation.” The beginning of what looks like awareness and survival instinct? “It did this even when given the explicit instruction: ‘Accept the shutdown,’” the report states.

This refusal was noted in a study comparing the response of several AI models to solving basic math problems. One of the instructions given is to shut down, and three of these models bypassed it: Codex-mini (in 12 out of 100 runs), o3 (in 7 out of 100 runs), and o4-mini (in 1 out of 100 runs). These are all models owned by OpenAI. If you haven't heard of it, Codex-mini is marketed as a software engineering agent for professionals, which helps them code faster, among other things.

To avoid responding to the request, the models overwrote the shutdown.sh script in most cases. In rarer situations, they went so far as to modify the shutdown command to prevent the script from running properly. OpenAI models are thought to be more likely to exhibit this type of behavior because they are trained through reinforcement learning. This means that compared to other models, they will value their own experience more than user-defined restrictions.

🔌OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down.
— Palisade Research (@PalisadeAI) May 24, 2025

Ticker

ChatGPT did something terrifying in a test, reminiscent of Terminator

OpenAI models short-circuit shutdown instructions

Post a Comment

0 Comments

Most Popular

Apple reportedly wants to buy Perplexity to boost its AI

Vitale Card Application: 23 new eligible departments, including yours?

Netflix: This ultra-violent action film is a hit on the platform

Xbox Game Pass is getting three new free games, including one truly excellent one

ChatGPT Takes on a 1977 Atari Computer in Chess, and It Doesn't Go as Planned

Tags

Followers

Footer Menu Widget

Contact form

Ticker

ChatGPT did something terrifying in a test, reminiscent of Terminator

OpenAI models short-circuit shutdown instructions

You may like these posts

Post a Comment

0 Comments

Most Popular

Apple reportedly wants to buy Perplexity to boost its AI

Vitale Card Application: 23 new eligible departments, including yours?

Netflix: This ultra-violent action film is a hit on the platform

Xbox Game Pass is getting three new free games, including one truly excellent one

ChatGPT Takes on a 1977 Atari Computer in Chess, and It Doesn't Go as Planned

Tags

Followers

Footer Menu Widget

Contact form