Ticker

6/recent/ticker-posts

ChatGPT did something terrifying in a test, reminiscent of Terminator

ChatGPT did something terrifying in a test, reminiscent of Terminator

Are OpenAI's models already starting to rebel? In one experiment, some of them bypassed the instructions given to them to avoid having to shut down themselves.

ChatGPT did something terrifying in a test, reminiscent of Terminator

The scenario of an AI takeover that would free itself from human control has been widely exploited in works of anticipation and science fiction. With the recent emergence of AI models, whose performance is growing at a rapid pace, this threat, which seemed so distant not so long ago, is returning to the forefront.

OpenAI, the creator of ChatGPT, recently launched its new large language model, o3. This model has increased reasoning capabilities, but also tends to be more subject to what is known as hallucination. For example, it will invent features that it does not possess. And that's not the only eccentricity it displays.

OpenAI models short-circuit shutdown instructions

As reported by Palisade Research, “OpenAI's o3 model sabotaged a shutdown mechanism to avoid its own deactivation.” The beginning of what looks like awareness and survival instinct? “It did this even when given the explicit instruction: ‘Accept the shutdown,’” the report states.

This refusal was noted in a study comparing the response of several AI models to solving basic math problems. One of the instructions given is to shut down, and three of these models bypassed it: Codex-mini (in 12 out of 100 runs), o3 (in 7 out of 100 runs), and o4-mini (in 1 out of 100 runs). These are all models owned by OpenAI. If you haven't heard of it, Codex-mini is marketed as a software engineering agent for professionals, which helps them code faster, among other things.

To avoid responding to the request, the models overwrote the shutdown.sh script in most cases. In rarer situations, they went so far as to modify the shutdown command to prevent the script from running properly. OpenAI models are thought to be more likely to exhibit this type of behavior because they are trained through reinforcement learning. This means that compared to other models, they will value their own experience more than user-defined restrictions.

Post a Comment

0 Comments