New versions of ChatGPT are getting more powerful… but their biggest flaw is getting worse too

OpenAI claims to have reached a milestone with its latest artificial intelligence. However, a study reveals an unexpected weakness. The most recent models are said to produce even more errors than their predecessors.

Artificial intelligence is evolving rapidly, but its flaws persist. OpenAI has just published data on its new o3 and o4-mini models, which are supposed to be its most capable to date. However, these AIs show a marked increase in hallucinations, a phenomenon in which the AI invents facts, studies, or links. This is a worrying problem because the fabricated information looks credible to unsuspecting users. The hallucination rate remains a key criterion for assessing the reliability of a model.

The figures are clear. OpenAI indicates that o4-mini hallucinates in 48% of cases tested with its internal PersonQA benchmark, i.e. three times more often than the o1 model. The o3 model, although larger and supposed to be more reliable, also produces errors in 33% of responses, i.e. twice as many as its predecessor. This development is surprising because, as a general rule, each new generation of model tends to reduce these problems. Here, despite progress on overall accuracy, the risk of obtaining false information increases.

o3 and o4-mini models hallucinate more despite their increased reasoning capabilities

OpenAI has designed its recent models to externalize their reasoning, displaying the thinking steps for greater transparency. This approach, while promising, does not prevent the appearance of erroneous information. An independent report by Transluce revealed that o3 sometimes invents capabilities it doesn't have, such as claiming to have run code on a MacBook Pro. Worse, even when the user points out the mistake, the model persists in its error. This calls into question the real reliability of these tools, which are presented as more rigorous.

Experts put forward several hypotheses to explain this phenomenon. Design choices, such as outcome-based reinforcement learning, could worsen the hallucinations. In addition, OpenAI has reportedly shortened its safety testing phases to accelerate the development of its models. This strategy, while it allows for faster innovation, exposes users to more erroneous content. While waiting for better solutions, caution remains essential: even the most advanced AI must be used with a critical eye.
