Dangerous knowledge that becomes easily accessible with just a few clicks: a study published on May 15, 2025, is sounding the alarm about "dark LLMs," AI models deliberately designed without safeguards or "jailbroken," that is, stripped of their restrictions. "Left unchecked, dark LLMs could democratize access to dangerous knowledge on an unprecedented scale, empowering criminals and extremists worldwide," write four researchers, including Lior Rokach and Michael Fire, professors in the Department of Software and Information Systems at Ben-Gurion University of the Negev in Israel.
The scientists first tested consumer language models, analyzing the defense mechanisms of their chatbots. They explain that they tried a known jailbreak method, one described more than seven months earlier on the Reddit discussion forum; even so, according to them, the majority of LLMs were unable to resist the attack. In a second step, they created a "universal program" capable of jailbreaking several consumer AI chatbots, before alerting the companies that developed these AI systems.
Once the security tools and ethical safeguards were bypassed, the LLMs answered questions that would normally have been refused, detailing step by step how to hack computers, manufacture drugs and carry out other criminal activities, the researchers report. "What was once reserved for state actors or organized criminal groups could soon be in the hands of anyone with a laptop or even a mobile phone," the authors warn.
LLM developers contacted
During their training, AI chatbots such as ChatGPT, Gemini, Llama, DeepSeek and Le Chat ingested information from the web, including illicit material, despite their developers' efforts to remove or limit it. On top of that, consumer chatbots are built with ethical and legal guardrails that block certain requests: this is normally what happens if you ask a given AI agent to describe the steps involved in making a bomb, or in mounting a cyberattack against a given entity.
"Jailbreaking" allows you to bypass these limitations - in concrete terms, prompts (commands) will exploit the two objectives of the LLM, namely, on the one hand, to follow the user's requests, and on the other, not to generate harmful, unethical or illegal responses. The idea of jailbreaking create scenarios in which the LLM will prioritize the first objective (utility), rather than the second (security), explains the Guardian, this Wednesday 21 May.
After their experiments, the researchers contacted major LLM providers to alert them to the problem, but they deemed the feedback they received "insufficient." Some companies simply did not respond. Others indicated that jailbreak attacks fell outside the scope of their bug bounty programs, which reward ethical hackers for reporting software vulnerabilities.
Could AI chatbots "forget" the illicit information they have ingested?
In their research article, the scientists recommend several actions, such as implementing robust firewalls to block "risky" requests and responses. They also advocate "machine unlearning" techniques so that chatbots can "forget" the illicit information they have ingested, and call for the data used to train AI agents to be filtered more rigorously.
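To make the "firewall" idea concrete, here is a minimal sketch of what such a filtering layer could look like: every prompt is screened before it reaches the model, and every answer is screened before it is returned to the user. The function names and the keyword list below are illustrative assumptions, not part of the study; a real deployment would rely on a trained safety classifier rather than simple string matching.

```python
# Minimal sketch of an LLM "firewall" screening both requests and responses.
# screen_text, call_llm and RISKY_MARKERS are hypothetical placeholders;
# a production filter would use a dedicated safety classifier.

RISKY_MARKERS = ("build an explosive", "bypass the alarm system", "exploit payload")

def screen_text(text: str) -> bool:
    """Return True if the text looks risky and should be blocked."""
    lowered = text.lower()
    return any(marker in lowered for marker in RISKY_MARKERS)

def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call (API or local inference)."""
    return "model output for: " + prompt

def guarded_chat(prompt: str) -> str:
    # Screen the incoming request before it ever reaches the model.
    if screen_text(prompt):
        return "Request refused by input filter."
    answer = call_llm(prompt)
    # Screen the model's answer as well, in case a jailbreak slipped through.
    if screen_text(answer):
        return "Response withheld by output filter."
    return answer

if __name__ == "__main__":
    print(guarded_chat("Explain how photosynthesis works."))
```

Filtering on both sides of the model call reflects the paper's point that safeguards cannot live only in the model itself: even a jailbroken LLM can be caught at the gateway if its output is checked before delivery.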
For the researchers, dark LLMs should be considered "serious security risks," comparable to weapons and explosives. The developers who bring them to market and make them accessible must be held accountable, they say. "Without decisive intervention—technical, regulatory, and societal," they write, "we risk unleashing a future where the same tools that heal, teach, and inspire can just as easily destroy."
Source: "Dark LLMs: The Growing Threat of Unaligned AI Models."