Every day, new tools based on machine learning appear, and more and more people are starting to use them. These AI models have become an important part of our technological ecosystem, for better or for worse… and yet, we still struggle to understand how they work behind the scenes. The CEO of Anthropic, the company behind the Claude LLM, has given himself two years to solve this generational problem.
In an essay published on his blog, entitled “The Urgency of Interpretability,” Dario Amodei begins by reminding us that AI models now occupy an important place at many levels of society, for better and for worse. The technology is progressing at a remarkable pace, and today's models are capable of feats that were still science fiction just ten years ago.
The Black Box of AI
But this spectacular rise tends to obscure an issue that often flies under the radar of the general public: the problem of interpretability, commonly referred to as the “black box of AI”.
Indeed, the artificial neural networks underpinning these tools are profoundly abstract objects. We know how to feed data to a model to train it, and how to obtain a result through inference… but everything that happens in between tends to be far too labyrinthine and nebulous for humans to understand.
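To make the black-box point concrete, here is a minimal, purely illustrative sketch in Python (a toy network with random weights, not any real LLM and not code from Amodei's essay): the input and the output are easy to read, but the intermediate activations are just unlabeled arrays of numbers with no obvious human-level meaning.

```python
# Toy illustration only: a tiny feed-forward network with random weights,
# standing in for the millions of learned parameters of a real model.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 16))   # input layer -> hidden layer
W2 = rng.normal(size=(16, 3))   # hidden layer -> output layer

def forward(x):
    hidden = np.tanh(x @ W1)    # "everything that happens in between"
    output = hidden @ W2
    return hidden, output

x = np.array([1.0, 0.5, -0.3, 2.0])   # a perfectly readable input
hidden, output = forward(x)

print(np.round(hidden, 2))   # 16 anonymous numbers: which, if any, encode a concept?
print(np.round(output, 2))   # the result we actually observe
```

Scale this up to billions of parameters and thousands of layers, and the question interpretability research asks is precisely which of those internal numbers, alone or in combination, correspond to anything a human would recognize.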
“When a generative AI system does something, like summarizing a financial document, we have no idea, at any specific or precise level, why it makes the choices it does—why it chooses certain words over others, or why it sometimes makes a mistake when it’s usually accurate,” Amodei summarizes.
“People outside the field are often surprised and alarmed to learn that we don’t understand how our own creations work,” he adds. This astonishment is perfectly understandable: after all, it is the first time in the history of our civilization that such a poorly understood technology has occupied such an important place in society.
Social, technological and commercial issues
This situation raises a host of rather uncomfortable questions, particularly around safety. This is especially true in the current context, where several major industry players are now focused on creating artificial general intelligence, with knowledge and reasoning abilities far superior to those of any real person.
Many experts, including Amodei, believe that it would be very unwise to deploy such systems before finding a way to truly understand how they work. “We could have AI systems equivalent to an entire country of geniuses gathered in a data center as early as 2026 or 2027. I am very concerned about the idea of deploying such systems without better control over interpretability,” he explains in his essay.
He adds that safety is not the only argument that should push AI players to tackle the black box problem: the effort could also bring significant commercial advantages. In essence, the first entities to decipher how their creations work will also be in the best position to push the limits of the technology, for example by eliminating hallucinations, those cases where LLMs go off the rails and produce aberrant or factually incorrect answers.
The industry is getting on board
For all these reasons, Amodei explains that the quest for interpretability should now be a top priority for the entire industry, and even the scientific community in general. "Interpretability gets less attention than the constant deluge of model publications, but it is arguably more important," he believes. "AI researchers in businesses, universities, or nonprofit organizations can accelerate interpretability by working directly on it."
The good news is that some organizations are already doing very serious work on this issue. A significant portion of current research in this area aims to create what Amodei describes as "extremely precise and accurate MRIs that would fully reveal the inner workings of an AI model." Some very promising progress has already begun to emerge, for example at DeepMind. At the end of 2023, the company, whose work on protein structure later earned a Nobel Prize, presented FunSearch, a system built on an evolutionary approach that allows it to show how it arrived at a given solution.
Anthropic, for its part, is also investing in this effort. Last March, for example, the firm published a very interesting body of research on the “biology of large language models”. This work highlighted the existence of a handful of “circuits” that could help trace an LLM's reasoning. At the same time, the company has also invested in a startup working specifically on model interpretability.
Amodei hopes this approach will make it possible to “reliably detect most model problems” by 2027, the date by which companies like OpenAI hope to reach the stage of general artificial intelligence. It will therefore be very interesting to follow all this work on interpretability, as it is undoubtedly a major sticking point that will largely determine the trajectory of this transformative technology in the coming years. See you in two years for a new assessment.