Artificial intelligence is worrying the cultural industry, first because of the risk it poses to many jobs, but also, and above all, because of the way these tools are trained. While many companies claim that their AI models are trained on texts and information not protected by copyright, the sector's giants are surrounded by numerous scandals.
Novelists such as George R.R. Martin have accused OpenAI of having “copied works without authorization or compensation.” Last February, in a copyright infringement lawsuit, Meta was accused of having downloaded more than 81 TB of books from pirate libraries. A new study adds to the controversy.
An article published last month by legal and computer science researchers from Stanford, Cornell, and West Virginia universities shows that Meta's AI model, Llama 3.1, knows the first Harry Potter book like the back of its hand. The study estimates that the model has memorized 42% of the first novel in the saga, well enough to reproduce 50-token excerpts from it at least half the time.
The hit children's book is not the only one concerned: The Hobbit by J.R.R. Tolkien and 1984 by George Orwell are also affected. To obtain these results, the researchers asked several AI models to complete excerpts from these books. Llama 3.1 can quote large portions of Harry Potter and the Philosopher's Stone without batting an eyelid.
Mark Lemley, a law professor at Stanford University and former member of Meta's legal team, told Understanding AI:
Who is at fault?
Did Meta train its model on J.K. Rowling's novel itself, or did the text reach the model through excerpts shared by internet users? This is the question the study raises. The fact that the most popular novels appear to be the ones the AI has memorized most thoroughly raises questions about its training sources. For Lemley, the idea that Llama 3.1 memorized Harry Potter from quotes shared on forums and elsewhere online seems rather unlikely. The lawyer explains:
Meta's AI knows nearly half of the book, which suggests it had access to the entire text at one point or another. “There is clear evidence that Llama 3.1 has memorized almost all of Harry Potter and the Philosopher's Stone,” the study states.
The thorny question of "fair use"
This new study reopens the debate over "fair use", the doctrine behind which Meta and others take refuge. In the United States, fair use limits copyright by allowing certain unlicensed uses of protected works. Courts weigh four factors: the purpose and character of the use, the nature of the copyrighted work, the amount copied, and the effect of the use on the market for the original. Fair use typically covers reproduction for purposes such as criticism, commentary, news reporting, and teaching. The question is therefore how AI companies should be viewed: as copyright infringers or as providers of tools of public utility.