Newly unredacted court documents allege that Mark Zuckerberg’s Meta secretly used a notorious piracy database to train its AI models. According to plaintiffs, Zuckerberg and Meta treated the wide availability of pirated works as a “get-out-of-jail-free card” while developing the company’s artificial intelligence products.
Wired reports that the plaintiffs in a copyright case against Meta have accused Mark Zuckerberg’s company of using Library Genesis (LibGen), a shadow library of pirated books originating from Russia, to help train its generative AI language models without permission from the copyright holders. The allegation came to light after Judge Vince Chhabria of the United States District Court for the Northern District of California ordered Meta and the plaintiffs to file full versions of previously redacted documents, stating that Meta’s approach to redacting them was “preposterous” and aimed at avoiding negative publicity rather than protecting business interests.
The unredacted documents contain internal exchanges between Meta employees discussing the use of LibGen data. In one instance, a Meta engineer expressed hesitation about accessing LibGen data from a corporate laptop, stating that it “doesn’t feel right.” The documents also allege that discussions about using LibGen data were escalated to Meta CEO Mark Zuckerberg and that the AI team was ultimately “approved to use” the pirated material.
Meta has maintained that using publicly available materials to train AI tools is protected under the “fair use” doctrine. However, the plaintiffs argue that Meta knew LibGen was a pirated dataset and used it anyway. They claim that Meta treated the “public availability” of shadow datasets as a “get-out-of-jail-free card” despite internal records showing that decision-makers, including Zuckerberg, were aware of the pirated nature of LibGen.
The plaintiffs are seeking to amend their complaint based on the newly revealed information. They allege that Meta not only used copyrighted material without permission but also disseminated it by uploading and seeding pirated files containing the plaintiffs’ works on torrent sites.
LibGen, one of the largest and most controversial shadow libraries in the world, has faced legal challenges in the past. In 2015, a New York judge issued a preliminary injunction against the site, but its anonymous administrators switched domains to keep it running. In September 2024, another New York judge ordered LibGen to pay $30 million to rights holders for copyright infringement.
The outcome of this case, along with similar lawsuits working their way through U.S. courts, will have significant implications for the future of AI and copyright law. It will determine whether technology companies can legally use creative works to train AI models and could either entrench or derail the most powerful players in the AI industry.
The case is Kadrey et al. v. Meta Platforms in the United States District Court for the Northern District of California.
Lucas Nolan is a reporter for Breitbart News covering issues of free speech and online censorship.