Five major publishing houses and bestselling author Scott Turow filed a copyright infringement lawsuit against Meta on May 5, 2026, accusing the company of illegally downloading millions of books and journal articles to train its Llama artificial intelligence models.
The plaintiffs — Elsevier, Cengage, Hachette Book Group, Macmillan Publishers, and McGraw Hill — allege that Meta obtained copyrighted material from pirate sites and deliberately stripped copyright management information to conceal the works’ origins before using them to train AI systems.
The complaint cites evidence that Meta torrented more than 267 terabytes of copyrighted material from notorious piracy platforms. According to the plaintiffs, that volume represents millions of individual works, including textbooks, research papers, novels, and professional publications, downloaded and used without permission or payment.
The distinction between lawful and unlawful AI training has become central to multiple ongoing copyright cases against AI companies. The question is whether companies can train AI on any publicly available data, or whether they have legal obligations to obtain licenses when that data is copyrighted material obtained through unauthorized channels.
Meta’s lawyers are likely to argue that the material was publicly available online and that AI training constitutes fair use — a legal doctrine allowing some uses of copyrighted material without permission under specific circumstances. Courts have not yet settled whether AI training qualifies for fair use protection.
The stakes are substantial. If the publishers win, the cost of training large AI models would likely rise significantly, since companies would need to license copyrighted works. If Meta wins, AI companies could keep training on publicly available data without paying rights holders.
The outcome is likely to shape how AI development proceeds across the industry. Settlement negotiations may result in licensing arrangements, or courts may establish new precedents about fair use and AI.




