California Court Allows Billion-Dollar Piracy Class Action Against Anthropic to Proceed

A federal court in California is currently reviewing a billion-dollar class action lawsuit against Anthropic, the company behind the language model Claude, for alleged large-scale copyright infringement.

The lawsuit claims that between 2021 and 2022, Anthropic downloaded up to seven million books from piracy sites such as LibGen and PiLiMi. The case poses a significant financial threat to the company, even though it won a partial victory on fair use in June.

According to a court ruling issued on July 17, 2025, Anthropic is accused of using the BitTorrent protocol to download pirated books from LibGen and PiLiMi. These files, typically in .epub, .pdf, or .txt format, were stored in a central internal database, regardless of whether they were later used to train AI models.

Judge William Alsup described the company’s actions as “downloading millions of works in a Napster-like manner.” The ruling details that from January 2021 to July 2022, one of Anthropic’s co-founders first downloaded around 200,000 books from the Books3 collection. Approximately five million books from LibGen followed, and then a further two million from PiLiMi, with the PiLiMi downloads focusing on books not yet available on LibGen.

The court has determined that the case should proceed as a class action due to the volume and complexity of the evidence. The class covers only works sourced from LibGen and PiLiMi; Books3 was excluded for lack of metadata.

The financial implications for Anthropic are substantial. Under U.S. law, statutory damages for willful copyright infringement can reach $150,000 per work.
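To give a sense of scale, the per-work ceiling can be multiplied out. The following is a minimal sketch in Python, assuming every work received the $150,000 maximum, which is an upper bound and not a prediction. The function names and the sample work counts are hypothetical and do not come from the ruling.

```python
# Illustrative arithmetic only: a per-work award multiplied by a work count.
# The work counts used below are hypothetical, not figures from the case.
WILLFUL_MAX_PER_WORK = 150_000  # per-work ceiling cited in the article (USD)


def damages_exposure(work_count: int, per_work_award: int = WILLFUL_MAX_PER_WORK) -> int:
    """Return the total exposure in USD for a given number of works."""
    if work_count < 0 or per_work_award < 0:
        raise ValueError("inputs must be non-negative")
    return work_count * per_work_award


def works_needed(target_total: int, per_work_award: int = WILLFUL_MAX_PER_WORK) -> int:
    """Return the smallest number of works whose combined awards reach target_total."""
    if per_work_award <= 0:
        raise ValueError("per_work_award must be positive")
    return -(-target_total // per_work_award)  # integer ceiling division


if __name__ == "__main__":
    for count in (10_000, 100_000, 1_000_000):
        print(f"{count:>9,} works -> ${damages_exposure(count):,}")
    print("Works needed to reach $1 billion:", works_needed(1_000_000_000))
```

At the cited ceiling, fewer than 7,000 works would be enough to reach one billion dollars, which is why the number of works that end up in the class matters so much. Any actual award per work could be far lower.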

Anthropic is required to provide a complete list of metadata for its LibGen and PiLiMi downloads by August 1, 2025, while the plaintiffs must submit a detailed inventory of titles and registrations by September 1, 2025.

In June, the same court determined that training AI models on legally obtained books could be considered fair use, especially if the usage is “transformative” and copies are not distributed. However, the court also indicated that storing pirated works in an internal library does not qualify as fair use.

While the legal status of mass web scraping and the use of publicly available data for AI training remains unclear, the court’s decision establishes a clear boundary: acquiring and storing pirated content does not qualify as fair use, even for research or innovation in AI.

The Anthropic case could set an important precedent for the industry, illustrating that AI companies cannot sidestep copyright laws when sourcing training data, regardless of how they intend to use it. This ruling may also impact ongoing legal disputes involving Meta, OpenAI, and other firms accused of utilizing copyrighted materials to train language models.

This news was sourced from [here](https://the-decoder.com/napster-style-piracy-allegations-put-anthropic-at-risk-of-a-billion-dollar-class-action-lawsuit/).