Nvidia has formally asked a court to dismiss a lawsuit brought by authors who allege the company used pirated books to train its artificial intelligence models. In a motion filed in a California federal court, the tech giant argues that the plaintiffs have failed to provide specific evidence that their copyrighted works were actually copied or used in the development of its AI systems.
The case, known as Nazemian v. Nvidia, centers on claims that the company used vast datasets, including content allegedly sourced from shadow libraries like 'Anna's Archive,' to build its AI tools. Nvidia's legal team contends the authors' claims are speculative and lack the basic elements required for a copyright infringement case to proceed.
Key Takeaways
- Nvidia has filed a motion to dismiss a copyright infringement lawsuit brought by a group of authors.
- The company argues the plaintiffs have not shown concrete proof that their specific books were used to train Nvidia's AI models.
- The lawsuit alleges Nvidia used data from sources like 'Anna's Archive' and the Books3 dataset.
- Nvidia's defense claims that internal discussions about data sources do not constitute an act of copyright infringement.
The Core of Nvidia's Legal Argument
In its motion to dismiss, filed on January 29, Nvidia asserts that the lawsuit fails to meet fundamental legal standards. The company's lawyers state that the complaint does not specify which of the authors' books were copied, when the alleged copying occurred, or which Nvidia AI models contain the supposed infringing material.
Without these crucial details, Nvidia claims the entire case is built on conjecture. The company is pushing back against what it describes as an attempt by the plaintiffs to use the legal discovery process to find evidence of infringement, rather than presenting a plausible claim from the outset.
"[The plaintiffs] do not allege facts showing that Nvidia copied any of their copyrighted works, when any such copying occurred, or which Nvidia models supposedly contain those works," states the company's motion filed with the court.
This position highlights a central challenge in AI training litigation: proving that a specific copyrighted work was ingested and used in training a large language model, amid the enormous volumes of data such models are built on.
Addressing the 'Anna's Archive' Allegation
A key element of the plaintiffs' amended complaint is internal discussion among Nvidia employees about accessing Anna's Archive, a well-known online shadow library. The authors argue these exchanges demonstrate both the company's intent to use pirated materials and that it acted on that intent.
Nvidia counters this by stating that merely discussing or evaluating a potential data source is not the same as illegally copying copyrighted content. The company's filing emphasizes that the complaint does not allege that Nvidia successfully downloaded or used any of the plaintiffs' books from the site.
What is a Motion to Dismiss?
In the U.S. legal system, a motion to dismiss is a formal request by a defendant for a court to throw out a lawsuit. The argument is typically that the plaintiff has failed to state a legally valid claim, even if all the facts they allege are true. It is a common early-stage tactic in civil litigation.
According to Nvidia, it is just as plausible that the company evaluated the source and decided against using it. The defense maintains that copyright law requires concrete allegations of reproduction, not just conversations about potential sources.
Expanding the Scope of the Lawsuit
The authors' revised complaint also broadened the case to include additional AI models and datasets, such as Megatron 345M and The Pile. Nvidia is challenging this expansion, arguing the plaintiffs are improperly grouping multiple distinct products together without providing specific infringement claims for each one.
Nvidia's legal team has also pointed to its own public documentation, suggesting that information about its training data is available and, in some cases, contradicts the assumptions made by the plaintiffs in their lawsuit.
The Scale of AI Training Data
Modern large language models are trained on immense datasets. For example, 'The Pile,' one of the datasets mentioned in the lawsuit, is a publicly available 825-gigabyte collection of text data sourced from 22 different smaller datasets, including academic papers, books, and web content.
This part of the dispute underscores the complexity of tracing data lineage in AI development. Companies often use a mixture of proprietary, licensed, and publicly available data, making it difficult to pinpoint the origin of every piece of information used in training.
Secondary Liability and Tooling
The lawsuit also introduces a claim of secondary liability against Nvidia. The authors argue that by providing its NeMo Megatron framework, which allows users to download and use large datasets like The Pile, Nvidia is contributing to and facilitating copyright infringement by third parties.
Nvidia has responded by stating that for a secondary liability claim to be valid, the plaintiffs must first prove a direct act of copyright infringement by a user of the framework. The company argues that providing optional tools does not make it liable for how those tools might be used by others, especially without specific allegations of infringement by those users.
This argument touches on a broader legal question in the tech world: to what extent is a company responsible for the actions of users of its software and platforms?
What Comes Next
The legal battle is far from over. The motion to dismiss represents Nvidia's initial defensive strategy to narrow the scope of the case or have it thrown out entirely before it proceeds to the costly discovery phase.
A hearing on the motion has been scheduled for April 2, 2026, in the U.S. District Court for the Northern District of California. The court's decision will be a significant indicator of how the legal system is approaching the novel challenges of copyright in the age of generative AI.
Regardless of the outcome of this motion, the case is one of several high-profile legal challenges facing AI developers. Authors, artists, and media companies are increasingly turning to the courts to determine the legal boundaries for using copyrighted material to train artificial intelligence systems, setting the stage for landmark rulings that could shape the future of the industry.