Hachette Book Group and Cengage Group asked a California federal court on Thursday for permission to intervene in a class action accusing Google of training its Gemini AI on books used without authorization. The publishers allege that Google copied entire works rather than licensing them.
The filing alleges that Google downloaded books from pirate sites such as Z-Library, OceanofPDF, WeLib and b-ok.org. According to the publishers, Google then repeatedly copied those works into memory and into training sets for successive models. The complaint says Google “chose to steal a massive body of content from Plaintiffs and the Class to train its AI model” and infringed “at every stage.”
The complaint says Google’s C4 dataset includes material from at least 28 piracy-linked sites. It notes that “The copyright symbol (©) appears more than 200 million times in the C4 dataset,” a figure that underscores the scale of the alleged copying.
The publishers allege that Gemini now produces outputs that “substitute for copyrighted works,” including verbatim reproductions, detailed summaries, and “knockoffs that copy creative elements of original works.” They seek statutory damages, an injunction barring further use, destruction of unauthorized copies, and disclosure of which books were used to train Gemini.
The motion seeks to join a consolidated case first filed by authors in 2023. The complaint also quotes Common Crawl as saying, “You shouldn’t have put your content on the internet if you didn’t want it to be on the internet.”

