Google slapped with a lawsuit for 'secretly stealing' data to train Bard

From the same law firm that's also suing OpenAI for data theft.
By Cecily Mauran
Another lawsuit involving AI and data theft. Credit: Getty Images

A California law firm has filed a class-action lawsuit against Google for "secretly stealing" vast amounts of data from the web to train its AI technologies.

Clarkson Law Firm is suing the tech giant for negligence, invasion of privacy, larceny, copyright infringement, and profiting from personal data that was illegally obtained. "Google has taken all our personal and professional information, our creative and copywritten works, our photographs, and even our emails—virtually the entirety of our digital footprint—and is using it to build commercial Artificial Intelligence ('AI') Products like 'Bard,'" said the complaint, which was filed on July 11 in the Northern District of California.

The lawsuit comes on the heels of Google quietly updating its privacy policy last week to state that any public information can be used to train its AI products like Bard. Google is essentially saying anything published on the web is fair game, but the law firm argues that scraping data without compensation or consent, for the express purpose of training AI models, is a massive invasion of privacy. The lawsuit alleges that Google, a multibillion-dollar company with over a billion users worldwide, is putting users in an "untenable" position: "either use the internet and surrender all your personal and copyrighted information to Google’s insatiable AI models — or avoid the internet entirely."


In a statement to Reuters, Google general counsel Halimah DeLaine Prado called the claims "baseless," saying, "we use data from public sources — like information published to the open web and public datasets – to train the AI models behind services like Google Translate, responsibly and in line with our AI Principles."

Recently, Clarkson filed a similar class-action lawsuit against OpenAI, the company that created ChatGPT, for "theft and misappropriation of personal data," alleging the same kind of data-scraping operation. Large language models need huge amounts of training data to make AI chatbots conversational and intelligent. Both Bard and ChatGPT rely on large language models to work, which has raised concerns about the use of private data as well as copyright infringement.

The most recent lawsuit says Google has misappropriated datasets like Common Crawl, a non-profit that makes its data free for research and education purposes, as well as data from sites like Medium and Kickstarter. Google also uses its own data from Gmail and Google Search to feed its models. Other scraped data includes copyrighted works, such as e-books from digital libraries and even from piracy websites, that the company is using without compensating artists and authors.

The key to Clarkson's lawsuit is the issue of public domain. As the complaint puts it, "'publicly available' has never meant free to use for any purpose." Yes, some data is available to purchase, but whether it can be used depends on the context of its use and on user consent. Yes, users consent to privacy policies when they publish content on the web, but they have a right to know if it's being used somewhere else. In other words, Clarkson says, "Google must understand, once and for all: it does not own the internet."


Cecily Mauran
Tech Reporter

Cecily is a tech reporter at Mashable who covers AI, Apple, and emerging tech trends. Before getting her master's degree at Columbia Journalism School, she spent several years working with startups and social impact businesses for Unreasonable Group and B Lab. Before that, she co-founded a startup consulting business for emerging entrepreneurial hubs in South America, Europe, and Asia. You can find her on X at @cecily_mauran.