OpenAI is being sued for training ChatGPT with 'stolen' personal data

The law firm is accusing OpenAI of using your data without consent.
By Cecily Mauran
A California law firm says ChatGPT was built with 'stolen' data. Credit: Getty Images

A California law firm has filed a class-action lawsuit against OpenAI for "stealing" personal data to train ChatGPT.

Clarkson Law Firm, in a complaint filed in federal court in the Northern District of California on Wednesday, alleges that ChatGPT and DALL-E "use stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge." To train its large language model, OpenAI scraped 300 billion words from the internet, including personal information and posts from social media sites like Twitter and Reddit. The law firm claims OpenAI "did so in secret, and without registering as a data broker as it was required to do under applicable law."

OpenAI has faced controversy over how it collects data, and what data it collects, to train and further develop ChatGPT. Until recently, users had no explicit way to opt out of letting OpenAI use their conversations and personal information to train the model. Italy's data protection authority temporarily banned ChatGPT under Europe's General Data Protection Regulation (GDPR) for inadequately protecting user data, especially that of minors. This lawsuit addresses OpenAI's opaque privacy policies for existing users, but it largely focuses on data scraped from the web that was never explicitly intended to be shared with ChatGPT. Through billion-dollar investments from Microsoft and subscriber revenue from ChatGPT Plus, OpenAI has profited from this data without compensating its sources.


The 15 counts in the complaint include violation of privacy, negligence for failing to protect personal data, and larceny for illegally obtaining massive amounts of personal data to train its models. Datasets like Common Crawl, Wikipedia, and Reddit, which include personal information, are publicly available as long as companies follow the protocols for purchasing and using this data. But OpenAI allegedly used this data in ChatGPT without users' permission or consent. Even when people's personal information is public on social media sites, blogs, and articles, using it outside its intended platform can be considered a violation of privacy.

In Europe, the GDPR draws a legal distinction between data in the public domain and data that is free to use, but in the US, that distinction is still up for debate. Nader Henein, a privacy research VP at Gartner, thinks the sentiment of the lawsuit is valid. "People should have control as to how their data is used, even when it is available in the public domain," he said. But Henein is unsure whether the US legal system would agree.

Ryan Clarkson, the firm's managing partner, said in a blog post that it's critical to act now under existing laws instead of waiting for the executive and judicial branches to respond with federal regulation. "We cannot afford to pay the cost of negative outcomes with AI like we’ve done with social media, or like we did with nuclear," he wrote. "As a society, the price we would all pay is far too steep."

Cecily Mauran
Tech Reporter

Cecily is a tech reporter at Mashable who covers AI, Apple, and emerging tech trends. Before getting her master's degree at Columbia Journalism School, she spent several years working with startups and social impact businesses for Unreasonable Group and B Lab. Before that, she co-founded a startup consulting business for emerging entrepreneurial hubs in South America, Europe, and Asia. You can find her on X at @cecily_mauran.