Sorry, your 'anonymized' data probably isn't anonymous

Researchers show how anonymized data sets are often anything but anonymous, releasing a tool that demonstrates how easy it is to pick you out of a digital crowd.
By Jack Morse

Anonymized data sets are a joke. And, as a newly published study shows, the joke just so happens to be on you.

From your credit card purchases to your medical records to your online browsing history, companies are sharing and selling so-called de-identified data sets containing a record of your every move. The information is supposedly stripped of any specific details — like your name — that would tie it directly back to you. However, it just so happens that true anonymization of your personal data is a lot more difficult than you might think.

So finds a study published today in the journal Nature Communications. Researchers determined that, using their model, "99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes."

While 15 demographic attributes may sound like a lot of data to have on one person, the study puts this number into perspective.

"Modern datasets contain a large number of points per individuals," write the authors. "For instance, the data broker Experian sold [data science and analytics company] Alteryx access to a de-identified dataset containing 248 attributes per household for 120M Americans."
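
To get a feel for why a handful of attributes goes such a long way, here is a toy sketch in Python. It is not the researchers' model, and everything in it is an assumption made for illustration: the function names, the synthetic population, and the idea that every attribute is independent and takes one of five equally likely values. It simply counts how many records become one of a kind as more attributes are compared.

```python
import random
from collections import Counter


def synthetic_population(n_people, n_attrs, n_values, seed=0):
    """Build a made-up population: each person is a list of random attribute codes.

    Assumption for illustration only: attributes are independent and uniformly
    distributed, which real demographic data is not.
    """
    rng = random.Random(seed)
    return [
        [rng.randrange(n_values) for _ in range(n_attrs)]
        for _ in range(n_people)
    ]


def unique_fraction(records, k):
    """Return the share of records whose first k attributes match no other record."""
    keys = [tuple(record[:k]) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


if __name__ == "__main__":
    people = synthetic_population(n_people=100_000, n_attrs=15, n_values=5)
    for k in (1, 3, 5, 8, 10, 15):
        print(f"{k:2d} attributes: {unique_fraction(people, k):.4f} of records are unique")
```

Because each extra attribute can only split groups of look-alike records further, the unique share never drops as k grows. Real attributes are correlated and unevenly distributed, so the exact numbers from this sketch say nothing about any actual data set; only the trend is the point.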

That anonymized data sets can be de-anonymized isn't itself news. In 2017, researchers at the DEF CON hacking conference demonstrated how they were able to legally and freely acquire the apparently anonymous browsing history of 3 million Germans and then quickly de-anonymize portions of it. The researchers were able to uncover, for example, the porn habits of a specific German judge.

Which, ouch.

This new study demonstrates just how little data is actually needed to pinpoint specific people from otherwise sparse data sets. "[Few] attributes are often sufficient to re-identify with high confidence individuals in heavily incomplete datasets," the authors note.

To drive that point home, Verdict reports that the researchers released an online tool that lets you see just how easy it would be to identify you in a supposedly anonymized data set.

Spoiler: The results are as troubling as you'd expect — something to keep in mind the next time a company's fine print warns that it "might share your anonymous data with third parties."

Jack Morse

Professionally paranoid. Covering privacy, security, and all things cryptocurrency and blockchain from San Francisco.