
Anonymisation vs pseudonymisation vs aggregation, explained

DataSupplier · 15 min read

These three terms are often used interchangeably, but under the GDPR they mean very different things, with very different consequences for how data can be used. Getting the distinction right is what makes a dataset safe to share, analyse or train a model on. This guide explains each technique, the re-identification risk they share, and how to choose between them.


Why the distinction matters

Whether data is anonymised, pseudonymised or merely aggregated determines whether the GDPR still applies, what you can do with it, and how much re-identification risk you carry. Treating pseudonymised data as if it were anonymous is one of the most common, and most costly, mistakes in data projects.

Anonymisation

Anonymisation irreversibly prevents identification of individuals. Done properly, anonymised data falls outside the scope of the GDPR, because it is no longer personal data. The bar, however, is high. Under Recital 26, data is only anonymous if individuals cannot be re-identified by any means reasonably likely to be used, whether by the controller or by anyone else, taking into account other available datasets as well as the cost, time and technology involved. Weak anonymisation that can be reversed, or defeated by linking to other data, does not qualify.
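
As an illustration of one step that often forms part of an anonymisation process, the sketch below drops direct identifiers and generalises quasi-identifiers into coarser bands. The column names, the decade banding and the two-character postcode prefix are assumptions made for this example, not recommended settings. Generalisation alone does not make a dataset anonymous; the result still has to be tested against the re-identification standard described above.

```python
from datetime import date

# Hypothetical column names, chosen for this example only.
DIRECT_IDENTIFIERS = {"name", "email"}


def generalise(record):
    """Drop direct identifiers and coarsen date of birth and postcode."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Replace the exact date of birth with a ten-year band.
    dob = out.pop("date_of_birth")
    decade = dob.year // 10 * 10
    out["birth_decade"] = f"{decade}s"

    # Keep only the leading characters of the postcode (a wider area).
    out["postcode_area"] = out.pop("postcode")[:2]
    return out


record = {
    "name": "Anna Example",
    "email": "anna@example.com",
    "date_of_birth": date(1984, 6, 12),
    "postcode": "10115",
    "sex": "F",
    "spend": 120.5,
}

generalised = generalise(record)
print(generalised)
```

How coarse the bands need to be depends on the population and on what other data an attacker could plausibly link, so any real banding would be chosen from a risk assessment rather than fixed in code.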

Pseudonymisation

Pseudonymisation replaces direct identifiers with tokens, so data can no longer be attributed to a person without additional information, such as a key or lookup table, that is held separately and protected. It is a valuable safeguard, named by the GDPR among appropriate measures for data protection by design (Article 25) and security of processing (Article 32), but pseudonymised data remains personal data and stays within the Regulation's scope. It reduces risk; it does not remove obligations.
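
One common way to produce such tokens is a keyed hash. The sketch below is a minimal illustration using HMAC-SHA256 from the Python standard library; the function name `pseudonymise`, the lower-casing of the identifier and the in-memory key are assumptions made for this example. In practice the key would live in a separate, access-controlled system, because it is exactly the "additional information" that allows re-attribution.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a stable token for an identifier using HMAC-SHA256."""
    # Normalise so that trivially different spellings map to the same token.
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


# Whoever holds this key can recompute tokens for known identifiers,
# so it must be stored apart from the pseudonymised data.
secret_key = secrets.token_bytes(32)

records = [
    {"email": "anna@example.com", "purchases": 3},
    {"email": "Anna@Example.com ", "purchases": 5},
    {"email": "ben@example.com", "purchases": 1},
]

pseudonymised = [
    {"token": pseudonymise(r["email"], secret_key), "purchases": r["purchases"]}
    for r in records
]

for row in pseudonymised:
    print(row["token"][:12], row["purchases"])
```

The same person always receives the same token, which keeps records linkable for analysis. That linkability is also why the output is still personal data: anyone with the key, or with enough other attributes to single a person out, can re-attribute it.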

Aggregation

Aggregation combines records into group-level statistics, such as totals or averages by area or segment. Aggregated outputs can be anonymous if groups are large enough and the design prevents inference about individuals, but small cells or unusual combinations can still leak identity. Aggregation is powerful for analytics and reporting, but it must be designed, not assumed.
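
A simple design control for the small-cell problem is to suppress any group below a minimum size. The sketch below assumes a threshold of 10 and hypothetical `region` and `segment` columns; both are illustrative choices, and the right threshold depends on your own risk assessment.

```python
from collections import Counter

# Hypothetical threshold for this example; not a regulatory value.
MIN_CELL_SIZE = 10


def aggregate_with_suppression(records, group_keys, min_cell_size=MIN_CELL_SIZE):
    """Count records per group; report None for groups below the threshold."""
    counts = Counter(tuple(r[k] for k in group_keys) for r in records)
    return {
        group: (count if count >= min_cell_size else None)
        for group, count in counts.items()
    }


records = (
    [{"region": "North", "segment": "A"}] * 42
    + [{"region": "North", "segment": "B"}] * 3
    + [{"region": "South", "segment": "A"}] * 17
)

table = aggregate_with_suppression(records, ["region", "segment"])
for group, count in sorted(table.items()):
    print(group, "suppressed" if count is None else count)
```

Suppression on its own is not always enough. If row or column totals are published alongside the table, a suppressed cell can sometimes be recovered by subtraction, so totals and overlapping tables need to be considered as part of the same design.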

The re-identification risk

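The central risk across all three is re-identification through linkage: combining a "safe" dataset with other data to single out individuals. Quasi-identifiers such as postcode, date of birth and sex can, together, identify a large share of a population even when names are removed; one widely cited estimate by Latanya Sweeney put the figure at about 87% of the US population for ZIP code, date of birth and sex. Techniques exist to manage this, each with trade-offs between privacy and utility: k-anonymity (every combination of quasi-identifiers is shared by at least k records), l-diversity (each such group also contains at least l distinct sensitive values) and differential privacy (calibrated noise limits what any output reveals about one individual).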
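
To make k-anonymity and l-diversity concrete, the sketch below measures both on a toy table. The column names and records are invented for illustration, and the l-diversity shown is the simplest "distinct values" variant.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r)
    return classes


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group: the k for which the table is k-anonymous."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(len(rows) for rows in classes.values())


def l_diversity(records, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values found in any group."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in rows}) for rows in classes.values())


records = [
    {"postcode": "1011", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "1011", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "1012", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"postcode": "1012", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"postcode": "1013", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
]
qi = ["postcode", "birth_year", "sex"]

k_all = k_anonymity(records, qi)
# Drop the one record that is unique on its quasi-identifiers.
without_unique = records[:4]
k_trimmed = k_anonymity(without_unique, qi)
l_trimmed = l_diversity(without_unique, qi, "diagnosis")

print(k_all, k_trimmed, l_trimmed)
```

Removing the unique record raises k from 1 to 2, but l stays at 1: both records in the 1012 group share the same diagnosis, so knowing someone is in that group reveals the sensitive value anyway. This is the gap l-diversity was designed to expose.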

Synthetic data as an alternative

Where even well-treated real data is too risky or unavailable, synthetic data, generated to mirror the structure and statistics of the original without containing real records, can let development, testing and some analytics proceed. It is increasingly used alongside anonymisation rather than instead of it.
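
The sketch below shows the idea in its most basic form: it learns each column's value frequencies and samples new rows from them. Everything here (function names, columns, data) is assumed for illustration. Sampling columns independently discards the relationships between them and carries no formal privacy guarantee; production generators model the joint distribution and are assessed for disclosure risk.

```python
import random
from collections import Counter


def fit_marginals(records, columns):
    """Estimate each column's value frequencies independently."""
    return {col: Counter(r[col] for r in records) for col in columns}


def sample_synthetic(marginals, n, seed=None):
    """Draw n rows, sampling every column from its own frequency table."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


records = [
    {"region": "North", "plan": "basic"},
    {"region": "North", "plan": "premium"},
    {"region": "South", "plan": "basic"},
    {"region": "South", "plan": "basic"},
]
columns = ["region", "plan"]

marginals = fit_marginals(records, columns)
synthetic = sample_synthetic(marginals, 5, seed=1)
for row in synthetic:
    print(row)
```

Rows produced this way have the right shape for development and testing, but rare values from the original can still appear in the output, which is one reason synthetic data is typically reviewed for residual risk rather than assumed safe.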

How to choose

Start from the use case and the risk appetite. If the goal is to remove GDPR obligations entirely and the use can tolerate some loss of detail, robust anonymisation or aggregation may fit. If you need record-level data for a controlled purpose, pseudonymisation with strong safeguards is often the answer. Whatever the choice, document the technique, the residual risk and the reasoning, because evidencing the decision is part of compliance.

Key takeaways

Anonymised data is outside the GDPR; pseudonymised data is not.
Anonymisation must resist re-identification by any reasonably likely means.
Aggregation can be anonymous, but only if designed to prevent inference.
Document the technique, residual risk and reasoning.

This article is general information, not legal advice. Confirm obligations for your situation with qualified counsel.

Sources & further reading

EUR-Lex: Regulation (EU) 2016/679 (GDPR), including recitals on anonymisation and pseudonymisation.
Article 29 Working Party: Opinion 05/2014 on Anonymisation Techniques.
European Data Protection Board (EDPB): guidance on pseudonymisation.
ENISA: reports on data pseudonymisation and re-identification risk.

