The EU AI Act for data and model providers

DataSupplier

The EU AI Act is the first comprehensive law for artificial intelligence, and it places real obligations on the data behind models. This guide explains what it means for organisations that source training data and build or deploy AI.

What the AI Act is

The EU AI Act (Regulation (EU) 2024/1689) regulates AI systems by risk, from prohibited practices through high-risk systems to limited and minimal-risk uses. It entered into force on 1 August 2024 and is being phased in over several years, with obligations arriving at different dates.

Why data is central

For high-risk systems, the Act sets data-governance requirements (Article 10): training, validation and testing data must be relevant, sufficiently representative and, to the best extent possible, free of errors and complete in view of the intended purpose, and must be examined for possible biases. How you source and prepare data is now a compliance matter, not just an engineering one.

The risk tiers

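Prohibited: practices judged to pose unacceptable risk, such as social scoring.
High-risk: systems in sensitive domains such as employment, education, critical infrastructure and law enforcement, with the heaviest obligations.
Limited-risk: transparency duties, for example telling people that they are interacting with a chatbot.
Minimal-risk: everything else, such as spam filters, which is largely unregulated.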

Documentation and provenance

High-risk systems require technical documentation, which must cover the data used. Provenance, licensing and a record of data preparation become part of the compliance file, which makes well-documented sourcing more valuable.
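To make the idea of a documented compliance file more concrete, here is a minimal sketch of a machine-readable provenance record. The Act does not prescribe any such schema: the class names, field names and example values below are all hypothetical, chosen only to illustrate how provenance, licensing and preparation steps might be captured alongside a dataset.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PreparationStep:
    # One documented transformation applied to the dataset (hypothetical schema).
    name: str
    description: str


@dataclass
class DatasetRecord:
    # Illustrative fields only; this structure is an assumption, not a legal template.
    dataset_id: str
    source: str
    licence: str
    collected_on: str  # ISO 8601 date string
    contains_personal_data: bool
    preparation: list[PreparationStep] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of required text fields that were left empty."""
        values = asdict(self)
        required = ("dataset_id", "source", "licence", "collected_on")
        return [name for name in required if not str(values[name]).strip()]

    def to_json(self) -> str:
        """Serialise the record, including nested preparation steps, to JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example values are invented for illustration.
record = DatasetRecord(
    dataset_id="example-001",
    source="hypothetical supplier feed",
    licence="",  # deliberately left blank to show the gap check
    collected_on="2025-01-15",
    contains_personal_data=False,
)
record.preparation.append(
    PreparationStep("deduplicate", "Removed exact duplicate rows")
)

print(record.missing_fields())
print(record.to_json())
```

Keeping the record as structured data rather than free text means gaps, such as the empty licence field here, can be flagged automatically before the file is relied on. Which fields are actually required for a given system is a question for legal review.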

Interaction with the GDPR

The AI Act sits alongside the GDPR. Personal data in training sets remains subject to data-protection law, and the two regimes must be satisfied together.

What it means for sourcing

Source training data with representativeness, quality and rights in mind; document provenance and preparation; and treat data governance as part of AI compliance from the outset. This is general information, not legal advice.

The data-governance obligations for high-risk AI

For high-risk systems, the Act sets explicit data-governance duties: training, validation and testing data must be relevant and sufficiently representative, examined for bias, and, to the extent possible, free of errors and complete for the intended purpose. There are also record-keeping and technical-documentation requirements that cover the data used. In effect, how you source and prepare data becomes auditable evidence of compliance, not just an engineering detail.
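As an illustration of how representativeness and completeness might be examined in practice, the sketch below compares group shares in a dataset against a reference distribution and measures the rate of missing values in a field. The function names, the sample rows, the reference shares and the 0.10 tolerance are all assumptions made for this example. The Act sets no numeric thresholds of this kind, and a real bias examination would go well beyond a share comparison.

```python
from collections import Counter


def group_shares(rows, key):
    """Share of rows per value of `key`; rows lacking the key count under None."""
    total = len(rows)
    if total == 0:
        return {}
    counts = Counter(row.get(key) for row in rows)
    return {group: n / total for group, n in counts.items()}


def representation_gaps(rows, key, reference, tolerance=0.10):
    """Groups whose observed share differs from the reference by more than `tolerance`.

    The tolerance is an arbitrary illustrative value, not a legal threshold.
    """
    observed = group_shares(rows, key)
    gaps = {}
    for group, expected in reference.items():
        diff = observed.get(group, 0.0) - expected
        if abs(diff) > tolerance:
            gaps[group] = round(diff, 3)
    return gaps


def missing_rate(rows, field_name):
    """Fraction of rows where `field_name` is absent, None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field_name) in (None, ""))
    return missing / len(rows)


# Invented sample data and reference shares, for illustration only.
rows = [
    {"country": "DE", "label": "a"},
    {"country": "DE", "label": "b"},
    {"country": "DE", "label": ""},
    {"country": "FR", "label": "a"},
]
reference = {"DE": 0.5, "FR": 0.5}

gaps = representation_gaps(rows, "country", reference)
label_missing = missing_rate(rows, "label")
print(gaps)
print(label_missing)
```

Storing the outputs of checks like these next to the dataset gives a dated record that the data was examined, which is the kind of evidence the documentation duties point towards.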

Timelines and interaction with the GDPR

The AI Act is phased in over several years, with prohibited-practice rules first and obligations for high-risk systems and general-purpose models following on staggered dates. Throughout, it operates alongside the GDPR: where training data contains personal data, both regimes must be satisfied together. Documented provenance and lawful basis for training data sit at the intersection of the two.

Key takeaways

The EU AI Act (Regulation (EU) 2024/1689) regulates AI by risk tier.
High-risk systems carry data-governance duties: relevance, representativeness, quality.
Provenance and data documentation become part of the compliance file.
The AI Act and GDPR must be satisfied together.

Sources & further reading

EUR-Lex: Regulation (EU) 2024/1689 (AI Act).
European Commission: AI Act overview and timelines.
EUR-Lex: Regulation (EU) 2016/679 (GDPR).
OECD: AI principles.
