
Data normalisation and standardisation

DataSupplier · 11 min read

Data from different sources speaks different dialects; normalisation makes it speak one. This guide covers how to normalise and standardise external data across units, formats, schemas and value sets.

Why normalisation matters

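Combining sources requires consistent units, formats, schemas and value sets. Without normalisation, the same concept appears in incompatible forms (for example, one feed reports weight in pounds and another in kilograms), and any analysis that joins or aggregates the sources breaks.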

What it covers

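  • Units: one unit per measure (for example, all weights in kilograms).
  • Formats: one representation each for dates, numbers and text.
  • Schema: every source mapped to a common model.
  • Values: free text and local codes replaced with standard code lists.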
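
As a minimal sketch of the unit and format items above: the conversion table, the source names (`supplier_a`, `supplier_b`) and their date layouts are hypothetical, chosen only for illustration. The point is that the unit and the date layout are declared per source, not guessed from each value.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical conversion factors to the target unit (kilograms).
TO_KG = {"kg": Decimal("1"), "g": Decimal("0.001"), "lb": Decimal("0.45359237")}

# Hypothetical date layouts, declared once per source.
SOURCE_DATE_FORMATS = {"supplier_a": "%d/%m/%Y", "supplier_b": "%Y-%m-%d"}


def to_kg(value: str, unit: str) -> Decimal:
    """Convert a weight to kilograms; fail loudly on an unknown unit."""
    try:
        factor = TO_KG[unit.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
    return Decimal(value) * factor


def to_iso_date(value: str, source: str) -> str:
    """Parse a date with the layout declared for its source; return ISO 8601."""
    layout = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(value.strip(), layout).date().isoformat()


print(to_kg("2500", "g"))                       # 2.500
print(to_iso_date("03/04/2024", "supplier_a"))  # 2024-04-03
```

Using `Decimal` keeps the conversion free of binary floating-point rounding, which matters when normalised values are later compared or summed across sources.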

Schema mapping

Mapping each source's fields to a common schema is the core task; reference data then anchors the standardisation of the values inside those fields. Done well, the mapping makes heterogeneous sources interoperable.
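
One way to picture a schema mapping is as a per-source table of field renames onto the common model. The sketch below assumes two hypothetical sources and a three-field target schema; none of these names come from a real feed.

```python
# Hypothetical field maps: source field name -> common-model field name.
FIELD_MAPS = {
    "supplier_a": {"CompanyName": "name", "Ctry": "country", "Rev": "revenue"},
    "supplier_b": {
        "org_name": "name",
        "country_code": "country",
        "annual_revenue": "revenue",
    },
}

# Hypothetical target schema shared by every source.
TARGET_FIELDS = ("name", "country", "revenue")


def map_record(source: str, record: dict) -> dict:
    """Rename a source record's fields to the common model.

    Source fields with no mapping raise instead of being dropped silently;
    target fields the source does not supply are set to None.
    """
    field_map = FIELD_MAPS[source]
    unmapped = sorted(set(record) - set(field_map))
    if unmapped:
        raise KeyError(f"{source}: no mapping for fields {unmapped}")
    mapped = {field_map[key]: value for key, value in record.items()}
    return {field: mapped.get(field) for field in TARGET_FIELDS}


a = map_record("supplier_a", {"CompanyName": "Acme", "Ctry": "GB", "Rev": 10})
b = map_record("supplier_b", {"org_name": "Acme", "country_code": "GB"})
print(a)
print(b)
```

Both records come out with the same keys in the same order, so downstream code can treat them identically. Raising on unmapped fields is a design choice made for this sketch; logging them for review would be an equally reasonable alternative.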

Pitfalls

Lossy mapping (collapsing distinct values into one) and silent assumptions (about units or time zones) are common traps. Documenting the mapping is essential, because it puts both in front of a reviewer.
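
To illustrate the time zone trap, here is a small sketch that refuses to convert a timestamp unless the source's UTC offset has been declared. The function name and the fixed-offset parameter are assumptions made for this example; a fixed offset ignores daylight saving changes, so a real pipeline would more likely declare a named time zone per source.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc(local_timestamp: str, utc_offset_hours: Optional[float]) -> str:
    """Convert a naive ISO 8601 timestamp to UTC using a declared offset."""
    if utc_offset_hours is None:
        # The silent assumption would be to treat the value as UTC here.
        raise ValueError("source UTC offset not declared; refusing to assume one")
    naive = datetime.fromisoformat(local_timestamp)
    if naive.tzinfo is not None:
        raise ValueError("timestamp already carries an offset")
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=source_zone).astimezone(timezone.utc).isoformat()


print(to_utc("2024-03-01T09:30:00", 1))  # 2024-03-01T08:30:00+00:00
```

The same pattern applies to units: a missing unit is treated as an error to be resolved with the source, not as a default to be filled in.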

Sourcing considerations

Normalisation is where most multi-source value is unlocked, and where most errors hide. A clear target schema and agreed reference data are prerequisites.

In a managed model

A managed partner can normalise sourced data to your target schema with documented mappings.

Schema mapping and reference data

Normalisation maps each source to a common model and standardises values against reference data, so the same concept is represented the same way everywhere. Schema mapping is the core task; reference data anchors value standardisation. Done well, heterogeneous sources become interoperable; done badly, the same concept appears in incompatible forms and analysis breaks.
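
Value standardisation against reference data can be sketched as a lookup from the spellings seen in sources to one code per concept. The synonym table below is hypothetical and deliberately tiny; the target codes are ISO 3166-1 alpha-2 country codes. Values that do not match are flagged for review instead of being guessed.

```python
# Hypothetical reference data: source spelling (lower-cased) -> ISO 3166-1 alpha-2.
COUNTRY_CODES = {
    "gb": "GB", "uk": "GB", "united kingdom": "GB",
    "de": "DE", "germany": "DE", "deutschland": "DE",
}


def standardise_country(raw: str) -> tuple:
    """Return (value, matched): the reference code, or the raw value unmatched."""
    key = raw.strip().lower()
    if key in COUNTRY_CODES:
        return COUNTRY_CODES[key], True
    return raw, False


values = ["United Kingdom", " uk ", "Deutschland", "Atlantis"]
results = [standardise_country(value) for value in values]
unmatched = [value for value, matched in results if not matched]

print(results)
print(unmatched)  # ['Atlantis']
```

Keeping the unmatched list separate makes gaps in the reference data visible, so they can be closed by extending the code list and not by ad hoc fixes in downstream analysis.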

Avoiding lossy mapping

The common traps are lossy mapping (collapsing distinct values into one) and silent assumptions about units or time zones. A documented mapping and a clear target schema guard against both. Normalisation is where most multi-source value is unlocked, and where most errors hide.
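
A simple check for lossy mapping is to invert the value mapping and list every target value that more than one source value lands on. The status mapping below is hypothetical. Collapsing is not always wrong, but each instance should be a documented decision and not an accident.

```python
from collections import defaultdict

# Hypothetical value mapping from a source status field to the target code list.
STATUS_MAP = {
    "active": "ACTIVE",
    "trading": "ACTIVE",
    "dormant": "INACTIVE",
    "dissolved": "INACTIVE",
    "in liquidation": "INACTIVE",
}


def collapsed_values(mapping: dict) -> dict:
    """Return each target value that more than one source value maps to."""
    sources_by_target = defaultdict(list)
    for source_value, target_value in mapping.items():
        sources_by_target[target_value].append(source_value)
    return {
        target: sorted(sources)
        for target, sources in sources_by_target.items()
        if len(sources) > 1
    }


print(collapsed_values(STATUS_MAP))
# {'ACTIVE': ['active', 'trading'], 'INACTIVE': ['dissolved', 'dormant', 'in liquidation']}
```

Here "dormant" and "dissolved" both become INACTIVE, which loses a distinction some analyses may need; the report gives a reviewer the chance to widen the target code list or record why the collapse is acceptable.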

Key takeaways
  • Combining sources needs consistent units, formats, schema and values.
  • Schema mapping to a common model is the core task.
  • Reference data anchors value standardisation.
  • Avoid lossy mapping; document assumptions.
