Data normalisation and standardisation
Data from different sources speaks different dialects; normalisation translates it into one. This guide covers normalising and standardising external data.
Why normalisation matters
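Combining sources requires consistent units, formats, schemas and value sets. Without normalisation, the same concept appears in incompatible forms (for example, a weight recorded in pounds in one feed and in kilograms in another). Joins then fail, and aggregates silently combine values that do not mean the same thing.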
What it covers
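- Units: converting measurements to one agreed unit (for example, all weights in kilograms).
- Formats: consistent representations of dates, numbers and text.
- Schema: mapping each source to a common model.
- Values: standardising to agreed code lists (reference data).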
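The unit and format steps can be sketched as small, strict conversion functions. This is a minimal illustration, not a prescribed implementation: the target unit (kilograms), the accepted date formats and the function names are all hypothetical. Unknown units and unparseable dates raise errors instead of passing through unchanged.

```python
from datetime import datetime

# Hypothetical conversion factors to the assumed target unit (kilograms).
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

# Hypothetical accepted date formats, tried in order. "%d/%m/%Y" assumes the
# source writes day-first dates; that assumption must be confirmed per source.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")


def normalise_weight(value: float, unit: str) -> float:
    """Convert a weight to kilograms; unknown units raise instead of passing through."""
    try:
        return value * KG_PER_UNIT[unit.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown weight unit: {unit!r}") from None


def normalise_date(text: str) -> str:
    """Parse a date in any accepted format and return ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognised date: {text!r}")


def normalise_number(text: str, decimal_sep: str = ".") -> float:
    """Remove thousands separators and parse using the source's decimal separator."""
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = text.strip().replace(" ", "").replace(thousands_sep, "")
    return float(cleaned.replace(decimal_sep, "."))
```

For example, `normalise_number("1.234,5", decimal_sep=",")` and `normalise_number("1,234.5")` both yield `1234.5`. The decimal separator is passed in explicitly because it cannot be reliably inferred from a single value.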
Schema mapping
Mapping each source to a common schema is the core task. Reference data (agreed code lists such as country or currency codes) anchors value standardisation. Done well, this makes heterogeneous sources interoperable.
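One way to express schema mapping is a per-source table that pairs each target field with a source field and a transform. The sketch below assumes two hypothetical sources (`source_a`, `source_b`) and a hypothetical three-field common schema. It is illustrative only.

```python
from typing import Any, Callable

# Target field -> (source field, transform applied to the source value).
Mapping = dict[str, tuple[str, Callable[[Any], Any]]]

# Hypothetical mappings for two sources onto one assumed common schema.
SOURCE_MAPPINGS: dict[str, Mapping] = {
    "source_a": {
        "supplier_id": ("SupplierID", str),
        "product_name": ("Name", str.strip),
        "weight_kg": ("WeightKg", float),
    },
    "source_b": {
        "supplier_id": ("vendor_code", str),
        "product_name": ("title", str.strip),
        "weight_kg": ("weight_g", lambda g: float(g) / 1000),  # grams -> kg
    },
}


def to_common_schema(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Map one source record onto the common schema; a missing field fails loudly."""
    out = {}
    for target_field, (source_field, transform) in SOURCE_MAPPINGS[source].items():
        if source_field not in record:
            raise KeyError(f"{source}: missing field {source_field!r}")
        out[target_field] = transform(record[source_field])
    return out
```

Keeping the mapping as data, not scattered code, means adding a source only requires adding one table, and that table can be reviewed on its own.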
Pitfalls
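Two traps are common. The first is lossy mapping, where distinct source values collapse into one target value and the distinction cannot be recovered. The second is silent assumptions about units or time zones. Documenting the mapping, including every such assumption, is essential.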
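Both traps can be checked mechanically. In the minimal sketch below, the function names and the example status codes are hypothetical. `find_collisions` lists every target value that several distinct source values map to, so each merge can be confirmed as intentional or fixed. `require_timezone` rejects timestamps that carry no explicit UTC offset, so no local time zone is ever assumed silently.

```python
from collections import defaultdict
from datetime import datetime


def find_collisions(value_map: dict[str, str]) -> dict[str, list[str]]:
    """Return target values that more than one distinct source value maps to.

    Each entry is a candidate lossy mapping: either an intentional,
    documented merge or a bug.
    """
    by_target: dict[str, list[str]] = defaultdict(list)
    for source_value, target_value in value_map.items():
        by_target[target_value].append(source_value)
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}


def require_timezone(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, rejecting it if it has no explicit offset."""
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no time zone: {ts!r}")
    return parsed
```

With a hypothetical status mapping such as `{"active": "ACTIVE", "live": "ACTIVE", "suspended": "INACTIVE", "closed": "INACTIVE"}`, both targets are flagged. Merging `active` and `live` may be a harmless synonym. Merging `suspended` and `closed` discards a distinction someone may later need.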
Sourcing considerations
Normalisation is where most multi-source value is unlocked, and where most errors hide. A clear target schema and agreed reference data are prerequisites.
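Making the target schema and reference data explicit lets every normalised record be checked against them. This is a minimal sketch: the schema fields and the currency list (a small subset of ISO 4217 codes) are hypothetical examples. Real projects often use a schema library for the same job.

```python
from dataclasses import dataclass

# Hypothetical reference data: allowed values for each coded field.
REFERENCE_DATA = {
    "currency": {"EUR", "GBP", "USD"},  # small subset of ISO 4217 codes
}


@dataclass(frozen=True)
class Field:
    name: str
    type_: type
    required: bool = True


# Hypothetical target schema, agreed before any source mapping is written.
TARGET_SCHEMA = (
    Field("supplier_id", str),
    Field("price", float),
    Field("currency", str),
    Field("notes", str, required=False),
)


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    known = {f.name for f in TARGET_SCHEMA}
    for f in TARGET_SCHEMA:
        if f.name not in record:
            if f.required:
                problems.append(f"missing {f.name}")
            continue
        value = record[f.name]
        if not isinstance(value, f.type_):
            problems.append(f"{f.name}: expected {f.type_.__name__}")
        elif f.name in REFERENCE_DATA and value not in REFERENCE_DATA[f.name]:
            problems.append(f"{f.name}: {value!r} not in reference data")
    # Fields outside the schema usually mean a mapping is incomplete.
    problems.extend(f"unexpected field {k}" for k in record if k not in known)
    return problems
```

The function returns every problem in a record, not just the first, so one review pass can fix a whole mapping.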
In a managed model
A managed partner can normalise sourced data to your target schema with documented mappings.
Schema mapping and reference data
Normalisation maps each source to a common model and standardises values against reference data, so the same concept is represented the same way everywhere. Schema mapping is the core task; reference data anchors value standardisation. Done well, heterogeneous sources become interoperable; done badly, the same concept appears in incompatible forms and analysis breaks.
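Value standardisation against reference data often means mapping spelling variants to one canonical code. The sketch below assumes a small, hypothetical synonym table keyed to ISO 3166-1 alpha-2 country codes. Values it cannot map return `None` so they can be reviewed instead of being guessed.

```python
import unicodedata
from typing import Optional

# Hypothetical synonym table: normalised spelling -> ISO 3166-1 alpha-2 code.
COUNTRY_SYNONYMS = {
    "united kingdom": "GB",
    "uk": "GB",
    "great britain": "GB",
    "germany": "DE",
    "deutschland": "DE",
    "cote d'ivoire": "CI",
    "ivory coast": "CI",
}


def _key(text: str) -> str:
    """Strip accents, lower-case and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def standardise_country(raw: str) -> Optional[str]:
    """Return the reference code, or None so unmapped values go to review."""
    return COUNTRY_SYNONYMS.get(_key(raw))
```

Returning `None` in place of a closest match is deliberate. A fuzzy guess would turn an unmapped value into a silent error, which is the failure this section warns about.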
Avoiding lossy mapping
The common traps are lossy mapping (collapsing distinct values into one) and silent assumptions about units or time zones. A clear target schema and a documented mapping guard against both.
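One way to keep the documentation from drifting is to record each assumption next to the field it affects, then generate the documentation from that record. The sketch below is illustrative only: the `FieldMapping` structure, the field names and the assumptions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    source_field: str
    target_field: str
    assumption: str  # stated explicitly instead of left implicit


# Hypothetical mapping spec for one source, with its assumptions written down.
SPEC = [
    FieldMapping("wt", "weight_kg", "source reports pounds; multiplied by 0.45359237"),
    FieldMapping("ts", "observed_at", "source timestamps are Europe/London local time"),
    FieldMapping("status", "status", "'suspended' and 'closed' both map to INACTIVE (intentional)"),
]


def render_docs(spec: list[FieldMapping]) -> str:
    """Render the mapping spec as an aligned plain-text table."""
    rows = [("source", "target", "assumption")]
    rows += [(m.source_field, m.target_field, m.assumption) for m in spec]
    widths = [max(len(r[i]) for r in rows) for i in range(2)]
    return "\n".join(
        f"{r[0]:<{widths[0]}}  {r[1]:<{widths[1]}}  {r[2]}" for r in rows
    )
```

If the same spec also drives the transform, any change to the mapping changes the generated documentation too, so the two cannot diverge.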
- Combining sources needs consistent units, formats, schema and values.
- Schema mapping to a common model is the core task.
- Reference data anchors value standardisation.
- Avoid lossy mapping; document assumptions.