Master data management and entity resolution
When you combine data from many sources, the same company, place or person appears in different forms. Master data management and entity resolution reconcile them. This guide explains both and why they matter for external data.
The fragmentation problem
External datasets describe the same entities differently: a company named five ways, an address formatted three. Without reconciliation, combining sources produces duplicates and contradictions rather than insight.
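A common first defence against this fragmentation is to standardize names before comparing them. The sketch below only illustrates the idea and is not a production normalizer. The function name `normalize_company_name` and its short list of legal-form suffixes are assumptions made for this example.

```python
import re

# Hypothetical, deliberately short list of legal-form suffixes; real lists are
# jurisdiction-specific and much longer.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc", "corp", "corporation", "co"}


def normalize_company_name(name: str) -> str:
    """Reduce a company name to a comparison key (illustrative rules only)."""
    # Lowercase, spell out "&", and replace punctuation with spaces.
    text = name.lower().replace("&", " and ")
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    # Drop trailing legal-form suffixes such as "Inc." or "Ltd".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


variants = [
    "Acme Widgets, Inc.",
    "ACME WIDGETS INC",
    "Acme Widgets Incorporated",
    "acme widgets",
    "Acme Widgets Co.",
]
print({normalize_company_name(v) for v in variants})  # {'acme widgets'}
```

Normalization cuts both ways. Stripping too much can make distinct companies look identical, which produces false matches.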
What master data management is
Master data management (MDM) is the discipline of maintaining a single, trusted version of core entities (customers, products, suppliers, locations) across systems. It provides the reference against which incoming data is aligned.
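At its simplest, aligning incoming data against master data is a lookup from known variants to a trusted identifier. The following toy sketch works under that assumption. The `MasterRegistry` class, the `M-001` identifier and the exact-alias lookup are all hypothetical, and real MDM systems are considerably richer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MasterRegistry:
    """Toy master-data store: one trusted name per entity plus its known aliases."""

    canonical: Dict[str, str] = field(default_factory=dict)  # master ID -> trusted name
    aliases: Dict[str, str] = field(default_factory=dict)    # casefolded alias -> master ID

    def add_entity(self, master_id: str, name: str, *aliases: str) -> None:
        self.canonical[master_id] = name
        for alias in (name, *aliases):
            self.aliases[alias.casefold()] = master_id

    def align(self, incoming_name: str) -> Optional[str]:
        """Return the master ID for an incoming name, or None if it is unknown."""
        return self.aliases.get(incoming_name.casefold())


registry = MasterRegistry()
registry.add_entity("M-001", "Acme Widgets", "Acme Widgets Inc", "ACME")
print(registry.align("acme widgets inc"))  # M-001
print(registry.align("Acme Gadgets"))      # None: send to entity resolution or review
```

Records that fail the lookup are where entity resolution takes over.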
Entity resolution
Entity resolution is the process of deciding whether two records refer to the same real-world entity and, if they do, merging them into a golden record: the single consolidated version of that entity. It combines deterministic rules (exact matches on keys) with probabilistic matching (scored similarity) to handle messy real-world data.
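The two-part decision can be sketched for a single pair of records. This minimal illustration assumes hypothetical field names (`reg_no`, `name`, `city`, `postcode`), hand-picked weights and a 0.85 threshold. The scored branch averages weighted `difflib.SequenceMatcher` ratios. That is a simple stand-in for a true probabilistic model such as Fellegi-Sunter, which this code does not implement.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold; in practice these are tuned on labelled pairs.
WEIGHTS = {"name": 0.6, "city": 0.2, "postcode": 0.2}
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict, r2: dict) -> float:
    """Weighted similarity over fields present in both records."""
    total = used = 0.0
    for fld, weight in WEIGHTS.items():
        a, b = r1.get(fld), r2.get(fld)
        if a and b:  # skip fields missing from either record
            total += weight * similarity(a, b)
            used += weight
    return total / used if used else 0.0


def is_match(r1: dict, r2: dict) -> bool:
    # Deterministic rule: when both records carry a registration number, it decides
    # outright (equal -> match, different -> non-match).
    id1, id2 = r1.get("reg_no"), r2.get("reg_no")
    if id1 and id2:
        return id1 == id2
    # Otherwise fall back to the scored comparison.
    return match_score(r1, r2) >= THRESHOLD


a = {"reg_no": "12345", "name": "Acme Widgets Ltd", "city": "Leeds"}
b = {"reg_no": "12345", "name": "ACME Widgets Limited", "city": "Leeds"}
c = {"name": "Acme Widgets Ltd", "city": "Leeds", "postcode": "LS1 4AP"}
d = {"name": "Acme Widgets Ltd.", "city": "Leeds", "postcode": "LS1 4AP"}
e = {"name": "Apex Logistics", "city": "Bristol"}

print(is_match(a, b))  # True  (deterministic: same registration number)
print(is_match(c, d))  # True  (scored: near-identical fields)
print(is_match(c, e))  # False (scored: dissimilar fields)
```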
Why it matters for external data
Combining external sources is one of the highest-value and highest-risk activities in data sourcing. Good entity resolution is what makes a multi-source dataset coherent; poor matching silently corrupts it.
Challenges
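Matching is hard: names vary, identifiers are often missing, and every decision risks a false match (merging distinct entities) or a missed match (leaving duplicates). The acceptance threshold trades one against the other. Raising it improves precision (fewer false matches) at the cost of recall (more missed matches), and lowering it does the reverse. The right balance depends on the use case.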
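To make the precision/recall trade-off concrete, the sketch below evaluates a few thresholds against a small labelled sample. The scores and labels are invented for illustration. Reporting a ratio with a zero denominator as 1.0 is a convention chosen here.

```python
# Hypothetical labelled sample: (match score, truly the same entity?)
labelled_pairs = [
    (0.97, True), (0.93, True), (0.91, False), (0.88, True),
    (0.84, True), (0.80, False), (0.72, True), (0.65, False),
]


def precision_recall(pairs, threshold):
    """Precision and recall when every pair scoring >= threshold is accepted."""
    tp = sum(1 for score, same in pairs if score >= threshold and same)
    fp = sum(1 for score, same in pairs if score >= threshold and not same)
    fn = sum(1 for score, same in pairs if score < threshold and same)
    # Convention used here: a ratio with a zero denominator is reported as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


for t in (0.95, 0.85, 0.70):
    p, r = precision_recall(labelled_pairs, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.95  precision=1.00  recall=0.20
# threshold=0.85  precision=0.75  recall=0.60
# threshold=0.70  precision=0.71  recall=1.00
```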
In a managed model
A managed partner can resolve entities across sourced datasets and deliver a unified, deduplicated view, with documentation of the matching approach and its limitations.
Deterministic and probabilistic matching
Entity resolution blends two approaches. Deterministic matching joins on exact agreement of chosen keys: precise, but brittle when data is messy or identifiers are missing. Probabilistic matching scores similarity across multiple fields and accepts matches above a threshold, handling real-world variation at the cost of tuning effort and some uncertainty. Most production resolution uses both: deterministic matching where strong identifiers exist, probabilistic matching to catch the rest, with the threshold set by the relative cost of false versus missed matches.
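Setting the threshold by the cost of false versus missed matches can be made explicit. Count both error types on labelled pairs at each candidate threshold, then pick the cheapest. The sample data and cost ratios below are assumptions for illustration only.

```python
# Hypothetical labelled sample: (match score, truly the same entity?)
labelled_pairs = [
    (0.97, True), (0.93, True), (0.91, False), (0.88, True),
    (0.84, True), (0.80, False), (0.72, True), (0.65, False),
]


def total_cost(pairs, threshold, cost_false_match, cost_missed_match):
    """Total cost on the sample of accepting every pair scoring >= threshold."""
    false_matches = sum(1 for s, same in pairs if s >= threshold and not same)
    missed_matches = sum(1 for s, same in pairs if s < threshold and same)
    return cost_false_match * false_matches + cost_missed_match * missed_matches


def best_threshold(pairs, cost_false_match, cost_missed_match):
    # Candidates: each observed score, plus infinity (accept nothing).
    candidates = sorted({s for s, _ in pairs}) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(pairs, t, cost_false_match, cost_missed_match),
    )


print(best_threshold(labelled_pairs, cost_false_match=5.0, cost_missed_match=1.0))  # 0.93
print(best_threshold(labelled_pairs, cost_false_match=1.0, cost_missed_match=2.0))  # 0.72
```

Making a false merge five times as costly as a missed match pushes the threshold up to 0.93. Weighting missed matches twice as heavily pulls it down to 0.72.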
Governing the golden record
Merging matched records into a single golden record raises survivorship questions: which source wins for each field, how conflicts are resolved, and how the merge is documented so it can be audited and unwound. Good MDM keeps lineage to the contributing sources and is conservative about merges, because a wrong merge (two distinct entities combined) is harder to detect and more damaging than a missed one.
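Survivorship can be expressed as a per-field source-priority policy. The sketch below assumes three hypothetical sources (`registry`, `crm`, `vendor_feed`) and a made-up priority order. It records which source won each field and keeps conflicting values for audit. It also stores copies of the contributing records, so the merge can be unwound. Fields not named in the policy are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical survivorship policy: for each field, sources in descending trust.
SOURCE_PRIORITY = {
    "name": ["registry", "crm", "vendor_feed"],
    "address": ["crm", "registry", "vendor_feed"],
    "phone": ["crm", "vendor_feed", "registry"],
}


@dataclass
class GoldenRecord:
    values: Dict[str, str] = field(default_factory=dict)                # surviving value per field
    lineage: Dict[str, str] = field(default_factory=dict)               # field -> winning source
    conflicts: Dict[str, Dict[str, str]] = field(default_factory=dict)  # field -> {source: value}
    sources: Dict[str, Dict[str, str]] = field(default_factory=dict)    # copies kept for unmerging


def merge(records: Dict[str, Dict[str, str]]) -> GoldenRecord:
    """Merge one matched cluster, keyed by source name; only policy fields are merged."""
    golden = GoldenRecord(sources={src: dict(rec) for src, rec in records.items()})
    for fld, priority in SOURCE_PRIORITY.items():
        candidates = {src: rec[fld] for src, rec in records.items() if rec.get(fld)}
        if not candidates:
            continue
        # Rank listed sources by policy; any unlisted source ranks last.
        ranked = [s for s in priority if s in candidates]
        ranked += sorted(s for s in candidates if s not in priority)
        winner = ranked[0]
        golden.values[fld] = candidates[winner]
        golden.lineage[fld] = winner
        # Keep disagreeing values for audit instead of silently discarding them.
        if len(set(candidates.values())) > 1:
            golden.conflicts[fld] = candidates
    return golden


cluster = {
    "crm": {"name": "Acme Widgets", "address": "1 High St, Leeds", "phone": "0113 496 0000"},
    "registry": {"name": "ACME WIDGETS LIMITED", "address": "1 High Street, Leeds"},
    "vendor_feed": {"name": "Acme Widgets Ltd", "phone": "0113 496 0000"},
}
golden = merge(cluster)
print(golden.lineage)            # {'name': 'registry', 'address': 'crm', 'phone': 'crm'}
print(sorted(golden.conflicts))  # ['address', 'name']
```

A per-field priority, rather than a single "best source", allows different sources to be authoritative for different attributes. The priorities here are illustrative only.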
- Combining sources creates duplicate, conflicting entity records.
- MDM maintains a single trusted version of core entities.
- Entity resolution merges records into golden records via rules and probabilistic matching.
- Matching trades precision against recall; document the approach.
Sources & further reading
- DAMA-DMBOK: master data and reference data management.
- Academic and industry literature on entity resolution.
- ISO 8000: data quality and master data.
- Internal practice: DataSupplier matching.
We resolve entities across sources and deliver a unified, deduplicated view. Get a no-obligation quote.