How to Achieve Accurate Identity Matching without the Sweat Equity
If you’ve ever renovated a house or bought a “fixer-upper,” you know all about sweat equity: the painstaking investment of time and labor that goes into the project. What you might not know is that when you buy a modern identity matching solution, whether a Master Data Management (MDM) or a Master Patient Index (MPI) technology, you are signing up for more sweat equity than you realize.
I’m not talking about the time and money to implement and configure the software itself; you no doubt accounted for that already. I’m talking about the additional time and money you haven’t accounted for: the effort it takes to realize the promised benefit of accurately matched, deduplicated records.
What you probably don’t realize is that MDM and MPI solutions automatically match only about 70% of the records that could be matched. The other 30% land in a queue of potential matches, or “suspect duplicates”: record pairs that are not similar enough to be matched automatically, but not different enough to be automatically ruled non-matches.
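Conceptually, a matcher scores each record pair and compares that score against two cutoffs. The minimal Python sketch below shows the resulting three-way split; the 0.90 and 0.60 cutoffs and the `classify` function are hypothetical illustrations, not settings from any particular MDM or MPI product.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"
    SUSPECT = "suspect duplicate"
    NON_MATCH = "non-match"


# Hypothetical cutoffs; real products tune these per data set.
UPPER = 0.90  # at or above: merge automatically
LOWER = 0.60  # below: treat as different people


def classify(score: float, upper: float = UPPER, lower: float = LOWER) -> Decision:
    """Map a pairwise similarity score in [0, 1] to one of three outcomes."""
    if lower > upper:
        raise ValueError("lower cutoff must not exceed upper cutoff")
    if score >= upper:
        return Decision.MATCH
    if score < lower:
        return Decision.NON_MATCH
    # Too similar to dismiss, too different to merge: queued for manual review.
    return Decision.SUSPECT


if __name__ == "__main__":
    for s in (0.95, 0.75, 0.40):
        print(f"{s:.2f} -> {classify(s).value}")
```

Lowering `upper` shrinks the suspect band at the cost of more false positives, while setting `lower` equal to `upper` empties the queue by treating every would-be suspect as a non-match.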
This leaves you with three options:
- Tweak your matching tool’s settings so that more than 70% of matches are made automatically. BUT these looser settings will generate false positives: matches made even though the two records refer to different people.
- Ignore the 30% of matches that should be made and treat every suspect pair as a non-match. BUT this excessive number of “false negatives” leaves your systems riddled with duplicates.
- Commit to sweat equity. Assemble a team of data stewards or health information management (HIM) staff to work through the queue of “suspect duplicates” one by one, a job that can take years and cost many hundreds of thousands of dollars in operating expenses. For example, faced with 500,000 matchable record pairs, a typical MDM implementation will automatically find 350,000 (70%), leaving the other 150,000 in a queue of suspect duplicates that will take 4 full-time employees 2 years to resolve.
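The queue arithmetic in that example can be reproduced in a few lines of Python. The 250 working days per year is our assumption, not a figure from the article; under it, clearing 150,000 suspect pairs with 4 reviewers in 2 years works out to 75 pairs per reviewer per working day. The `review_workload` helper is hypothetical.

```python
from dataclasses import dataclass

WORKING_DAYS_PER_YEAR = 250  # assumption: not stated in the article


@dataclass(frozen=True)
class Workload:
    auto_matched: int                  # pairs the matcher links on its own
    queued: int                        # pairs left as suspect duplicates
    pairs_per_reviewer_per_day: float  # review pace needed to clear the queue


def review_workload(matchable_pairs: int, auto_match_rate: float,
                    reviewers: int, years: float,
                    working_days: int = WORKING_DAYS_PER_YEAR) -> Workload:
    """Split matchable pairs into auto-matched and queued, and derive the review pace."""
    auto_matched = round(matchable_pairs * auto_match_rate)
    queued = matchable_pairs - auto_matched
    reviewer_days = reviewers * years * working_days  # total staff-days available
    return Workload(auto_matched, queued, queued / reviewer_days)


if __name__ == "__main__":
    w = review_workload(500_000, 0.70, reviewers=4, years=2)
    print(w.auto_matched, w.queued, w.pairs_per_reviewer_per_day)  # 350000 150000 75.0
```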
So while many MDM and MPI vendors claim much higher match accuracy rates, their dirty little secret is that those rates require you to accept one of three things: (1) lots of false positives, with records for different people incorrectly merged together; (2) lots of false negatives, with excessively high duplicate rates, often 20% or higher; or (3) getting nice and sweaty for a year or more.
Why do current matching solutions require so much sweat equity?
Current state-of-the-art matching solutions use deterministic or probabilistic algorithms to decide whether two patient or customer records belong to the same person. Both kinds of algorithm compare the two records directly to each other: if the demographic data in the two records is close enough, a match is made. This demographic data includes attributes such as name, address, birthdate, and Social Security number.
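To make the distinction concrete, here is a minimal Python sketch of both kinds of direct comparison. The deterministic rule (exact agreement on last name, birthdate, and SSN) and the m/u probabilities in the probabilistic score are hypothetical values chosen for illustration; the scoring follows the textbook Fellegi-Sunter pattern of log-likelihood agreement and disagreement weights, which real products implement with many refinements. The sample records describe one hypothetical person whose last name was misspelled, whose SSN is missing, and whose address changed.

```python
import math
from typing import Optional

Record = dict[str, Optional[str]]

# Hypothetical m/u probabilities for a Fellegi-Sunter-style model:
# m = P(field agrees | same person), u = P(field agrees | different people).
M_U = {
    "first": (0.95, 0.02),
    "last": (0.95, 0.01),
    "dob": (0.97, 0.003),
    "ssn": (0.98, 0.0001),
    "address": (0.80, 0.01),
}


def _norm(value: Optional[str]) -> Optional[str]:
    return value.strip().lower() if value else None


def deterministic_match(a: Record, b: Record) -> bool:
    """Rule-based: match only if last name, birthdate and SSN are present and equal."""
    for key in ("last", "dob", "ssn"):
        x, y = _norm(a.get(key)), _norm(b.get(key))
        if x is None or x != y:
            return False
    return True


def probabilistic_score(a: Record, b: Record) -> float:
    """Sum of log2 agreement/disagreement weights; missing fields add nothing."""
    total = 0.0
    for key, (m, u) in M_U.items():
        x, y = _norm(a.get(key)), _norm(b.get(key))
        if x is None or y is None:
            continue  # no evidence either way
        if x == y:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


if __name__ == "__main__":
    old = {"first": "Maria", "last": "Gonzalez", "dob": "1980-04-12",
           "ssn": "123-45-6789", "address": "12 Oak St"}
    # Same person: misspelled last name, missing SSN, newer address.
    new = {"first": "Maria", "last": "Gonzales", "dob": "1980-04-12",
           "ssn": None, "address": "98 Pine Ave"}
    print(deterministic_match(old, new))  # False
    print(round(probabilistic_score(old, old), 1))
    print(round(probabilistic_score(old, new), 1))
```

With these made-up weights, a record compared with itself scores about 40, while the sample pair scores about 7 and fails the deterministic rule outright, even though both records describe the same person.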
But demographic data is always changing and is notoriously rife with errors. In fact, 30–40% of the demographic data in any given database is out of date, incorrect, or incomplete. Names change as people marry; addresses and phone numbers change as people move; even Social Security numbers change when people recover from identity theft. And manual data entry often leaves identity data incomplete or full of missing letters, inverted names, and transposed numbers.
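Typo-tolerant comparison is the usual direct-matching defense against these entry errors. The sketch below is our illustration, not any vendor's method: it uses optimal string alignment (OSA) distance, a restricted form of Damerau-Levenshtein distance in which a missing letter or a pair of transposed digits costs a single edit, and adds a check for first and last names entered in swapped fields. The function names and the one-edit tolerance are hypothetical.

```python
def osa_distance(s: str, t: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions
    and swaps of two adjacent characters each cost 1 (case-insensitive)."""
    s, t = s.lower(), t.lower()
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def names_agree(first_a: str, last_a: str, first_b: str, last_b: str,
                max_edits: int = 1) -> bool:
    """Tolerate small typos and first/last names entered in swapped fields."""
    as_entered = osa_distance(first_a, first_b) + osa_distance(last_a, last_b)
    swapped = osa_distance(first_a, last_b) + osa_distance(last_a, first_b)
    return min(as_entered, swapped) <= max_edits


if __name__ == "__main__":
    print(osa_distance("123456789", "123546789"))  # transposed digits: 1
    print(names_agree("Maria", "Gonzalez", "Gonzalez", "Maria"))  # inverted: True
```

Tolerance like this catches keystroke errors, but it cannot reconcile a maiden name with a married name or an old address with a new one, because those values are legitimately different.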
This means that the demographic data in two records that belong to the same person is often very different. In one record, the address might be old; in another, the last name might be misspelled and the SSN might be missing. Both records refer to the same person, but the task of matching those records—automatically and with a high degree of confidence—just got a lot harder for an algorithm.
A better way to match that doesn’t require the sweat equity
Thankfully, there is a better way to match records. Verato uses a powerful new technology called “Referential Matching” that is accurate enough to automatically find and resolve 98% of duplicates without any sweat equity, including the “suspect duplicates” that MDM and MPI technologies flag for manual remediation by HIM staff or data stewards.
Rather than directly comparing the demographic data of two records to see if they match, Verato compares the demographic data from each record to Verato’s comprehensive, continuously updated reference database of identities. This database contains over 300 million identities spanning the entire U.S. population, and each identity holds a complete demographic profile covering 30 years of history, including nicknames, aliases, maiden names, common typos, past phone numbers, and old addresses. This reference database is essentially a pre-built “answer key” for patient demographic data.
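The article does not describe Verato’s algorithm, so the following is only a toy Python illustration of the general idea above, not Verato’s implementation: each record is resolved to the reference identity whose current and historical values it best agrees with, and two records match when they resolve to the same identity. The `ReferenceIdentity` type, the field names, the two-hit threshold, and the sample people and values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReferenceIdentity:
    """One person in a hypothetical reference database, with historical values."""
    identity_id: str
    names: set[str] = field(default_factory=set)      # incl. maiden names, aliases
    dobs: set[str] = field(default_factory=set)
    addresses: set[str] = field(default_factory=set)  # current and past
    phones: set[str] = field(default_factory=set)     # current and past


# Which record field is looked up in which reference attribute.
FIELD_MAP = {"name": "names", "dob": "dobs", "address": "addresses", "phone": "phones"}


def _norm(value: Optional[str]) -> Optional[str]:
    return value.strip().lower() if value else None


def _hits(record: dict, ident: ReferenceIdentity) -> int:
    """Count record fields that equal any known (current or past) value."""
    hits = 0
    for rec_field, ref_attr in FIELD_MAP.items():
        value = _norm(record.get(rec_field))
        if value is not None and value in {_norm(v) for v in getattr(ident, ref_attr)}:
            hits += 1
    return hits


def resolve(record: dict, reference: list[ReferenceIdentity],
            min_hits: int = 2) -> Optional[str]:
    """Return the id of the one identity that best explains the record, else None."""
    ranked = sorted(((_hits(record, i), i.identity_id) for i in reference), reverse=True)
    if not ranked or ranked[0][0] < min_hits:
        return None  # too little evidence
    if len(ranked) > 1 and ranked[1][0] == ranked[0][0]:
        return None  # ambiguous: two identities fit equally well
    return ranked[0][1]


def same_person(a: dict, b: dict, reference: list[ReferenceIdentity]) -> bool:
    """Two records match when both resolve to the same reference identity."""
    ida = resolve(a, reference)
    return ida is not None and ida == resolve(b, reference)


if __name__ == "__main__":
    reference = [
        ReferenceIdentity("ID-1", names={"Maria Smith", "Maria Gonzalez"},
                          dobs={"1980-04-12"}, addresses={"12 Oak St", "98 Pine Ave"},
                          phones={"555-0100", "555-0199"}),
        ReferenceIdentity("ID-2", names={"Maria Gonzalez"}, dobs={"1975-09-30"},
                          addresses={"4 Elm Rd"}, phones={"555-0142"}),
    ]
    # Maiden name at an old address vs. married name at a new address.
    a = {"name": "Maria Smith", "dob": "1980-04-12", "address": "12 Oak St"}
    b = {"name": "Maria Gonzalez", "dob": "1980-04-12", "address": "98 Pine Ave",
         "phone": "555-0199"}
    print(same_person(a, b, reference))  # True
```

Compared directly, records `a` and `b` share only a birthdate: the last names and addresses differ. Each one, however, agrees with several of ID-1’s historical values, which is what links them. A production system would also need typo-tolerant lookups and an index rather than this linear scan.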
By matching patient records to identities in this reference database, Verato can make matches that conventional matching technologies could never make, even when patient records contain erroneous, out-of-date, incomplete, or inconsistent demographic data.
Traditional matching solutions may quote impressive statistics about match rates and numbers of false positives and false negatives, but those statistics hinge on the assumption that you will invest the time and money to manually review a large portion of your data. By contrast, Verato Referential Matching automatically finds 98% of duplicate records within a dataset and can even automatically resolve the toughest matches: the ones that MDM and MPI technologies flag as “suspect duplicates” requiring manual review.