Avoid confusion when consolidating multi-user data matchings


Handling data matching requires the right tool to avoid confusion. The solution, therefore, was to store all elements that belong together in a special group called a "clique". The clique prevents inconsistent matchings and confusing results.

One caveat of letting a single human user match data is that the result might be error-prone. The user may overlook things or lose concentration; nobody is perfect. This is a real risk even when the tool works flawlessly. Redundancy helps to reduce it.

It is rather unlikely that all users make the same mistake. Consequently, having more users create matchings will increase the quality of the result. The challenge is to consolidate the individual matchings into a joint result list.

Situation

This example is based on the previous one about matching presidents. In contrast to the previous example, imagine that there is a team doing the data matching, namely Alice, Bob and Clara. This is an extract of what they created:
One of Alice's matching cliques
List 3 : Adams, John
List 1 : John Adams
One of Bob's matching cliques
List 2 : J. Adams
List 1 : John Adams
...
List 3 : Adams, John
One of Clara's matching cliques
List 1 : John Adams
List 2 : J. Adams
This is almost as bad as in the proverb "2 lawyers, 3 opinions": Alice missed one matching, Bob made an incorrect match and Clara forgot another matching.

One thing becomes clear: consolidating multiple users' matchings is not a trivial endeavour.

Consolidating user matchings

It is reasonable to assume that an algorithm would consolidate "John Adams", "J. Adams" and "Adams, John" based on the users' input. Pieces of evidence are missing in Alice's and Clara's matchings, but nothing contradicts this result.

However, the question is whether the gray "..." element would end up in the consolidated "John Adams" clique as well, as Bob suggests. The answer is: it depends. If there are matchings from Alice or Clara that contain the gray "..." element in another clique, then it will most likely not show up with "John Adams". If there is no such matching, the gray "..." element will likely be matched to the "John Adams" clique. So either result may emerge.
Consolidated clique alternative 1
List 2 : J. Adams
List 1 : John Adams
List 3 : Adams, John
Consolidated clique alternative 2
List 2 : J. Adams
List 1 : John Adams
...
List 3 : Adams, John
Is this a flaw in the concept? Definitely not. But it is something that can happen in all kinds of data matching.
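The article does not say which algorithm performs the consolidation. As one possible illustration only, the following sketch merges pairs in order of their support and vetoes any merge that would join elements which some user explicitly placed in different cliques. The veto rule, the function name `consolidate` and the placeholder `OTHER` (a made-up element that Alice matches with the gray element) are assumptions for this example, not Matchmerize's actual method. The sketch also assumes that each user places an element in at most one clique.

```python
from collections import Counter
from itertools import combinations

GRAY = "..."  # Bob's gray element, kept elided exactly as in the article
OTHER = "(hypothetical other element)"  # assumption, not from the article
L1, L2, L3 = "List 1 : John Adams", "List 2 : J. Adams", "List 3 : Adams, John"


def consolidate(matchings):
    """Merge users' cliques; `matchings` maps user -> list of cliques (sets)."""
    support = Counter()  # pair -> number of user cliques containing it
    placed = {}          # (user, element) -> index of that user's clique
    for user, cliques in matchings.items():
        for index, clique in enumerate(cliques):
            for element in clique:
                placed[(user, element)] = index
            support.update(combinations(sorted(clique), 2))

    def objects(user, group_a, group_b):
        # A user objects if they put elements of the two groups in different cliques.
        in_a = {placed[(user, e)] for e in group_a if (user, e) in placed}
        in_b = {placed[(user, e)] for e in group_b if (user, e) in placed}
        return any(i != j for i in in_a for j in in_b)

    cluster = {e: {e} for pair in support for e in pair}
    # Best-supported pairs first; ties broken alphabetically for determinism.
    for a, b in sorted(support, key=lambda pair: (-support[pair], pair)):
        group_a, group_b = cluster[a], cluster[b]
        if group_a is group_b:
            continue
        if any(objects(user, group_a, group_b) for user in matchings):
            continue  # veto: someone explicitly separated these elements
        merged = group_a | group_b
        for element in merged:
            cluster[element] = merged
    return {frozenset(group) for group in cluster.values()}


alice, bob, clara = [{L3, L1}], [{L2, L1, GRAY, L3}], [{L1, L2}]

# Nobody else placed the gray element: it joins the Adams clique (alternative 2).
print(consolidate({"Alice": alice, "Bob": bob, "Clara": clara}))

# Alice also matched the gray element elsewhere: it stays out (alternative 1).
alice_2 = alice + [{GRAY, OTHER}]
print(consolidate({"Alice": alice_2, "Bob": bob, "Clara": clara}))
```

A single veto is a deliberately strict choice. A softer rule, for example weighing objections against support, would be just as defensible and is one reason why "it depends".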

Proofreading the user matchings to increase consolidation quality

Before consolidating a common result, checking the users' matchings can be of value. Imagine that another user, Dave, can eliminate incorrect matchings before they ruin the consolidation. Instead of checking all cliques manually, Dave could look at the matched pairs. Even better: he can now see how much support a certain pair has. The support is the number of users who created the matching.
Table for Dave to judge the pairwise matching correctness
Matching Element A | Matching Element B | Support | Dave's Judgement
--- | --- | --- | ---
... | ... | .. | ..
List 1 : John Adams | List 2 : J. Adams | 2 |
List 1 : John Adams | List 3 : Adams, John | 2 |
List 2 : J. Adams | List 3 : Adams, John | 2 |
List 1 : John Adams | ... | 1 |
List 2 : J. Adams | ... | 1 |
List 3 : Adams, John | ... | 1 |
... | ... | .. | ..
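The support column can be derived mechanically from the users' cliques. The following minimal sketch expands every clique into its pairs and counts how many users produced each pair. The function name `pair_support` and the dictionary layout are assumptions for illustration, and the data covers only the three extract cliques shown earlier.

```python
from collections import Counter
from itertools import combinations

GRAY = "..."  # Bob's gray element, kept elided exactly as in the article

user_cliques = {
    "Alice": [{"List 3 : Adams, John", "List 1 : John Adams"}],
    "Bob": [{"List 2 : J. Adams", "List 1 : John Adams", GRAY, "List 3 : Adams, John"}],
    "Clara": [{"List 1 : John Adams", "List 2 : J. Adams"}],
}


def pair_support(user_cliques):
    """Count, for every matched pair, how many users created that matching."""
    support = Counter()
    for cliques in user_cliques.values():
        pairs = set()
        for clique in cliques:
            # Sorting makes (a, b) and (b, a) the same dictionary key.
            pairs.update(combinations(sorted(clique), 2))
        support.update(pairs)  # each user counts at most once per pair
    return support


ranked = sorted(pair_support(user_cliques).items(), key=lambda kv: (-kv[1], kv[0]))
for (a, b), count in ranked:
    print(f"{a} | {b} | {count}")
```

Run on the extract alone, the sketch reports 2 for John Adams / J. Adams and for John Adams / Adams, John, and 1 for every pair that involves the gray element. For J. Adams / Adams, John it reports 1, because within the extract only Bob's clique contains both elements; the value 2 in the table for that pair cannot be derived from the extract alone.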
With a simple voting mechanism, Dave rules out what he considers incorrect. Consequently, the consolidation process is now much better informed.
Consolidated clique taking Dave's judging into account
List 2 : J. Adams
List 1 : John Adams
List 3 : Adams, John
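How the judgements feed into the result can be sketched as follows. It is assumed here that Dave accepts the three pairs among the Adams entries and rejects the three pairs involving the gray element; the article does not list his individual votes. The consolidated cliques are then the connected groups of accepted pairs. The names `judgements` and `cliques_from_accepted` are hypothetical.

```python
from collections import defaultdict

GRAY = "..."  # Bob's gray element, kept elided exactly as in the article
L1, L2, L3 = "List 1 : John Adams", "List 2 : J. Adams", "List 3 : Adams, John"

# Assumed votes: True = Dave accepts the pair, False = Dave rejects it.
judgements = {
    (L1, L2): True,
    (L1, L3): True,
    (L2, L3): True,
    (L1, GRAY): False,
    (L2, GRAY): False,
    (L3, GRAY): False,
}


def cliques_from_accepted(judgements):
    """Group elements that are connected through accepted pairs."""
    neighbours = defaultdict(set)
    for (a, b), accepted in judgements.items():
        if accepted:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, cliques = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        cliques.append(component)
    return cliques


print(cliques_from_accepted(judgements))
```

With these votes the gray element has no accepted pair left, so it does not appear in any consolidated clique, and the three Adams entries form a single clique as in the figure above.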

Matchmerize can integrate a proofreading step before multi-user matchings are consolidated

One of the strengths of Matchmerize is that it allows multiple users to work on the same matching task. Consequently, Matchmerize is able to consolidate the inputs into one joint matching result list.

And of course, the judgement step - as explained above - is available, too. This makes Matchmerize unique, as there is no other approach able to provide these features. Automation tools may consolidate result lists based on text-similarity measures. But this works only for simple problems where neither human intelligence nor context-based similarity is required anyway.
Learn more about the shortcomings of text-based similarity when matching data