One of OpenRefine’s most powerful features is its “Clustering” function. Supporting several key collision and nearest neighbor algorithms, Clustering can help you identify inconsistencies in your data, such as misspellings, non-standardized value formatting, or input errors.
Clustering works by applying what is called “fuzzy matching” to the values in a chosen column: the algorithm of your choice determines whether cell values “look similar” enough to be possible matches. The algorithms supported by OpenRefine are of two types: key collision and nearest neighbor.
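To make the two approaches concrete, here is a minimal Python sketch. It is not OpenRefine’s code; all function names are hypothetical. The key collision part is loosely modeled on OpenRefine’s default “fingerprint” keying: trim, lowercase, strip accents and punctuation, then sort the unique words so that values producing the same key fall into one cluster. The nearest neighbor part uses Levenshtein edit distance, one of the distance functions OpenRefine offers, and groups values within a chosen radius. For simplicity it compares every pair of values. OpenRefine itself uses blocking to limit comparisons, so this sketch will be slow on large columns.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical fingerprint key, loosely following OpenRefine's documented steps."""
    value = value.strip().lower()
    # Decompose accented characters and drop the non-ASCII marks (é -> e).
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, keeping word characters and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    # Sort unique tokens so word order and repeats no longer matter.
    return " ".join(sorted(set(value.split())))


def key_collision_clusters(values):
    """Group values whose fingerprint keys collide; keep groups with 2+ distinct values."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def nearest_neighbor_clusters(values, radius=1):
    """Naive pairwise clustering: values within `radius` edits end up in one group."""
    distinct = sorted(set(values))
    parent = list(range(len(distinct)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(distinct)):
        for j in range(i + 1, len(distinct)):
            if levenshtein(distinct[i], distinct[j]) <= radius:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, v in enumerate(distinct):
        groups[find(i)].append(v)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    names = ["Apple Inc.", "apple inc", "Inc. Apple", "Aple Inc", "Banana Co"]
    print(key_collision_clusters(names))
    # [['Apple Inc.', 'Inc. Apple', 'apple inc']]
    cities = ["Chicago", "Chicgo", "Chicag0", "Boston"]
    print(nearest_neighbor_clusters(cities, radius=1))
    # [['Chicag0', 'Chicago', 'Chicgo']]
```

Note that the typo “Aple Inc” escapes the key collision cluster because its key differs by one letter. Catching that kind of near-miss is exactly what nearest neighbor methods are for, at the cost of many more comparisons.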
For more information on the specific types of algorithms you can choose from, see the OpenRefine documentation on Clustering In Depth.
Helpful Tips:
5/21/2018 - Brinna Michael