University Library, University of Illinois at Urbana-Champaign

OpenRefine

A free, open source, powerful tool for working with messy data.

Joining Projects

OpenRefine lets you merge two of your projects, either to link data that you have been working on separately or to add to an existing data set. This only works with projects stored in the same instance of OpenRefine; it does not work across two different instances.

What is a Key?

Before you begin merging your data, make sure it includes a “key.” Data often has a unique identifier that is associated with a set of information. For example, an ISBN might be linked to the title of a book, the author’s name, and the publisher. To merge two sets of data, each row needs a unique identifier, or “key,” that appears in both projects, so that the program can identify which rows “match” when the projects are merged.

How to Join Two Projects

  1. Identify the two projects you would like to merge:
    1. One project to import data INTO
    2. One project to export data FROM
  2. In the project with the data to be exported, identify the unique key you will be using.

Identifying a key column

  1. In the project into which data is being imported, select the column matching the key and click on the arrow button in the column header.
  2. Choose “Edit column” and select “Add a column based on this column.”

Process for adding a column based on a chosen column

  1. In the pop-up window, give the new column a name and then enter this expression in the GREL expression box:

cell.cross('arg1','arg2').cells['arg3'].value[arg4]

  • arg1 = name of the project you are exporting data from
  • arg2 = name of the key column in that project
  • arg3 = name of the column you are importing
  • arg4 = index of the value to import from the array of matches, in case the key matches more than one row (0, the first match, is recommended)
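
As an illustration, suppose the project you are exporting from is named Publisher Data, its key column is ISBN, and the column you want to import is Publisher. These names are hypothetical and used only for this example. The filled-in expression might then look like this:

```grel
cell.cross('Publisher Data','ISBN').cells['Publisher'].value[0]
```

In this sketch, the expression is entered from the matching ISBN column of the project you are importing into, and the 0 selects the first matching row.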

General Refine Expression Language (GREL) box for adding a column based on a column

  1. When the expression shows no syntax error and you are satisfied with the preview, click “OK” to add the new column.

Example of a column added based on another column

Helpful Tips:

  • Copy down the name of the project you will be exporting data from ahead of time or have the project open in another window so that you don’t have to switch back and forth between projects.
  • Remember that GREL expressions are CASE SENSITIVE. If the project name or column name in the expression does not match exactly, no data will be imported.
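
If the new column comes out blank, one way to check whether the keys are matching is to preview an expression that counts the matching rows. This is only a sketch, reusing the hypothetical project name Publisher Data and key column ISBN:

```grel
cell.cross('Publisher Data','ISBN').length()
```

A result of 0 for a row suggests that no row in the other project has exactly the same key value, for example because of a difference in capitalization or extra spaces.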

5/21/2018 - Brinna Michael