University Library, University of Illinois at Urbana-Champaign

OpenRefine

A free, open source, powerful tool for working with messy data.

Layout

Once you have imported your data, it is important to familiarize yourself with OpenRefine’s layout.

  1. In the top right corner there are three buttons:
    1. “Open…” returns you to the home screen where you can select projects.
    2. “Export” opens a dropdown menu of options to export your data.
    3. “Help” opens the OpenRefine User Documentation in a new tab in your browser.

Open, Export, and Help buttons on the main project view

  2. Below the bold header that states the number of rows/records, there are two options:
    1. “Show as” allows you to change the grid view between rows and records. For more information on the difference between rows and records, see the explanation of Records and Rows below.
    2. “Show” allows you to change the number of rows/records visible in the grid view.

Options for changing grid view from rows to records and how many rows to view at once

  3. In the center of the page is your data in the grid view, which looks similar to Excel. Features of the grid view include:
    1. Column headings with dropdown arrows for choosing functions
    2. Row/Record numbers and alternate row/record shading
    3. Selectable flags and stars

Layout of the grid view with column headers and data

  4. On the left, there is a pane with two tabs:
    1. “Facet/Filter” allows you to work on selected sections of your data, including faceting, clustering, and filtering.
    2. “Undo/Redo” tracks and stores your history, allows you to undo or redo transformations, and lets you export a JSON file of your transformations.

The pane in the main project window showing the facet/filter and undo/redo tabs
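
The JSON file exported from the “Undo/Redo” tab can be inspected outside OpenRefine. The following is a minimal sketch, assuming the export is a JSON list of operations in which each entry carries a "description" field. The sample text, its field names, and the summarize helper are hypothetical illustrations, not a definitive account of the export format.

```python
import json

# Hypothetical sample of an exported transformation history.
# The field names below are assumptions; check them against your own export.
exported_history = """
[
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column author",
    "columnName": "author",
    "separator": ";"
  }
]
"""

def summarize(operations):
    """Return one human-readable line per recorded transformation."""
    # Fall back to the operation id if no description is present.
    return [op.get("description", op.get("op", "unknown")) for op in operations]

operations = json.loads(exported_history)
for line in summarize(operations):
    print(line)
```

To try this with a real export, paste the copied JSON into a file and load it with json.load instead of the inline sample.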

Records and Rows

There are two settings for the grid view in OpenRefine: rows or records.

“Rows” displays your data as individual lines, each numbered separately, while “records” displays your data in multi-line groupings based on the relationships between the data in those lines. For example:

The difference between the rows and records view options

This data has been transformed using “split multi-valued cells” on the author field to separate different authors into their own lines. On the left, the data is displayed as “records,” showing the different lines with the multiple authors grouped together. On the right, the data is displayed as “rows,” showing each of the multiple authors as a separate line.
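
The grouping described above can be sketched in a few lines of Python. This is an illustration only, not OpenRefine's own code. It assumes that a line with a blank cell in the first column belongs to the record above it, which is how the OpenRefine documentation describes records mode. The sample titles and authors and the group_into_records helper are hypothetical.

```python
# Hypothetical data after "split multi-valued cells" on the author column:
# only the first line of each group keeps its title.
rows = [
    {"title": "Book A", "author": "Smith"},
    {"title": None, "author": "Jones"},
    {"title": "Book B", "author": "Lee"},
    {"title": None, "author": "Chen"},
    {"title": None, "author": "Park"},
]

def group_into_records(rows, key_column):
    """Group consecutive rows into records.

    Assumption: a non-blank value in key_column starts a new record,
    and a blank value attaches the row to the record above it.
    """
    records = []
    for row in rows:
        starts_new_record = row[key_column] not in (None, "")
        if starts_new_record or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return records

records = group_into_records(rows, "title")
print(len(rows), "rows")        # every line counted separately
print(len(records), "records")  # lines grouped under their title
```

The same five lines count as five rows but only two records, which is why the numbering in the grid changes when you switch views.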

NOTE: Use caution when permanently renumbering rows or records, and be aware of which setting (rows or records) you are viewing your data under.

5/1/2018 - Brinna Michael