University Library, University of Illinois at Urbana-Champaign

Data Cleaning for the Non-Data Scientist

Considering how to clean data up when it's not part of your regular workflow.

About this Guide

Data are the recorded facts or observations you use as evidence in your research. They can be images, sensor readings, audio recordings, text, or anything else you use in your analysis. You might collect data yourself or find it online. No matter what kind of data you have, cleaning it will make your analysis easier and give you more confidence in your results.

Data cleaning is the process of preparing data for analysis. It often includes steps like normalizing values (for example, making sure "IL" and "Illinois" are recorded the same way), handling blank values, reorganizing data, and otherwise refining the data into the form your analysis needs.
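To make these steps concrete, here is a minimal sketch in Python of what normalizing values, handling blanks, and reorganizing data can look like. The records, the field names (name, state, age), the STATE_NAMES mapping, and the clean_row function are all invented for illustration; real data will need its own rules.

```python
# Hypothetical records, invented for illustration only.
raw_rows = [
    {"name": "  Alice ", "state": "IL", "age": "34"},
    {"name": "bob", "state": "il ", "age": ""},
    {"name": "Carol", "state": "Illinois", "age": "29"},
]

# Assumed mapping of spelling variants to one standard value.
STATE_NAMES = {"il": "IL", "illinois": "IL"}


def clean_row(row):
    """Return a cleaned copy of one record."""
    # Normalize values: trim stray spaces and use consistent capitalization.
    name = row["name"].strip().title()
    state_key = row["state"].strip().lower()
    state = STATE_NAMES.get(state_key, row["state"].strip())

    # Handle blank values: record a missing age as None instead of "".
    age_text = row["age"].strip()
    age = int(age_text) if age_text else None

    return {"name": name, "state": state, "age": age}


cleaned_rows = [clean_row(row) for row in raw_rows]

# Reorganize: sort the records by name so they are easier to scan.
cleaned_rows.sort(key=lambda row: row["name"])

for row in cleaned_rows:
    print(row)
```

Marking a blank as None rather than guessing a value keeps the gap visible, so you can decide later whether to fill it in, exclude the record, or leave it as is.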

This guide has helpful tips for anyone working with data, but it is designed for the non-data scientist. It covers basic steps for data cleaning, how to spot messy data, data cleaning tools, and specific advice for cleaning text and spreadsheets.

Why clean data?

Data cleaning helps you:

  • Find and correct errors.
  • Understand your data better.
  • Make analysis quicker, easier, and more accurate.