LibGuides: Data Cleaning for the Non-Data Scientist: Overview

About this Guide

Data are the recorded facts or observations you use as evidence in your research. They can be anything from images to sensor readings to audio recordings to text - anything you use in your analysis. You might collect data yourself or find it online. No matter what kind of data you have, cleaning it will make your analysis easier and give you more confidence in your results.

Data cleaning refers to the process of preparing data for analysis, and often includes steps like normalizing values, handling blank values, re-organizing data, and otherwise refining data into exactly what you need.

This guide has helpful tips for anyone working with data, but is designed for the non-data scientist. It covers basic steps for data cleaning, how to spot messy data, data cleaning tools, and specific advice for cleaning text and spreadsheets.

Why clean data?

Data cleaning helps you:

Find and correct errors.
Understand your data better.
Make analysis quicker, easier, and more accurate.