Before you start cleaning, lock down your original data file and make changes only to a copy. This is important! If you accidentally delete or overwrite anything, you can always go back to the original. Gathering data takes a lot of time and effort, and you don't want to redo it because of a mistake made during cleaning.
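One way to put this into practice is a short script that makes the working copy and then marks the original as read-only. The sketch below is only an illustration: the function name `make_working_copy` and the file names in the usage note are hypothetical, and it assumes your data lives in a single file on your own computer.

```python
import os
import shutil
import stat
from pathlib import Path


def make_working_copy(original: str, working_copy: str) -> Path:
    """Copy the original data file, then mark the original read-only."""
    src = Path(original)
    dst = Path(working_copy)
    if dst.exists():
        # Never silently overwrite an earlier working copy.
        raise FileExistsError(f"{dst} already exists; refusing to overwrite it")
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    # Clear the write bits on the original so accidental edits fail.
    mode = src.stat().st_mode
    os.chmod(src, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return dst
```

For example, `make_working_copy("survey_raw.csv", "survey_working.csv")` would leave `survey_raw.csv` protected and give you `survey_working.csv` to clean. The copy is made before the permissions change, so the working copy stays editable.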
No matter what kind of data you have, or what cleaning tool you use, these basic steps will help you organize your process:
Almost every dataset needs some kind of cleaning, but the problems often go unnoticed until an analysis goes wrong. You can save time and effort by doing some spot checks and exploratory visualizations to find mistakes before you start your analysis.
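A spot check can be as simple as counting things that look suspicious. The sketch below assumes your data is a CSV file and checks for just two problems, blank cells and repeated rows; both the choice of checks and the helper name `spot_check` are illustrative assumptions, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def spot_check(rows):
    """Count blank cells per column and rows that appear more than once."""
    blank_counts = Counter()
    seen = Counter()
    for row in rows:
        for column, value in row.items():
            # A missing field is None; an empty field is "" or whitespace.
            if value is None or (isinstance(value, str) and value.strip() == ""):
                blank_counts[column] += 1
        seen[tuple(str(value) for value in row.values())] += 1
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return dict(blank_counts), duplicates


# Made-up sample data standing in for a real file.
sample = io.StringIO(
    "name,age\n"
    "Ana,34\n"
    "Ben,\n"
    "Ana,34\n"
)
blanks, duplicate_rows = spot_check(csv.DictReader(sample))
print(blanks, duplicate_rows)
```

On the sample, this reports one blank `age` cell and one duplicate row. To check a real file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.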
Scan your data for:
Exploratory visualizations are a great way to get to know your data better. Experiment with different kinds of charts. They don't have to look nice; their job is to help you spot groupings, patterns, outliers, and surprising values that might indicate mistakes that should be cleaned up.
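Even a rough text chart can make an outlier jump out. The sketch below is a minimal illustration using only Python's standard library: the helper name `text_histogram` is hypothetical, the ages are made up, and a real project would more likely use a charting tool.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per bin, with one '#' per value in that bin."""
    bins = Counter((value // bin_width) * bin_width for value in values)
    lines = []
    for start in sorted(bins):
        label = f"{start:>5}-{start + bin_width - 1:<5}"
        lines.append(f"{label} {'#' * bins[start]}")
    return lines


# Made-up ages; 340 is a deliberate typo-style outlier.
ages = [23, 27, 31, 34, 35, 38, 41, 45, 340]
for line in text_histogram(ages, bin_width=10):
    print(line)
```

Most of the bars cluster between 20 and 49, while the single value in the 340 bin sits far away from the rest. That is exactly the kind of surprising value worth checking against the original file.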
Handy tools for exploring your data include:
The image below shows an example of a text analysis dashboard created by pasting plain text into Voyant Tools.