
University Library, University of Illinois at Urbana-Champaign

Introduction to Data Management for Undergraduate Students: Data Documentation

This guide covers the basics and best practices for data management for individuals who are new to the research and data-collecting process.

What is Documentation?

Documentation means capturing your work in a way that enables others to understand what you did and repeat the process. To do this, your documentation must record what was done, how it was done, why it was done, when and where it was performed, and who performed the work.

Documentation Resources

Data Dictionaries

A University of Wisconsin Data Service YouTube video explaining why data dictionaries are the ideal way to document spreadsheets and datasets with many variables.

Using Code Comments Effectively

A helpful Microsoft guide to when comments are useful and how to write them so that they improve your code.

Why is documenting your work important?

Data documentation should start at the beginning of a project and continue throughout it. Documenting as you go makes the task easier and makes it less likely that you will forget the details of each step later. It also ensures that you and others will be able to interpret, assess, and repeat your work.

What to include in your documentation depends on your project and the types of data you generate.

Some possible elements to include:

  • Purpose of data collection

  • Data collection procedures

  • Structure and organization of the data files

  • Time and timing of data collection

  • Data validation and quality assurance

  • Types of manipulation conducted on raw data during analysis

  • Data confidentiality, access, and use conditions
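One way to make sure none of these elements is forgotten is to start every project from a fill-in template. The Python sketch below builds a plain-text README skeleton with one empty section per element in the list above. The function name `build_readme`, the underlined-heading layout, and the "TODO" placeholders are illustrative assumptions, not a required or standard format; adapt the headings to your own field.

```python
# Section headings taken from the list of possible elements above.
SECTIONS = [
    "Purpose of data collection",
    "Data collection procedures",
    "Structure and organization of the data files",
    "Time and timing of data collection",
    "Data validation and quality assurance",
    "Types of manipulation conducted on raw data during analysis",
    "Data confidentiality, access, and use conditions",
]


def build_readme(title: str, sections: list[str]) -> str:
    """Return the text of a README skeleton with one empty section per heading.

    build_readme is a hypothetical helper written for this example.
    """
    # Underline the title with "=" so it stands out in a plain-text file.
    lines = [title, "=" * len(title), ""]
    for heading in sections:
        lines.append(heading)
        lines.append("-" * len(heading))
        # Placeholder so unfinished sections are easy to spot later.
        lines.append("TODO: describe this for your project.")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_readme("README for my_project", SECTIONS))
```

Saving the printed text as a README.txt file at the start of a project, and filling in each section as the work happens, follows the advice above to document as you go rather than reconstructing details at the end.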

Documentation Formats

Documentation can take a variety of formats, but all of them should cover similar content. Every form of documentation must include the basic information needed to interpret and reuse the data correctly, both by you in the future and by other researchers. Different fields of study tend to favor different formats.

  • README file - a file that contains critical information about the data file(s), including citation information, file organization structure, variable definitions, methodological information, code (if applicable), data collection information, software/instruments used and their versions, licensing information, etc. Often saved as a plain-text (.txt) file.

  • README tab - similar to a README file, except that it is usually created as a tab (worksheet) within a spreadsheet.

  • Data Dictionary - a file that provides critical information about a data file by describing the names, definitions, and attributes of its data elements. Data dictionaries are most often created for tabular data, but one can be made for any dataset format.

  • Codebook - a file that documents the layout and structure of a data file and lists the response codes used to record survey responses and other information. Codebooks are most common in social science research.

  • Commented Code - in-line comments in computer code that describe the code's function in ways that are not obvious from reading the code itself.

  • Lab Notebooks - a detailed record of all activities carried out while conducting research, including experimental materials and conditions, protocols, and results. Lab notebooks can be electronic (e-lab notebooks) or physical.
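To make the data dictionary and commented code formats more concrete, the Python sketch below stores a small data dictionary (name, definition, type, units, and allowed values for each variable), writes it out as CSV, and checks whether any column in a dataset is missing from the dictionary. The survey variables, the column names of the dictionary, and both helper functions are hypothetical examples invented for illustration. The comments try to explain why each step exists, which is the kind of information a reader cannot get from the code alone.

```python
import csv
import io

# Hypothetical survey variables, used only to illustrate the format.
DATA_DICTIONARY = [
    {"variable": "resp_id", "definition": "Unique respondent identifier",
     "type": "integer", "units": "", "allowed_values": "1-9999"},
    {"variable": "age", "definition": "Respondent age at time of survey",
     "type": "integer", "units": "years", "allowed_values": "18-99"},
    {"variable": "q1_satisfaction", "definition": "Answer to survey question 1",
     "type": "integer", "units": "",
     "allowed_values": "1=Very dissatisfied; 5=Very satisfied"},
]

# Fixed column order so every exported dictionary looks the same.
FIELDS = ["variable", "definition", "type", "units", "allowed_values"]


def dictionary_to_csv(entries: list[dict]) -> str:
    """Serialize data dictionary entries to CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    # A header row lets someone open the file in a spreadsheet without
    # needing this script to know what each column means.
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_columns: list[str], entries: list[dict]) -> list[str]:
    """Return column names that appear in the data but not in the dictionary.

    Variables are often added during analysis and never written up;
    running this check before sharing data catches them.
    """
    documented = {entry["variable"] for entry in entries}
    return [col for col in data_columns if col not in documented]


if __name__ == "__main__":
    print(dictionary_to_csv(DATA_DICTIONARY))
    # "income" stands in for a variable added later but never described.
    print(undocumented_columns(["resp_id", "age", "q1_satisfaction", "income"],
                               DATA_DICTIONARY))
```

Keeping the dictionary as plain data (here, CSV) rather than only inside a script means it can be shared alongside the dataset and read by people who never run the code.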

Documentation Examples

Below are a few examples of documentation types that may be useful for your work:

README Files

Data Dictionaries & Codebooks 

Laboratory Records