Documentation means capturing the work that you do in a way that enables others to understand what you did so they can duplicate the process. To do this, your documentation must include information about what was done, how it was done, why it was done, when it was performed, where it was performed, and who performed the work.
A University of Wisconsin Data Service YouTube video on why data dictionaries are the ideal way to document spreadsheets and datasets with lots of variables.
Using Code Comments Effectively
Microsoft wrote a helpful guide to understanding when to use comments and how to use them to improve your code.
Data documentation should start at the beginning of a project and continue throughout your process. This will make documentation easier and make it less likely that you will forget the details of each process later. Data documentation will also ensure that you and others will be able to interpret, assess, and repeat your work.
Knowing what to include in your documentation depends on your project and the data types you may be generating.
Some possible elements to include:
Purpose of data collection
Data collection procedures
Structure and organization of the data files
Time and timing of data collection
Data validation and quality assurance
Types of manipulation conducted on raw data during analysis
Documentation can take a variety of formats, though all formats should be similar in content. All forms of documentation must include basic information about the data that allows for its correct interpretation and reuse, both by you in the future and by other researchers. Different fields of study may favor one format over another.
README file - a file that contains critical information about data file(s), including: citation information, file organization structure, variable definitions, methodological information, code (if applicable), data collection information, software/instruments used and versions, licensing information, etc. Often in .txt file format.
README tab - similar to a README file, except this is often created in connection with a spreadsheet.
Data Dictionary - a file that provides critical information about a data file by describing the names, definitions, and attributes of the data elements. This is often created for tabular data, though it can be made for all dataset formats.
Codebook - a file that documents the layout and structure of a data file and contains the response codes that are used to record survey responses and other information. This is most often done in social science research.
Commented Code - in-line comments in computer code that describe what the code does and why, providing information that is not obtainable from reading the code itself.
Lab Notebooks - a detailed record of all activities done while conducting research, including experimental materials and conditions, protocols, and results. Includes both e-lab notebooks and physical notebooks.
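As a rough illustration of the Data Dictionary format described above, the following Python sketch drafts one dictionary entry per column of a small CSV table. The function name build_data_dictionary, the sample columns (site_id, temp_c, collected_on), and the entry fields are all hypothetical choices made for this example, not a standard. The definitions and units are left blank on purpose, since only the person who collected the data can supply them.

```python
import csv
import io

# Hypothetical sample data; the second row is missing its temperature value.
SAMPLE = (
    "site_id,temp_c,collected_on\n"
    "A1,12.5,2023-04-01\n"
    "A2,,2023-04-02\n"
)


def build_data_dictionary(csv_text):
    """Return a draft data dictionary: one entry per column of a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    # fieldnames is None for an empty file, so fall back to an empty list.
    for name in reader.fieldnames or []:
        # Treat empty strings as missing values for this sketch.
        values = [row[name] for row in rows if row[name] != ""]
        entries.append({
            "variable": name,
            "example": values[0] if values else "",
            "n_missing": len(rows) - len(values),
            "definition": "",  # to be written by hand
            "units": "",       # to be written by hand
        })
    return entries


if __name__ == "__main__":
    for entry in build_data_dictionary(SAMPLE):
        print(entry)
```

A draft like this only captures what can be read from the file itself. The definition and units fields, along with any notes on allowed values or collection methods, still need to be filled in manually before the dictionary is useful to others.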
Below are a few examples of documentation types that may be useful for your work:
Readme Files
Data Dictionaries & Codebooks
Laboratory Records