The U.S. Department of Health and Human Services Office of Research Integrity defines data as "any information or observations that are associated with a particular project, including experimental specimens, technologies, and products related to the inquiry". Data can be both physical and digital, and can come in many different forms. Some examples are
Part of the challenge of data management stems from the variety of data. The heterogeneity of data makes data management planning unique for every research project.
A naming and versioning convention is a set of rules you define for naming data files and the folders you keep them in, and for saving multiple versions of files. Using naming/versioning conventions will:
Below are some general guidelines for naming files and folders. While we recommend following these guidelines, it is most important that you ensure that:
A sustainable digital format is one that will remain compatible, for the foreseeable future, with the software needed to open and read it. Unfortunately, as software applications change or disappear over time, data file formats can become obsolete. If you are using a proprietary and/or obscure file format, there is a risk that the format will become obsolete, making your data unusable. Wherever possible, select data formats that have the following sustainability attributes:
Sustainability attribute | Example |
---|---|
Adheres to publicly documented specifications rather than proprietary ones | TIFF format for images |
Is in widespread use and readable with available software | HTML for hypertext, CSV or TSV for tabular data |
Is self-describing, i.e., contains embedded metadata that help interpret the context and structure of the data file | XML files contain headers and tags describing the file's content |
Contains as much of the original information as possible | Motion JPEG 2000, a “lossless” format for digital video |
If you are working in a proprietary/less-sustainable format, consider converting your data to an open, widely used format when you preserve and share your data. Many software programs allow for saving/converting datasets into more open formats (e.g., saving an SPSS dataset as CSV). This will help ensure that your data are usable by others, now and into the future.
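As a rough illustration, a short script can list the files in a project folder that may need converting before you preserve or share them. The sketch below is only one possible approach: the function name `find_files_to_review` and the allowlist `OPEN_FORMAT_EXTENSIONS` are hypothetical, and the allowlist simply mirrors the example formats in the table above rather than any authoritative registry of sustainable formats.

```python
from pathlib import Path

# Assumption: file extensions for the example formats named in the table above.
# This is an illustrative allowlist; adjust it to the formats your project accepts.
OPEN_FORMAT_EXTENSIONS = {".tif", ".tiff", ".html", ".htm", ".csv", ".tsv", ".xml"}


def find_files_to_review(folder):
    """Return files under `folder` whose extension is not in the allowlist."""
    flagged = []
    # rglob("*") walks the folder and all of its subfolders.
    for path in sorted(Path(folder).rglob("*")):
        # Compare extensions case-insensitively so "scan.TIFF" matches ".tiff".
        if path.is_file() and path.suffix.lower() not in OPEN_FORMAT_EXTENSIONS:
            flagged.append(path)
    return flagged
```

Calling `find_files_to_review("my_project")` on a folder that contains an SPSS file such as `survey.sav` alongside `survey.csv` would return only the `.sav` file. A flagged file is not necessarily a problem; the result is a list of candidates to check by hand, not a verdict on any format.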
If you are unsure which file formats to select for long-term preservation of your research data, here are some tips to help you decide: