The first step in depositing data is determining what data to deposit. Research projects often generate large amounts of data over the life of the project, and it is not always feasible to deposit all of it. When selecting data to deposit in a data repository like University of Nebraska-LincolnDR, you should consider:
In addition, you must address whether the data includes personally identifiable information and whether you have the rights to make the dataset public.
A sustainable digital format is one that is compatible, for the foreseeable future, with the software needed to open and read it. Unfortunately, as software applications change or disappear over time, data file formats can become obsolete. If you are using a proprietary and/or obscure file format, there is a risk that the format will become obsolete, making your data unusable. Wherever possible, select data formats that have the following sustainability attributes:
Sustainability attribute | Example |
---|---|
Adheres to publicly documented specifications rather than proprietary ones | TIFF format for images |
Is in widespread use and readable with available software | HTML for hypertext, CSV or TSV for tabular data |
Is self-describing, i.e., contains embedded metadata that help interpret the context and structure of the data file | XML files contain headers and tags describing the file's content |
Contains as much of the original information as possible | Motion JPEG 2000, a “lossless” format for digital video |
If you are working in a proprietary or less sustainable format, consider converting your data to an open, widely used format when you preserve and share your data. Many software programs can save or convert datasets into more open formats (e.g., saving an SPSS dataset as CSV). This helps ensure that your data remain usable by others, now and in the future.
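As a rough illustration of the conversion step, the sketch below writes tabular records to CSV using only the Python standard library. It assumes the data have already been loaded from the original format into a list of dictionaries; the function name `records_to_csv`, the column names, and the sample values are all hypothetical, and reading a proprietary file such as an SPSS dataset would require your statistical package or a third-party library that is not shown here.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of dict records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()        # first row names each column
    writer.writerows(records)   # values containing commas are quoted automatically
    return buffer.getvalue()


# Hypothetical records standing in for data loaded from a proprietary file.
records = [
    {"participant_id": 1, "age": 34, "response": "agree"},
    {"participant_id": 2, "age": 51, "response": "neutral, leaning agree"},
]

csv_text = records_to_csv(records, ["participant_id", "age", "response"])
print(csv_text)
```

To save the result to disk, one option is to open the target file with `open(path, "w", newline="", encoding="utf-8")` and write `csv_text` to it, so that the character encoding is explicit.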
If you are uncertain of which file formats to select for long-term preservation of your research data, here are some tips to help you decide:
A file naming and versioning convention is a set of rules you define for naming data files and the folders you keep them in, and for saving multiple versions of files. Using naming/versioning conventions will:
Below are some general guidelines for naming files and folders. While following these guidelines is recommended, it is most important that you ensure that:
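One way to keep a naming convention consistent is to generate file names programmatically. The sketch below assumes a hypothetical pattern of `project_description_YYYYMMDD_vNN.ext` (lowercase, no spaces, a date that sorts chronologically, and a zero-padded version number); the pattern and the function name `build_filename` are illustrative only and are not a repository requirement.

```python
import re
from datetime import date


def build_filename(project, description, when, version, extension):
    """Build a name such as 'soilstudy_site-a-moisture_20240315_v02.csv'."""

    def slug(text):
        # Lowercase the text and collapse every run of characters other
        # than letters and digits into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return "{}_{}_{}_v{:02d}.{}".format(
        slug(project),
        slug(description),
        when.strftime("%Y%m%d"),        # YYYYMMDD sorts in date order
        version,                        # zero-padded: v01, v02, ...
        extension.lstrip(".").lower(),
    )


name = build_filename("SoilStudy", "Site A moisture", date(2024, 3, 15), 2, ".CSV")
print(name)  # soilstudy_site-a-moisture_20240315_v02.csv
```

Whatever pattern you choose, applying it through a single helper like this makes it harder for individual files to drift from the convention.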
Documentation provides information and context to aid in comprehending your data. It is not only useful (and likely necessary) for anyone else attempting to reuse your dataset, but also for you in the future. When in the midst of a project, it is easier to remember details that may be forgotten as time passes. By including adequate descriptive documentation, you aid in the long-term understandability of your dataset.
One common form of documentation is a README file. This is simply a text file that is included with your dataset (generally in the top level folder) and is intended to be read first. The Cornell Research Data Management Service Group provides an excellent guide on creating README files, and the University of Minnesota Libraries have created a README file template that can be downloaded and modified to fit your needs.
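If you prepare many datasets, a small script can produce a README skeleton for you to fill in. The sketch below is a minimal example under stated assumptions: the section headings (Description, Files, Methods, Contact) and the function name `render_readme` are hypothetical placeholders, not the contents of the Cornell guide or the University of Minnesota template.

```python
def render_readme(title, sections):
    """Return README text: an underlined title followed by labeled sections."""
    lines = [title, "=" * len(title), ""]
    for heading, body in sections:
        lines.append(heading.upper())
        # Leave a visible placeholder for any section not yet written.
        lines.append(body if body else "[to be completed]")
        lines.append("")
    return "\n".join(lines)


readme_text = render_readme(
    "Example dataset",
    [
        ("Description", "Brief summary of what the dataset contains."),
        ("Files", ""),
        ("Methods", ""),
        ("Contact", ""),
    ],
)
print(readme_text)
```

The resulting text can be saved as a plain-text file in the top-level folder of the dataset and completed by hand.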
Metadata can be simply described as "data about data". It is information about the context, content, quality, provenance, and/or accessibility of data. When you submit data to University of Nebraska-LincolnDR, you will be asked to provide some basic metadata describing the dataset that you are depositing. This will help others discover your dataset and understand what it contains.
The following metadata fields are provided when you deposit data in University of Nebraska-LincolnDR:
Note: Some fields may not be applicable to your particular dataset.
For more information on metadata, see the Data Management Guide.