The first step in depositing data is determining what data to deposit. Research projects often generate large amounts of data over the life of the project, and it is not always feasible to deposit all of it. When selecting data to deposit in a data repository like the University of Nebraska-Lincoln Data Repository, you should consider:
- the importance of the data
- the reusability of the data
- the necessity of the data for validating research results
In addition, you must address whether the data includes personally identifiable information and whether you have the rights to make the dataset public.
Selecting File Formats
A sustainable digital format is one that will remain compatible, for the foreseeable future, with the software needed to open and read it. Unfortunately, as software applications change or disappear over time, data file formats can become obsolete. If you are using a proprietary or obscure file format, there is a risk that the format will become obsolete and leave your data unusable. Wherever possible, select data formats that have the following sustainability attributes:
|Sustainability attribute|Example|
|---|---|
|Adheres to specifications that are publicly documented, rather than proprietary|TIFF format for images|
|Is in widespread use and readable with available software|HTML for hypertext, CSV for tabular data|
|Is self-describing, i.e., contains embedded metadata that help interpret the context and structure of the data file|XML files contain headers and tags describing the file's content|
|Contains as much of the original information as possible|Motion JPEG 2000, a “lossless” format for digital video|
If you are working in a proprietary or less sustainable format, consider converting your data to an open, widely used format when you preserve and share your data. Many software programs can save or convert datasets into more open formats (e.g. saving an SPSS dataset as CSV). Converting makes it more likely that your data will remain usable, both by others and in the future.
If you are uncertain of which file formats to select for long-term preservation of your research data, here are some tips to help you decide:
- Select formats that ensure the best chance for long-term access to data
- Favor commonly used and non-proprietary formats
- Consider longevity, popularity, and potential for migration
- Investigate detailed technical information about file formats using the UK National Archives' PRONOM Registry
- Consider the requirements of your selected data repository. If you intend to deposit your data in a data repository, it may have guidelines on how data should be structured and which file formats it will accept. Many institutions also provide file format recommendations and preferences based on content type.
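As a rough illustration of the tips above, the following Python sketch flags files whose extensions fall outside a list of preferred formats, so they can be reviewed for conversion before deposit. The set `PREFERRED_EXTENSIONS` and both function names are assumptions made for this example, not a list published by any repository; replace the set with the formats your chosen repository accepts.

```python
from pathlib import Path

# Assumed, illustrative set of open, widely used formats. This is NOT an
# official repository list; adjust it to your repository's guidelines.
PREFERRED_EXTENSIONS = {".csv", ".txt", ".xml", ".html", ".tif", ".tiff", ".pdf"}


def flag_unpreferred_files(paths):
    """Return the paths whose extension is not in PREFERRED_EXTENSIONS."""
    flagged = []
    for path in paths:
        # Compare case-insensitively so "scan.TIF" is treated like "scan.tif".
        if Path(path).suffix.lower() not in PREFERRED_EXTENSIONS:
            flagged.append(str(path))
    return flagged


def scan_folder(folder):
    """Walk a folder and flag files that may need conversion before deposit."""
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    return flag_unpreferred_files(files)
```

For instance, `flag_unpreferred_files(["survey.sav", "survey.csv"])` would flag only the SPSS file, `survey.sav`. A flagged file is a prompt to review, not proof that the format is unsuitable.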
File Naming Conventions
File naming conventions are rules you define for naming data files and the folders you keep them in, and for saving multiple versions of files. Using naming and versioning conventions will:
- Prevent accidental overwrites or deletion
- Make it easier to locate specific data files
- Preserve differences in the information content or functionality of different file versions
- Prevent confusion if multiple people are working on shared files
Below are some general guidelines for naming files and folders. Following these guidelines is recommended, but it is most important to ensure that:
- conventions are defined and documented for your research project,
- all members of the research team are aware of these conventions, and
- conventions are followed consistently by all team members for the duration of the project.
General naming recommendations
- Define a naming convention and be consistent using it, especially if multiple people are sharing files
- Avoid the characters / \ : * ? " < > [ ] & $ in names. These characters have special meanings in your computer's operating system, so using them could cause files to be misread or deleted
- Use underscores (_), not spaces, to separate terms
- Keep folder names short, 15-20 characters or fewer
- Use folder names that describe the general category of files the folder contains
- Keep file names short, 25 characters or fewer
- Use file names that describe the contents of the file
- Include a date using the format recommended by ISO 8601: YYYY-MM-DD
- Do not include the folder name in the file name unless you are sharing files and there may be confusion about which folder a file belongs in
- Include a version number at the end of the file name, such as v01. Increment this version number each time the file is saved.
- For the final version, substitute the word FINAL for the version number.
- Turn on versioning or tracking in collaborative works or storage spaces such as wikis, SharePoint, Google Docs, or MyWebSpace. Box@University of Nebraska-Lincoln includes automatic document versioning.
- Use a version control system such as Apache Subversion or Git to track versions of files, especially computer code.
Documentation provides information and context to aid in comprehending your data. It is not only useful (and likely necessary) for anyone else attempting to reuse your dataset, but also for you in the future. When in the midst of a project, it is easier to remember details that may be forgotten as time passes. By including adequate descriptive documentation, you aid in the long-term understandability of your dataset.
One common form of documentation is a README file. This is simply a text file that is included with your dataset (generally in the top level folder) and is intended to be read first. The Cornell Research Data Management Service Group provides an excellent guide on creating README files, and the University of Minnesota Libraries have created a README file template that can be downloaded and modified to fit your needs.
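As a minimal sketch of this idea, the Python below writes a plain-text README into the top-level folder of a dataset. The section names in `README_SECTIONS` and both function names are hypothetical and chosen only for illustration; they are not taken from the Cornell guide or the Minnesota template, which should be consulted for what a README ought to contain.

```python
from pathlib import Path

# Hypothetical section names for illustration only; follow the guide or
# template your project has adopted.
README_SECTIONS = (
    "Title",
    "Creator(s)",
    "Description",
    "Date Created",
    "File Formats",
    "Naming Convention",
)


def render_readme(details):
    """Render README text, marking any section the caller left out as TODO."""
    lines = []
    for section in README_SECTIONS:
        lines.append(f"{section}: {details.get(section, 'TODO')}")
    return "\n".join(lines) + "\n"


def write_readme(dataset_folder, details):
    """Write README.txt into the top-level folder of the dataset."""
    target = Path(dataset_folder) / "README.txt"
    target.write_text(render_readme(details), encoding="utf-8")
    return target
```

Marking missing sections as TODO, rather than omitting them, keeps gaps in the documentation visible while the project is still under way.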
Metadata can be described simply as "data about data". It is information about the context, content, quality, provenance, and/or accessibility of data. When you submit data to the University of Nebraska-Lincoln Data Repository, you will be asked to provide some basic metadata describing the dataset that you are depositing. This will help others discover your dataset and understand what it contains.
The following metadata fields are provided when you deposit data in the University of Nebraska-Lincoln Data Repository:
- Required Fields
  - Title: Name given to the dataset, e.g. Double-Slit Mask Movement
  - Creator(s): Person/people responsible for creating the dataset, e.g. Smith, John
  - Organization: Person or organization associated with the project or grant, e.g. Institute of Agriculture and Natural Resources
  - Description: Description of the data, the methodology, or the study in which the data were generated. Be detailed!
  - Data Types: The nature or genre of the data, e.g. collection, dataset, image, software, text
  - Data Format: The file format, physical medium, or dimensions of the data, e.g. TXT, PDF, CSV, TIFF
  - Keywords: Keywords related to the data, e.g. electron, imaging, isolation
- Recommended Fields
  - Collection: Name of the collection to which this dataset belongs, if applicable.
  - Alternative Title: Alternative name or subtitle for the dataset.
  - Contributor(s): Researchers or others who contributed to data creation/collection, e.g. Thorson, James.
  - Identifier: A unique reference to the data, like a URL, DOI, or grant number, e.g. http://dx.doi.org/10.1108/07378831211213238.
  - ORCID ID: A persistent digital identifier for researchers, e.g. http://orcid.org/0000-0002-4671-061X. Visit the ORCID website for more information.
  - Sponsor: Agency funding the research, e.g. NSF, IMLS
  - IRB Information: If an IRB process was required for the research resulting in this dataset, include the IRB number.
  - Data Source: Primary source from which the data are derived, e.g. 2010 census
  - Language: The language of the data and/or documentation, e.g. English
  - Version: The version or edition of the data, e.g. Version 2
  - Other: Any other information you think is relevant to the submission
- Date Fields
  - Date Submitted: Date submitted to University of Nebraska-Lincoln Data Repository.
  - Date Created: Date the data were generated/collected.
  - Date Valid: Date after which the data will no longer be valid.
  - Date Available: Date the dataset can be made publicly available. Default is immediately.
  - Date Modified: Date of the most recent modification of the dataset.
Note: Some fields may not be applicable to your particular dataset.
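Before filling in the deposit form, it can help to draft the metadata locally and confirm that nothing required is missing. The Python sketch below does this with a plain dictionary keyed by the field names listed above. It is a local checklist only. The function name is hypothetical, the record is an example, and nothing here represents the repository's own submission interface.

```python
# Required field names as listed in this guide.
REQUIRED_FIELDS = (
    "Title",
    "Creator(s)",
    "Organization",
    "Description",
    "Data Types",
    "Data Format",
    "Keywords",
)


def missing_required_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Example draft using the sample values from the field list; the
# Description has deliberately been left empty.
draft = {
    "Title": "Double-Slit Mask Movement",
    "Creator(s)": ["Smith, John"],
    "Organization": "Institute of Agriculture and Natural Resources",
    "Description": "",
    "Data Types": ["dataset"],
    "Data Format": ["CSV"],
    "Keywords": ["electron", "imaging", "isolation"],
}

print(missing_required_fields(draft))  # ['Description']
```

Recommended and date fields could be tracked in the same dictionary; they are left out of the check because some of them may not apply to a given dataset.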
For more information on metadata, see the Data Management Guide.