Depositing Research Data with SANDY, NU's Data Repository: Preparing Data for Deposit

This guide provides information about preparing datasets for deposit with Stewarding Nebraska Data (SANDY).

Selecting Data

The first step in data sharing is determining what data to deposit. It is not always feasible or necessary to deposit all of the data generated during a research project. When selecting data to deposit in a data repository, you should consider

  • the importance of the data
  • the re-usability of the data, and
  • the necessity of sharing the data to validate research results.

In addition, you must address whether the data include personally identifiable or other sensitive information and whether you have the rights to make the dataset public.

Selecting File Formats

A sustainable digital format is one that will remain compatible, for the foreseeable future, with the software needed to open and read it. As software applications change or disappear over time, data file formats can become obsolete. If you use a proprietary or obscure file format, you risk the format becoming obsolete and your data becoming unusable. Wherever possible, select data formats that have the following sustainability attributes:

Sustainability attribute | Example
Adheres to specifications that are publicly documented versus formats based on proprietary specifications | TIFF format for images
Is in widespread use and readable with available software | HTML for hypertext, CSV or TSV for tabular data
Is self-describing, i.e., contains embedded metadata that help interpret the context and structure of the data file | XML files contain headers and tags describing the file's content
Contains as much of the original information as possible | Motion JPEG 2000, a “lossless” format for digital video

If you are working in a proprietary or less sustainable format, consider converting your data to an open, widely used format when you preserve and share your data. Many software programs can save or convert datasets into more open formats (e.g., saving an SPSS dataset as CSV). Converting helps ensure that your data are usable by others, now and into the future.

If you are uncertain of which file formats to select for long-term preservation of your research data, here are some tips to help you decide:

  • Select formats that ensure the best chance for long-term access to data
  • Favor commonly used and non-proprietary formats
  • Consider longevity, popularity, and potential for migration
  • Investigate detailed technical information about file formats using the UK National Archives' PRONOM Registry
  • Consider the requirements of your selected data repository. If you intend to deposit your data in a data repository, it may have guidelines on how data should be structured and what file formats it will accept. Many institutions also provide file format recommendations and preferences based on content type.
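
As a rough illustration of the tips above, the following Python sketch scans a folder and lists the files whose extensions are not on a preferred-format list. The PREFERRED_EXTENSIONS set and the function name find_files_to_review are assumptions made for this example. They are not SANDY's accepted formats, so substitute the list your repository publishes.

```python
from pathlib import Path

# Hypothetical list of open formats, for illustration only; not SANDY policy.
PREFERRED_EXTENSIONS = {".csv", ".tsv", ".txt", ".xml", ".html", ".tif", ".tiff"}


def find_files_to_review(folder):
    """Return sorted relative paths of files whose extension is not preferred."""
    root = Path(folder)
    flagged = []
    for path in root.rglob("*"):
        # Compare extensions case-insensitively so ".CSV" and ".csv" match.
        if path.is_file() and path.suffix.lower() not in PREFERRED_EXTENSIONS:
            flagged.append(path.relative_to(root).as_posix())
    return sorted(flagged)
```

A flagged file is not necessarily a problem. It is only a prompt to check whether an open equivalent exists before deposit.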

File Naming Conventions

A file naming convention is a set of rules you define for naming data files and the folders you keep them in, and for saving multiple versions of files. Using naming and versioning conventions will:

  • Prevent accidental overwrites or deletion
  • Make it easier to locate specific data files
  • Preserve differences in the information content or functionality of different file versions
  • Prevent confusion if multiple people are working on shared files.

Below are some general guidelines for naming files and folders. Following these guidelines is recommended, but it is most important that you ensure that:

  • conventions are defined and documented for your research project,
  • all members of the research team are aware of these conventions, and
  • conventions are followed consistently by all team members for the duration of the project.

General naming recommendations

  • Define a naming convention and be consistent using it, especially if multiple people are sharing files
  • Avoid special characters such as / \ : * ? " < > [ ] & $ in names. These characters have specific meanings in your computer's operating system, so using them could cause files to be misread or deleted
  • Use underscores (_) or hyphens (-), not spaces, to separate terms
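
The recommendations above can be checked automatically. The sketch below is one possible approach, not a prescribed tool: the names FORBIDDEN, problem_characters, and suggest_name are invented for this example, and replacing discouraged characters with underscores is an assumption about how you might want to repair a name.

```python
import re

# Characters the guideline above says to avoid, plus the space character.
FORBIDDEN = set('/\\:*?"<>[]&$ ')


def problem_characters(name):
    """Return the sorted list of discouraged characters found in a name."""
    return sorted(set(name) & FORBIDDEN)


def suggest_name(name):
    """Replace discouraged characters with underscores and collapse repeats."""
    cleaned = "".join("_" if ch in FORBIDDEN else ch for ch in name)
    # Runs of underscores (including any already in the name) become one.
    return re.sub(r"_+", "_", cleaned).strip("_")
```

For example, suggest_name("soil samples & notes.csv") returns "soil_samples_notes.csv". Review suggested names before renaming, since two different originals can map to the same cleaned name.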

Folder names

  • Keep names short, no more than 15-20 characters
  • Use names that describe the general category of files the folder contains

File names

  • Keep names short, 25 characters or fewer
  • Use names that describe the contents of the file
  • Include a date using the format recommended by ISO 8601: YYYY-MM-DD
  • Do not include the folder name in the file name unless you are sharing files and there may be confusion about the location of a file
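
A minimal sketch of how a file name following these recommendations might be assembled in Python. The function name build_file_name and the description_date.extension layout are assumptions for illustration, and the sketch assumes the 25-character limit counts the extension, which the guidelines do not specify.

```python
from datetime import date

MAX_FILE_NAME_LENGTH = 25  # limit suggested in the guideline above


def build_file_name(description, day, extension):
    """Combine a short description, an ISO 8601 date, and an extension."""
    # date.isoformat() produces YYYY-MM-DD, the ISO 8601 calendar date form.
    name = f"{description}_{day.isoformat()}.{extension}"
    if len(name) > MAX_FILE_NAME_LENGTH:
        raise ValueError(
            f"{name!r} is {len(name)} characters; shorten the description"
        )
    return name
```

For example, build_file_name("soil_ph", date(2024, 3, 5), "csv") returns "soil_ph_2024-03-05.csv". Because the date is written year first, sorting file names alphabetically also sorts them chronologically.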

File versions

  • Include a version number at the end of the file name, such as v01. Increment this version number each time you save a new version of the file
  • For the final version, substitute the word FINAL for the version number
  • Turn on versioning or change tracking in collaborative workspaces or storage spaces such as Microsoft SharePoint or Google Docs
  • Use a version control system such as Git to track versions of files, especially computer code
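
The version-number convention above can be applied mechanically. The following Python sketch is one possible implementation under stated assumptions: the version suffix is separated from the rest of the name by an underscore (for example, survey_v01.csv), the file has an extension, and the helper names next_version_name and final_name are invented for this example.

```python
import re

# Assumed layout: <stem>_v<digits>.<extension>, e.g. survey_v01.csv
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<number>\d{2,})(?P<ext>\.[^.]+)$")


def _parse(file_name):
    """Match a versioned file name or raise ValueError if it has no suffix."""
    match = VERSION_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"no version suffix such as _v01 found in {file_name!r}")
    return match


def next_version_name(file_name):
    """Return the file name with its version number increased by one."""
    match = _parse(file_name)
    number = int(match["number"]) + 1
    width = len(match["number"])  # keep the zero padding, e.g. 01 -> 02
    return f"{match['stem']}_v{number:0{width}d}{match['ext']}"


def final_name(file_name):
    """Return the file name with the version number replaced by FINAL."""
    match = _parse(file_name)
    return f"{match['stem']}_FINAL{match['ext']}"
```

For example, next_version_name("survey_v01.csv") returns "survey_v02.csv" and final_name("survey_v03.csv") returns "survey_FINAL.csv". For computer code, a version control system such as Git remains the better choice, since it records every change without renaming files.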

Leslie Delserone
She/her/hers
University of Nebraska-Lincoln Libraries