
Data Management and Sharing: Metadata

Information for those who wish to know more about writing and implementing plans for data management and sharing

Metadata

Metadata are information about the context, content, quality, provenance, and/or accessibility of data. This information is critical to ensuring the longevity and reproducibility of research data.

Metadata can serve a variety of different functions. Examples are listed below:

  • Resource Discovery metadata support locating, accessing, and retrieving varied and distributed data.
  • Resource Description metadata differentiate resources and describe their characteristics.
  • Rights Management metadata record the authorized access, display, and use permissions for objects, in order to protect intellectual property rights, confidentiality, privacy, and security of information.
  • Preservation metadata store the technical details of the format, structure, and use of the digital content; the history of all actions performed on the resource, including changes and decisions; authenticity information such as technical features or custody history; and the responsibilities and rights information applicable to preservation actions.
  • Structural metadata facilitate direct access to key points in complex objects, aiding navigation among the different parts of the same data set or data object.
  • Administrative metadata capture the data's location, integrity, ownership, and authorship.
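
As a rough illustration of how these functions can sit side by side in a single record, the sketch below groups a handful of fields by the categories above. The field names, the sample values, and the `build_record` helper are all assumptions made for this example and do not come from any published standard. The SHA-256 checksum is one common way to record integrity, not the only one.

```python
import hashlib


def build_record(data: bytes, title: str, creator: str, license_name: str) -> dict:
    """Group illustrative (hypothetical) metadata fields by the function they serve."""
    # Structural: column names taken from the first line of a CSV payload.
    header = data.decode("utf-8").splitlines()[0]
    return {
        "discovery": {"title": title},
        "description": {"creator": creator},
        "rights": {"license": license_name},
        "preservation": {"format": "text/csv", "actions": ["created"]},
        "structural": {"columns": header.split(",")},
        # Administrative: a checksum lets a later reader verify integrity.
        "administrative": {
            "size_bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        },
    }


# Invented sample data, for illustration only.
sample = b"site,count\nA,3\nB,5\n"
record = build_record(sample, "Example survey", "A. Researcher", "CC-BY-4.0")
```

Recomputing the checksum later and comparing it with the stored value is a simple way to confirm that a file has not changed since it was described.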

Metadata can also be expressed using a variety of standards, many of them specific to a discipline. Here are a few of the most common:

Discipline | Definition
Biodiversity | The Darwin Core (DwC) is a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections.
Geospatial | Geospatial metadata commonly document geographic digital data such as Geographic Information System (GIS) files, geospatial databases, and earth imagery. They can also be used to document geospatial resources, including data catalogs, mapping applications, data models, and related websites.
Social Science | The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle.
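
To make the idea of a discipline standard more concrete, here is a minimal sketch of a species occurrence described with Darwin Core term names and written out as CSV. The column names are Darwin Core terms. The observation itself, its identifier, and the `to_dwc_csv` helper are invented for illustration, and a real deposit would follow the receiving repository's own guidelines.

```python
import csv
import io

# Darwin Core term names used as column headers.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]

# Hypothetical observation; every value here is made up for the example.
occurrences = [
    {
        "occurrenceID": "example-0001",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Sturnella neglecta",
        "eventDate": "2020-05-14",
        "decimalLatitude": "40.8202",
        "decimalLongitude": "-96.7005",
    }
]


def to_dwc_csv(rows):
    """Serialize occurrence dictionaries to CSV text with Darwin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


dwc_csv = to_dwc_csv(occurrences)
print(dwc_csv)
```

Because the column names come from a shared standard, another researcher or an aggregator can interpret the file without a project-specific data dictionary.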

If you are uncertain of what metadata standards may be in use in your discipline, the Research Data Alliance Metadata Standards Working Group maintains a directory of metadata standards by discipline, including related tools and use cases.

In addition, if you intend to deposit your data in a data repository, this repository may have guidelines on what metadata standard(s) should be used to describe deposited data.

 

Controlled Vocabularies

Controlled vocabularies are collections of preferred terms that are used to describe and retrieve content consistently. Only predefined, authorized terms may be used, in contrast to free-form tags or keywords, which are uncontrolled and therefore often ambiguous and inconsistent. Taxonomies, thesauri, and ontologies are types of controlled vocabulary.

  • They facilitate searching and meta-analysis within a data set
  • They enhance the interoperability of data sets in repositories with data from multiple sources

How are controlled vocabularies used?

  • They are used in metadata records to express how content is organized so users know how to search for content
  • A more complex scenario would be using a published controlled vocabulary as a schema for your database. This could make it easier to deposit your data into a disciplinary repository that is based on the same vocabulary
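
The first use above can be sketched in a few lines: free-text keywords are mapped to preferred terms before they go into a metadata record, and anything outside the vocabulary is flagged for review. The tiny vocabulary, its variant terms, and the function names below are hypothetical. A real project would load terms from a published vocabulary in its field.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants.
VOCABULARY = {
    "grassland": {"prairie", "grasslands", "steppe"},
    "wetland": {"marsh", "wetlands", "swamp"},
}


def build_lookup(vocabulary):
    """Map every preferred term and variant (lowercased) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def normalize_keywords(keywords, vocabulary):
    """Return (accepted preferred terms, rejected keywords), preserving input order."""
    lookup = build_lookup(vocabulary)
    accepted, rejected = [], []
    for keyword in keywords:
        preferred = lookup.get(keyword.strip().lower())
        if preferred is None:
            rejected.append(keyword)  # not in the vocabulary; needs human review
        elif preferred not in accepted:
            accepted.append(preferred)  # avoid listing the same term twice
    return accepted, rejected


accepted, rejected = normalize_keywords(
    ["Prairie", "marsh", "grassland", "bog"], VOCABULARY
)
print(accepted)  # ['grassland', 'wetland']
print(rejected)  # ['bog']
```

Rejected keywords are returned rather than silently dropped, so a curator can decide whether a term is a typo, a variant worth adding, or a concept the vocabulary does not cover.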

Which vocabularies should I use?

In some fields, vocabularies are well-established, while in other disciplines, they are still emerging. You may want to check professional societies and journals for ones that have been developed in your disciplinary area. The list below is a starting point. JISC Digital Media has developed a more comprehensive Directory of Metadata Vocabularies that may provide guidance, as well.

Disciplinary area | Example
Life Science | Bioportal: biomedical vocabularies from the NIH National Centers for Biomedical Computing.
Geospatial | Geographic Names Information System (GNIS): developed by the USGS in cooperation with the U.S. Board on Geographic Names, it contains information about physical and cultural geographic features in the United States and associated areas, both current and historical (not including roads and highways). The database holds the Federally recognized name of each feature and defines the location of the feature by state, county, USGS topographic map, and geographic coordinates.
Medical | Medical Subject Headings (MeSH): a controlled vocabulary for indexing journal articles and books in the life sciences, created and updated by the US National Library of Medicine.
Agriculture | The Agricultural Thesaurus: online vocabulary tools of agricultural terms in English and Spanish, cooperatively produced by the National Agricultural Library, USDA, and the Inter-American Institute for Cooperation on Agriculture.
Biodiversity | Biocomplexity Thesaurus: displays terminologies and term relationships in the fields of biology, ecology, environmental sciences, and sustainability.
Humanities | Music Ontology: provides main concepts and properties for describing music (artists, albums, tracks, arrangements).

If you would like assistance in finding, adapting, and using an appropriate vocabulary, one of the following specialists may be able to help.

  • A data curation consultant [email datamanagement (at) unl.edu]
  • Your department's subject librarian
  • An IT consultant in your department, school or college
University of Nebraska-Lincoln Libraries