Data standards

What are Data Standards?

Data standards are sets of rules or guidelines that ensure data are formatted consistently across different datasets. They provide a framework for the information that should be captured during data collection, making data interoperable (usable across different platforms or systems) and reusable (easy to access and apply for different purposes). By following data standards, data providers can ensure that their information is understood by a range of users and can be integrated with other datasets on national and international data platforms. This maximises the use of the data as evidence in environmental monitoring, in the assessments and reports that inform policy, and in scientific research.

Why are they important?

Data standards are crucial because they ensure that data are collected in a consistent way, which supports sharing, interpretation, and analysis. They promote good data management by providing clear guidelines and checklists for data collection, which ensure that essential information is always captured, reduce errors, and simplify data submission and archiving.

Standardised data supports the FAIR data principles:

  • Findable: Standardising data and metadata allows them to be incorporated into national and international data aggregators so they can be found on a wider range of platforms and data/metadata portals.
  • Accessible: Data that comply with metadata standards state their access requirements, e.g. licensing information, so users are informed of any access and reuse restrictions at an early stage.
  • Interoperable: Standardised data can be combined and reused by others without the need for reformatting or reinterpretation, minimising the risk of confusion and errors. Interoperability also allows large numbers of datasets to be aggregated, maximising their ability to inform scientific research, policy, and initiatives such as the UN Ocean Decade, the Marine Strategy Framework Directive, and other international environmental monitoring efforts.
  • Reusable: Standardised data are easier to reuse because all the information required for interpretation should be included, in a consistent format. This saves time and resources otherwise spent finding, reformatting and interpreting data, and can prevent duplication of effort where reusable data already exist.

How do you apply them?

Applying data standards involves several key steps to ensure your data are formatted in line with predefined guidelines. Here's how to apply them:

  1. Use Controlled Vocabularies: Controlled vocabularies are lists of standardised terms or definitions used to describe data, ensuring consistency in how information is recorded. Using them removes ambiguity from the data and makes it easier for others to understand and reuse. For example, if you’re collecting species data, using a controlled vocabulary for species names ensures everyone refers to each species in the same way.
  2. Follow Specific Schemas: A schema is a framework or template for organising data according to a specific standard. To apply a data standard, structure your datasets according to the chosen schema. For example, if you are using the MEDIN Data Guidelines, your data need to be formatted according to the MEDIN schema, with all mandatory fields completed for the type of data you are submitting.
  3. Apply Metadata Standards: Metadata are data that describe other data. Applying a metadata standard such as the MEDIN Discovery Metadata Standard ensures that all the mandatory information describing a dataset, e.g. time, location, data collection methods and licensing information, is captured in a consistent way.
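To make the first two steps more concrete, here is a minimal Python sketch of the kind of check that a schema and a controlled vocabulary make possible. The field names, the list of accepted sampling-gear terms and the example record are all hypothetical, chosen only for illustration; they are not taken from the MEDIN schema or from any published vocabulary. The function reports mandatory fields that are missing and values that do not exactly match an accepted term.

```python
# HYPOTHETICAL mini-schema for illustration only: these field names and
# terms are NOT taken from the MEDIN schema or any published vocabulary.
MANDATORY_FIELDS = {"species_name", "survey_date", "latitude", "longitude", "licence"}

CONTROLLED_VOCABULARIES = {
    # Hypothetical list of accepted terms for a sampling-gear field.
    "sampling_gear": {"grab", "trawl", "diver survey", "video"},
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one data record (empty if none)."""
    problems = []
    # Schema check: every mandatory field must be present and non-empty.
    # (Compare against None/"" explicitly so a valid 0.0 longitude passes.)
    for field in sorted(MANDATORY_FIELDS):
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing mandatory field: {field}")
    # Vocabulary check: controlled fields must use an accepted term exactly.
    for field, allowed in CONTROLLED_VOCABULARIES.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: '{value}' is not an accepted term")
    return problems


record = {
    "species_name": "Modiolus modiolus",
    "survey_date": "2023-06-14",
    "latitude": 54.1,
    "longitude": -4.7,
    "sampling_gear": "Grab",  # capitalised, so not an exact match for "grab"
}
print(validate_record(record))
# ['missing mandatory field: licence', "sampling_gear: 'Grab' is not an accepted term"]
```

"Grab" is rejected even though "grab" is accepted: exact matching against the vocabulary is what removes ambiguity, so the usual fix is to correct the value to the accepted term rather than loosen the check.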

Data Standards used by DASSH

DASSH (UK Archive for Marine Species and Habitats Data) supports the following data standards for marine and biodiversity data:

  • MEDIN Data Guidelines: A key standard for marine data, covering a wide range of environmental themes. 
  • MEDIN Discovery Metadata Standard: A metadata standard which also complies with UK GEMINI, ISO 19115/19139, and EU INSPIRE standards.
  • Darwin Core: A widely used international standard for biodiversity data, particularly for data shared with platforms such as NBN Atlas, OBIS, GBIF and EMODnet. MEDIN-compliant datasets can be converted to a Darwin Core Archive for onward sharing with these platforms.
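At its core, converting a dataset to Darwin Core means renaming its columns to Darwin Core terms. The Python sketch below assumes hypothetical in-house column names (they are not MEDIN field names) and maps them to Darwin Core terms such as `scientificName`, `eventDate` and `decimalLatitude`, writing an occurrence table as CSV. It covers only this mapping step: a complete Darwin Core Archive is a zip file that also contains a `meta.xml` descriptor and an EML metadata document, which this sketch does not produce.

```python
import csv
import io

# Mapping from HYPOTHETICAL in-house column names (left, not MEDIN fields)
# to Darwin Core term names (right).
TO_DARWIN_CORE = {
    "record_id": "occurrenceID",
    "species_name": "scientificName",
    "survey_date": "eventDate",      # ISO 8601 date, e.g. 2023-06-14
    "latitude": "decimalLatitude",   # decimal degrees
    "longitude": "decimalLongitude",
    "licence": "license",
}


def to_darwin_core(records: list[dict], basis_of_record: str = "HumanObservation") -> str:
    """Rename columns to Darwin Core terms and return an occurrence table as CSV text."""
    fieldnames = list(TO_DARWIN_CORE.values()) + ["basisOfRecord"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing source columns become empty cells rather than errors.
        row = {dwc: record.get(src, "") for src, dwc in TO_DARWIN_CORE.items()}
        row["basisOfRecord"] = basis_of_record
        writer.writerow(row)
    return buffer.getvalue()


records = [{
    "record_id": "SURVEY1-001",  # hypothetical example record
    "species_name": "Modiolus modiolus",
    "survey_date": "2023-06-14",
    "latitude": 54.1,
    "longitude": -4.7,
    "licence": "CC-BY-4.0",
}]
print(to_darwin_core(records))
```

Keeping the mapping in a single dictionary makes it easy to review against the Darwin Core term list and to extend when a dataset carries additional fields.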

DASSH offers guidance and training on applying these data and metadata standards. Contact the team at dassh.enquiries@mba.ac.uk if you would like support or advice.