NGDC cited data


The rationale behind cited data:

  • get credit for your work
  • publish in data journals
  • show the value of your data
  • ensure your data will be available in future

There is increasing demand from the scientific community for a strong link between papers published in the scientific literature and the data on which they are based, and for a mechanism to reward data collection through citation.

NGDC cited data catalogue

The catalogue lists all the formally cited datasets held by the NGDC, showing the title, author(s) and digital object identifier (DOI) of each. The DOI links to a landing page with metadata links and, where appropriate, direct access to the data.

Citing datasets

The National Geoscience Data Centre (NGDC) can now issue a digital object identifier (DOI) to any dataset it holds that meets certain rigorous management criteria. This is the result of collaboration between the NERC data centres, the British Library and DataCite.

The DOI allows scientists to cite datasets in the same manner as a scientific journal article, enabling credit to be assigned to the dataset creators, and ensuring the discoverability, permanence and stability of the dataset. This recognises the value of the data and the effort that has gone into its creation, capture and effective management. DOIs allow formal publication of the dataset in data journals.

Datasets must be fully ingested into the data centre before a DOI can be minted. In exceptional cases, a DOI can be reserved for minting later (for example when a DOI is required for a dataset which forms the basis of a journal publication). Legacy datasets which have already been ingested into a data centre may also be assigned DOIs.

For a dataset to be assigned a DOI, it must be provided to the Data Centres in good condition, with appropriate metadata and of a suitable level of technical quality. The dataset submitter will be responsible for ensuring the data meets the required level of quality. Details of the minimum requirements for data are provided in the Guidelines for Scientists with further information provided by the relevant sector specific data centre.

NGDC data citation process

When the NGDC assigns a DOI to a dataset, it provides certain assurances to subsequent data users that the cited dataset is:

  • stable, i.e. not going to be modified
  • complete, i.e. not going to be updated
  • permanent: by assigning a DOI, the NGDC commits to making the dataset available for the foreseeable future
  • of good technical quality: by assigning a DOI, the data centre gives its stamp of approval, confirming that the dataset is complete and that all the necessary metadata are available

Therefore when a dataset is assigned a DOI, the NGDC confirms that:

  • the dataset will be available for the foreseeable future
  • there will be bitwise fixity of the dataset
  • there will be no additions or deletions of files/records
  • there will be no changes to the directory structure in the dataset 'bundle'
  • upgrades to versions of data formats will result in new editions of the dataset
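
As an illustration only, the fixity commitments above could be checked with a checksum manifest. The sketch below is not the NGDC's procedure: the use of SHA-256 and the function names `build_manifest` and `fixity_problems` are assumptions made for this example. A manifest recorded when the DOI is minted is compared with the current state of the dataset bundle, so that changed, added, deleted or moved files are reported.

```python
import hashlib
from pathlib import Path


def build_manifest(bundle_dir):
    """Map each file's path (relative to the bundle) to its SHA-256 digest."""
    root = Path(bundle_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def fixity_problems(bundle_dir, recorded):
    """List the differences between the bundle and a previously recorded manifest."""
    current = build_manifest(bundle_dir)
    problems = []
    # A file that was deleted, or moved to another directory, is missing here.
    for name in sorted(recorded.keys() - current.keys()):
        problems.append(f"missing: {name}")
    # A new file, or a moved file at its new location, shows up as added.
    for name in sorted(current.keys() - recorded.keys()):
        problems.append(f"added: {name}")
    # Same path but different digest: the content is no longer bitwise identical.
    for name in sorted(recorded.keys() & current.keys()):
        if recorded[name] != current[name]:
            problems.append(f"changed: {name}")
    return problems
```

An empty list from `fixity_problems` means that every file is bitwise identical to the recorded state and that the directory structure is unchanged.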

The NGDC will provide a full catalogue page (landing or splash page) which will appear when any user clicks on the DOI hyperlink.

Modifications and versioning

Once a dataset has been deposited with the NGDC and a DOI has been issued, the dataset cannot be modified. If the dataset is updated or changed, a new version must be deposited and the Data Centre will:

  • assign a new version number (a simple integer sequence only)
  • assign a new digital object identifier (DOI)
  • create a new landing page for the new version of the dataset that includes its full version history
  • modify the landing page of the previous version of the dataset to provide a link to the new version
  • store the new dataset in addition to previous versions
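
The versioning steps above can be sketched as a small data model. This is a hypothetical illustration, not the NGDC's system: the `DatasetVersion` class, the `deposit_new_version` function and the DOIs in the usage note (which use a placeholder prefix) are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetVersion:
    version: int                          # simple integer sequence: 1, 2, 3, ...
    doi: str                              # every version has its own DOI
    superseded_by: Optional[str] = None   # DOI of the next version, if one exists
    history: List[str] = field(default_factory=list)  # DOIs of earlier versions


def deposit_new_version(versions, new_doi):
    """Append a new version; earlier versions are kept and never overwritten."""
    if any(v.doi == new_doi for v in versions):
        raise ValueError("a new version needs a new DOI")
    new = DatasetVersion(
        version=len(versions) + 1,
        doi=new_doi,
        history=[v.doi for v in versions],  # shown on the new landing page
    )
    if versions:
        # The previous version's landing page gains a link to the new version.
        versions[-1].superseded_by = new_doi
    versions.append(new)
    return new
```

For example, starting from an empty list, depositing `"10.1234/example-a"` and then `"10.1234/example-b"` gives versions 1 and 2; version 1 then points forward to the second DOI, and version 2 lists the first DOI in its history.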

Ingestion procedures

The NGDC will accept data in accordance with the NERC Data Policy and either the NERC Data Value Checklist or the NGDC Data Value Checklist, whichever is more appropriate. It will also ensure that the data meets the NGDC collection policy. Guidance is available at ESAA Guides and Documentation.

Dataset standards

One objective of data management within NGDC is to ensure that data can be reused with confidence decades after collection and without the need for any kind of communication with the scientists who collected that data.

The following good practice requirements, adopted across all the NERC Environmental Data Centres, must be met for a dataset to be accepted.

  • The format must be well documented and conform to widely accepted standards.
  • The format must be readable by tools that are freely available now and are likely to remain freely available in the future.
  • Data files should be named in a clear and consistent manner throughout the dataset, with filenames (rather than pathnames) that reflect the contents and uniquely identify the file. Filename extensions should conform to appropriate extensions for the file type. Filenames should be constructed from lower case letters, numbers, dashes and underscores and be no longer than 64 characters.
  • Parameters in data files should be labelled either using an internationally recognised standard or with local labels that are accompanied by clear, unambiguous plain text descriptions.
  • Units of measure must be included for all parameters and clearly labelled.
  • Data must be accompanied by sufficient usage metadata to enable its reliable reuse. Some of this may be embedded within the data files; if not, it should be included as additional documents.
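
The file naming rule above lends itself to an automated check before deposit. The following is a minimal sketch under stated assumptions, not an official NGDC tool: it assumes the 64-character limit applies to the whole filename including its extension, and that the extension is one or more dot-separated parts made of lower case letters and numbers. The function names are invented for the example.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

MAX_LENGTH = 64  # assumed to cover the whole filename, extension included

# Lower case letters, numbers, dashes and underscores, then at least one extension.
NAME_PATTERN = re.compile(r"[a-z0-9_-]+(\.[a-z0-9]+)+")


def filename_problems(filename):
    """Return a list of naming problems for one filename (empty if none)."""
    problems = []
    if len(filename) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not NAME_PATTERN.fullmatch(filename):
        problems.append(
            "use only lower case letters, numbers, dashes and underscores, "
            "followed by an extension"
        )
    return problems


def duplicate_filenames(paths):
    """Return filenames that occur under more than one path in the dataset."""
    counts = Counter(PurePosixPath(p).name for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)
```

Here `filename_problems("borehole_log_2012-03.csv")` returns an empty list, whereas `"Borehole Log.csv"` is reported because of its capital letters and space. `duplicate_filenames` flags names that rely on the pathname to tell files apart, such as `site1/data.csv` and `site2/data.csv`.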

The technical experts in the NGDC are responsible for ensuring that the dataset meets the required level of technical quality before a DOI can be issued to it.

Contacts

For DOI assignment please contact the National Geoscience Data Centre (NGDC) via Enquiries.

For further information on the citation process or deposit of research datasets please contact Rod Bowie.