Monash Health Library



Preservation

Data preservation - a series of managed activities that ensure continued access for as long as necessary - is a top priority when planning new research. It is important to preserve your research data from the commencement of your project so that long-term preservation is possible once the research has been completed.

Funders, institutions and publishers will have strict requirements specifying how data should be preserved long-term. It is important to check these requirements on acceptance of funding.

Values underpinning the preservation of data are:

  • Unique data that cannot be replaced or replicated should be stored so that it is protected from loss
  • Data should be verified by the researchers as authoritative and correct to support sound research
  • Legal requirements are complied with, e.g. copyright

Short-term preservation of data during research will ensure that the data is safe, accessible and protected against any loss. 

To prepare for this, your Data Management Plan (also covered in this toolkit) should include details of, and instructions relating to:

  1. Data Backups - when they will be scheduled, who will be responsible for them and how they will be stored
  2. Data Security - setting of appropriate access controls to your data especially when multiple researchers are involved
  3. Data Findability - planning for your data to be found and understood, e.g. a plan for unique document identifiers, the metadata you will record, and whether the collection will be indexed and/or searchable.
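
As an illustration of the findability item above, the following is a minimal sketch of recording a unique identifier and basic metadata alongside a data file. The field names, the ".metadata.json" naming pattern and the function names are assumptions made for this example; they are not a metadata standard and are not prescribed by this toolkit.

```python
import json
import uuid
from datetime import date
from pathlib import Path


def build_metadata_record(data_file, title, creator, description):
    """Return a dictionary describing one data file (hypothetical field names)."""
    path = Path(data_file)
    return {
        # Locally generated unique ID; this is not a DOI or other registered PID.
        "identifier": str(uuid.uuid4()),
        "file_name": path.name,
        "title": title,
        "creator": creator,
        "description": description,
        "date_recorded": date.today().isoformat(),
    }


def write_sidecar(record, data_file):
    """Save the record next to the data file as <file name>.metadata.json."""
    sidecar = Path(str(data_file) + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Keeping the record in a plain-text format such as JSON means it can be read without specialist software and can later be mapped to whatever metadata schema a repository requires.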

Long-term preservation involves the data being submitted to an appropriate repository for storage after the research is complete, ensuring its security, accessibility and findability. See 'Deposit of Research Data' for recommended repositories.

If some data is to be made publicly available, it is important that the data is findable and reusable. Apply appropriate description and citation information to the data, and assign DOIs (Digital Object Identifiers) to datasets.

Consider using open source programs, such as Bagger or LOCKSS, to package, describe and preserve your data files. For more information or assistance in using these products, contact us.

What are Persistent Identifiers?
A persistent identifier (PI or PID) is a long-lasting reference to a document, file, web page, or other digital object. Most PIDs have a unique identifier which is linked to the current address, or location, of the metadata or content. Unlike URLs, PIDs are often provided by services that allow you to update the location of the object so that the identifier consistently points to the right place without breaking.

Common PIDs

ORCID iD

An ORCID iD is a persistent identifier for a person. It provides a researcher with their own persistent digital identifier that will distinguish them from all other researchers. Anyone who participates in research or scholarly publication can register an ORCID iD for themselves free of charge. You can use the same iD throughout your career, even if your name changes or you move to a different organization, discipline, or country.

Digital Object Identifier - DOI

A Digital Object Identifier (DOI) is a unique alphanumeric string that identifies content and provides a persistent link to its location on the Internet. While a web address (URL) might change, the DOI will never change. DOIs start with a 10 followed by a full stop and contain a prefix and a suffix separated by a slash: doi:10.xxxx/xxxxx. Often, a publisher assigns a DOI when an article is published and made available electronically, and DOIs are increasingly being used for final datasets.
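
The prefix/suffix structure described above can be illustrated with a short sketch. The pattern below is a deliberately loose check based on that description (a "10." prefix of digits, a slash, then a suffix); it is an assumption for illustration, not an official validator. The function names are hypothetical. The https://doi.org/ resolver address is the standard way of turning a DOI into a link.

```python
import re

# Loose pattern: "10.", a run of digits, a slash, then a non-empty suffix.
# Real DOI prefixes can be more varied, so treat this as a sanity check only.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw):
    """Strip a leading 'doi:' label or resolver address, if present."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def split_doi(raw):
    """Return (prefix, suffix), or raise ValueError if the text is not DOI-like."""
    doi = normalise_doi(raw)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a recognisable DOI: {raw!r}")
    prefix, suffix = doi.split("/", 1)  # only the first slash separates the parts
    return prefix, suffix


def doi_to_url(raw):
    """Build a resolvable link from a DOI."""
    prefix, suffix = split_doi(raw)
    return f"https://doi.org/{prefix}/{suffix}"
```

For example, split_doi("doi:10.1234/abc.def") returns ("10.1234", "abc.def"), where 10.1234/abc.def is a made-up DOI used only for illustration.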

How do I get a Digital Object Identifier (DOI) for my material?

You must use a service offered by a DOI Registration Agency (RA).  See the list of RAs, and contact the ones whose services best meet your needs. 


Archival Resource Key - ARK

An ARK identifier is a "specially constructed, globally unique, actionable URL" that allows for descriptive metadata or data sets. It is represented by a sequence of characters (a string) that contains the label "ark:", optionally preceded by the beginning part of a URL, e.g. http://example.org/ark:/12025/654xz321/s3/f8.05v.tiff. More information about ARKs.
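
To make the structure of the example above concrete, the sketch below splits an ARK string at the "ark:" label. The names used for the parts (resolver, naan for the number that follows the label, name, qualifier) and the function name are assumptions chosen for this illustration; consult the ARK documentation for the authoritative anatomy.

```python
def parse_ark(identifier):
    """Split an ARK string into the parts before and after the 'ark:' label."""
    marker = "ark:"
    position = identifier.find(marker)
    if position == -1:
        raise ValueError(f"No 'ark:' label found in {identifier!r}")

    # Everything before the label is the optional beginning part of a URL.
    resolver = identifier[:position]

    # After the label: number of the assigning organisation, the object
    # name, then an optional qualifier (sub-parts or file variants).
    remainder = identifier[position + len(marker):].lstrip("/")
    parts = remainder.split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Incomplete ARK: {identifier!r}")

    return {
        "resolver": resolver,
        "naan": parts[0],
        "name": parts[1],
        "qualifier": parts[2] if len(parts) == 3 else "",
    }
```

Applied to the example string http://example.org/ark:/12025/654xz321/s3/f8.05v.tiff, the function separates "http://example.org/" from the number 12025, the name 654xz321 and the trailing qualifier s3/f8.05v.tiff.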

Deposit of Research Data

When a research project has been completed an appropriate archive or repository needs to be selected for storage of the research data.

Prior to deposit, data should be prepared. This may involve cleaning and de-identifying the data, ensuring it is in an appropriate format, and assembling everything to be deposited, from the data itself (survey data, for example) through to documentation and citation information. Some repositories will provide assistance with preparing the data for deposit and other data management tasks.

The choice of repository will depend on the data type and research discipline. The following finding tools will assist with this decision.


Repository Finding Tools

  • Databib.org - a free annotated bibliography and catalogue of research data repositories
  • FAIRsharing.org - identify repositories that are available for specific data or discipline
  • GHDx - Global Health Data Exchange is a data catalogue of surveys, censuses, statistics and other public health and global health data
  • re3data.org - Registry of Research Data Repositories is an open science tool that offers an overview of international repositories
  • Repository Finder - a tool hosted by DataCite that queries the re3data registry to find repositories, including those relevant to the FAIRsFAIR Project

Repositories

The following list provides a selection of significant research data repositories.

  • The Australian Data Archive (ADA) : a national service for the collection and preservation of digital research data
  • Clinical Study Data Request : facilitates the sharing of patient level data from clinical study sponsors and funders
  • DANS : Data Archiving and Networked Services to deposit research data, search for datasets and research projects and provide education on RDM
  • Dryad : nonprofit membership organisation that assesses files for quality control and ensures best practice is followed
  • Figshare : a flexible open access repository where any file format may be uploaded and shared either privately or made public
  • Harvard Dataverse : repository which includes medicine, health and life sciences
  • Health and Medical Care Archive : (HMCA) preserves and disseminates data collected by health and healthcare research projects funded by the Robert Wood Johnson Foundation (RWJF)
  • Oracle Healthcare Data Repository : repository that supports the exchange of healthcare related information
  • PhysioNet : established under the National Institutes of Health (NIH), PhysioNet offers free access to large collections of physiological and clinical data and related open-source software
  • Sicas Medical Image Repository : Swiss-based SICAS acquires and stores medical images and processes data for research and applications in medicine
  • WHO Global Health Observatory Data Repository : provides access to over 1,000 indicators on priority health topics

Formats for Data Preservation

File formats need to be considered and decided upon before data collection commences. While they are usually dictated by the software you use, it is often possible to opt for more than one format, for example .csv or .xls for a spreadsheet.

When settling on file formats for your data it is important to bear in mind:

  • that file formats can become obsolete - where possible retain multiple formats to reduce risk of loss;
  • the longevity of the software and hardware needed to read them.

Examples of recommended file formats with universal application are:

  • Tabular data
    • Comma-separated values (.csv)
    • Tab-delimited (.tab)
    • SPSS portable format (.por)
  • Textual data
    • Rich Text Format (.rtf)
    • Plain text, ASCII (.txt)
    • eXtensible Mark-up Language (.xml)
  • Image data
    • TIFF (.tif)
  • Video data
    • MPEG-4 (.mp4)
  • Audio data
    • Waveform Audio File Format (.wav)
  • Documentation and scripts
    • Rich Text Format (.rtf)
    • PDF/UA, PDF/A or PDF (.pdf)
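
As one way of retaining tabular data in more than one of the formats listed above, the sketch below writes the same table as both a comma-separated (.csv) and a tab-delimited (.tab) file using Python's standard csv module. The function name and the example file stem are hypothetical.

```python
import csv
from pathlib import Path


def export_tabular(rows, header, stem, folder="."):
    """Write the same table as <stem>.csv and <stem>.tab; return both paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # File extension mapped to the delimiter that format uses.
    targets = {".csv": ",", ".tab": "\t"}

    written = []
    for extension, delimiter in targets.items():
        path = folder / f"{stem}{extension}"
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle, delimiter=delimiter)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(path)
    return written
```

A call such as export_tabular([[1, "a"], [2, "b"]], ["id", "label"], "sample") would produce sample.csv and sample.tab in the current folder, each readable in any text editor.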

 


Authoritative guidance:

The following organisations provide guidance on preservation formats and best practice for data storage.

Retention guidance

  1. Start with a digital preservation strategy in your Data Management Plan (DMP)
  2. Think long term from the beginning
  3. Check if you have specific funding, institutional, or legal requirements
  4. Decide on where your data will be stored in both the short and long term - cloud? repository? secure network?
  5. Have a system for organising and saving data that includes unique identifiers and metadata
  6. Choose a suitable storage system that includes backups
  7. Ensure you use durable file formats, software and hardware

Documentation

When retaining research data, it is important to document your decisions to assist with re-use. Documentation should cover how you captured data during your research, metadata applied, the software used for storage and analysis and the file formats that were selected.

In addition, it is wise to store a copy of any specific software used to help cover future software changes.


Good retention practice

The following practices will help ensure your data is preserved and available to future researchers.

  • Copy data files to new media every 2 to 5 years
  • Check the data integrity of stored files at regular defined times
  • Have a storage strategy that includes two different forms of storage to safeguard against loss
  • Digitise any paper documents
  • Ensure the storage environment is suitable and fit for the purpose
  • Have appropriate security in place for physical, cloud-based and repository systems and audit user access periodically
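
Checking the integrity of stored files, as recommended in the list above, is commonly done by recording a checksum for each file and recomputing it later. The following is a minimal sketch of that idea using SHA-256 from Python's standard library. The function names and the JSON manifest layout are assumptions for this example, and the manifest is assumed to be kept outside the folder being checked.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(folder, manifest_path):
    """Record a checksum for every file under folder in a JSON manifest."""
    folder = Path(folder)
    checksums = {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(checksums, indent=2), encoding="utf-8")
    return checksums


def verify_checksums(folder, manifest_path):
    """Compare current files with the manifest: 'ok', 'changed' or 'missing'."""
    folder = Path(folder)
    recorded = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    report = {}
    for name, expected in recorded.items():
        target = folder / name
        if not target.is_file():
            report[name] = "missing"
        elif sha256_of(target) != expected:
            report[name] = "changed"
        else:
            report[name] = "ok"
    return report
```

Running record_checksums once when the data is stored, and verify_checksums at each scheduled check or after copying to new media, gives a simple record of whether any file has been altered or lost. Files added after the manifest was written are not reported by this sketch.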

The UK Digital Curation Centre also provides guidance on retention, what to keep and what to delete.

Disposal of Digital Research Data

Data disposal (also called destruction or disposition) is the process of rendering your data unreadable. You may need to dispose of your data once your project is complete or has reached its retention period to ensure privacy and security, or comply with government or institutional regulations.

The methods below are best practices for data destruction. However, you may wish to contract a professional IT asset disposal company to ensure the destruction is completed. The contracted company should be compliant with relevant information privacy laws and provide a certificate of destruction.

  • Overwriting
    • Overwriting is the process of writing new data on top of the old
    • Sensitive data may need to be overwritten several times
    • Overwriting large amounts of data takes a long time
  • Degaussing
    • Degaussing is the process of using a magnet to disrupt the magnetic field of the storage device
    • Degaussing does not work for non-magnetic storage devices, such as flash storage and CDs/DVDs
  • Physical destruction
    • Physical destruction involves physically damaging or destroying the device so it no longer works. This can be done by melting, shredding, etc
    • This process is prone to human error and data may still be recoverable
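
To show what overwriting means in practice, the sketch below writes random bytes over a single file several times before deleting it. This is an illustration under stated assumptions only, not a certified disposal method: on flash storage (SSDs, USB drives), on journaling or copy-on-write file systems, and on cloud or backed-up storage, the original data may survive elsewhere even after the file is overwritten. The function name and the default of three passes are choices made for this example.

```python
import os


def overwrite_and_delete(path, passes=3, chunk_size=1024 * 1024):
    """Overwrite a file in place with random bytes, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)  # start each pass from the beginning of the file
            remaining = size
            while remaining > 0:
                block = os.urandom(min(chunk_size, remaining))
                handle.write(block)
                remaining -= len(block)
            # Ask the operating system to push the new bytes to the device.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)
```

Because every pass rewrites the whole file, the time taken grows with both file size and the number of passes, which is why overwriting large amounts of data is slow.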

Disposal of Physical Research Data

Placing physical research data items in a physical bin or shredding them does not ensure that they are adequately disposed of. 

Monash Health has secure document disposal bins that can be used to dispose of paper-based research data appropriately.

Monash Health acknowledges the Traditional Custodians of the land, the Wurundjeri and Boonwurrung peoples, and we pay our respects to them, their culture and their Elders past, present and future.

We are committed to creating a safe and welcoming environment that embraces all backgrounds, cultures, sexualities, genders and abilities.