Preservation
Data preservation - a series of managed activities that ensure continued access to data for as long as necessary - is a top priority when planning new research. Preserve your research data from the commencement of your project so that it is ready for long-term preservation once the research has been completed.
Funders, institutions and publishers often have strict requirements specifying how data should be preserved long term. Check these requirements when you accept funding.
Values underpinning the preservation of data are:
Short-term preservation of data during research will ensure that the data is safe, accessible and protected against any loss.
To prepare for this, your Data Management Plan (also covered in this toolkit) should include details of, and instructions for:
Long-term preservation involves submitting the data to an appropriate repository for storage after the research is complete, ensuring its security, accessibility and findability. See the 'deposit' tab for recommended repositories.
If some data is to be made publicly available, it must be findable and reusable. Apply appropriate data description and citation to the data, and assign DOIs (Digital Object Identifiers) to datasets.
Consider using open source programs, such as Bagger or LOCKSS, to package and preserve your data files. For more information or assistance in using these products, contact us.
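Packaging tools of this kind typically record a checksum for each file so that later corruption or accidental change can be detected. The following is a minimal sketch of that idea in Python, not a substitute for Bagger or LOCKSS and not their actual output format. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        item.relative_to(data_dir).as_posix(): sha256_of(item)
        for item in sorted(data_dir.rglob("*"))
        if item.is_file()
    }


def changed_files(data_dir: Path, manifest: dict) -> list:
    """List files that were added, removed or altered since the manifest was built."""
    current = build_manifest(data_dir)
    names = set(manifest) | set(current)
    return sorted(name for name in names if manifest.get(name) != current.get(name))
```

A manifest built at the start of a project can be stored alongside the data and re-checked with `changed_files` before deposit.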
What are Persistent Identifiers?
A persistent identifier (PI or PID) is a long-lasting reference to a document, file, web page, or other digital object. Most PIDs have a unique identifier which is linked to the current address, or location, of the metadata or content. Unlike URLs, PIDs are often provided by services that allow you to update the location of the object so that the identifier consistently points to the right place without breaking.
Common PIDs
An ORCID iD is a persistent identifier for a person. It provides a researcher with their own persistent digital identifier that will distinguish them from all other researchers. Anyone who participates in research or scholarly publication can register an ORCID iD for themselves free of charge. You can use the same iD throughout your career, even if your name changes or you move to a different organization, discipline, or country.
A Digital Object Identifier (DOI) is a unique alphanumeric string that identifies content and provides a persistent link to its location on the Internet. While a web address (URL) might change, the DOI will never change. DOIs start with the number 10 followed by a full stop, and contain a prefix and a suffix separated by a slash: doi:10.xxxx/xxxxx. Often, a publisher assigns a DOI when an article is published and made available electronically. DOIs are also increasingly used for final data sets.
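The prefix and suffix structure described above can be checked programmatically. The sketch below splits a DOI into its two parts and builds a link through the public https://doi.org/ resolver. The function names and the example DOI are hypothetical, and the pattern is a simplification: it assumes a prefix of "10." followed by four to nine digits, which covers common cases but not every valid DOI.

```python
import re
from urllib.parse import quote

# Simplified pattern: "10." plus 4-9 digits, a slash, then a non-empty suffix.
# This is an assumption for illustration, not the full DOI syntax.
DOI_PATTERN = re.compile(r"^10\.(\d{4,9})/(\S+)$")


def split_doi(doi: str):
    """Return (prefix, suffix) for a DOI string, or None if it does not match."""
    cleaned = doi.strip()
    # Drop an optional "doi:" label or resolver address in front of the identifier.
    for lead in ("doi:", "https://doi.org/", "http://doi.org/"):
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    match = DOI_PATTERN.match(cleaned)
    if match is None:
        return None
    return "10." + match.group(1), match.group(2)


def doi_to_url(doi: str):
    """Build a resolver link for a DOI, or return None if it is not recognised."""
    parts = split_doi(doi)
    if parts is None:
        return None
    prefix, suffix = parts
    # Percent-encode characters in the suffix that are not safe in a URL.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"
```

For example, `split_doi("doi:10.1234/example.5678")` returns `("10.1234", "example.5678")`, where the DOI itself is made up for the purpose of the example.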
How do I get a Digital Object Identifier (DOI) for my material?
You must use a service offered by a DOI Registration Agency (RA). See the list of RAs, and contact the ones whose services best meet your needs.
Archival Resource Key - ARK
An ARK identifier is a "specially constructed, globally unique, actionable URL" that can be assigned to objects such as descriptive metadata or data sets. It is represented by a sequence of characters (a string) that contains the label "ark:", optionally preceded by the beginning part of a URL. For example: http://example.org/ark:/12025/654xz321/s3/f8.05v.tiff. More information about ARKs.
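To make the structure of the example above concrete, the following sketch splits an ARK string at the "ark:" label. In ARK terminology the first component after the label is the Name Assigning Authority Number (NAAN) and the second is the name assigned to the object; anything after that is treated here as a qualifier. The function name and dictionary keys are hypothetical, and the parsing is deliberately simplified.

```python
def parse_ark(identifier: str):
    """Split an ARK string into resolver, NAAN, name and qualifier parts.

    Returns None if the "ark:" label or the two required components are missing.
    """
    label = "ark:"
    position = identifier.find(label)
    if position == -1:
        return None
    resolver = identifier[:position]  # optional beginning part of a URL
    remainder = identifier[position + len(label):].lstrip("/")
    pieces = remainder.split("/", 2)
    if len(pieces) < 2 or not pieces[0] or not pieces[1]:
        return None
    return {
        "resolver": resolver,
        "naan": pieces[0],
        "name": pieces[1],
        "qualifier": pieces[2] if len(pieces) > 2 else "",
    }
```

Applied to the example above, this gives the resolver `http://example.org/`, the NAAN `12025`, the name `654xz321` and the qualifier `s3/f8.05v.tiff`.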
Deposit of Research Data
When a research project has been completed, an appropriate archive or repository needs to be selected for storage of the research data.
Prior to deposit, data should be prepared. This may involve cleaning and de-identifying the data and ensuring that everything, from the data itself (survey data, for example) through to documentation and citation information, is in an appropriate format. Some repositories will provide assistance with preparing the data for deposit and other data management tasks.
The choice of repository will depend on the data type and research discipline. The following finding tools will assist with this decision.
Repository Finding Tools
Repositories
The following list provides a selection of significant research data repositories.
Formats for Data Preservation
File formats need to be considered and decided upon before data collection commences. While they are usually dictated by the software you use, it is often possible to opt for more than one format, for example .csv or .xls for a spreadsheet.
When settling on file formats for your data, it is important to bear in mind that:
Examples of recommended file formats with universal application are:
Data types | Formats |
Tabular data | Comma-separated values (.csv) |
Textual data | Rich Text Format (.rtf) |
Image data | TIFF (.tif) |
Video data | MPEG-4 (.mp4) |
Audio data | Waveform Audio File Format (.wav) |
Documentation and scripts | Rich Text Format (.rtf) |
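Before deposit, it can help to check which files in a project are not yet in one of the formats listed above. The following is a minimal sketch that compares file extensions against that list. The set of extensions is taken from the table; the function name and the example file names are hypothetical, and a real check would also need to allow for variants such as .tiff.

```python
from pathlib import Path

# Extensions taken from the table of recommended formats above.
RECOMMENDED_EXTENSIONS = {".csv", ".rtf", ".tif", ".mp4", ".wav"}


def files_needing_conversion(filenames):
    """Return the file names whose extension is not in the recommended set."""
    return [
        name
        for name in filenames
        if Path(name).suffix.lower() not in RECOMMENDED_EXTENSIONS
    ]
```

Given `["survey.csv", "notes.docx", "scan.TIF", "interview.mp3"]`, the function returns `["notes.docx", "interview.mp3"]`, since the comparison ignores letter case.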
Authoritative guidance:
The following organisations provide guidance on preservation formats and best practice for data storage.
Retention guidance
Documentation
When retaining research data, it is important to document your decisions to assist with re-use. Documentation should cover how you captured data during your research, metadata applied, the software used for storage and analysis and the file formats that were selected.
In addition, it is wise to store a copy of any specific software used, to guard against future software changes.
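One way to keep this documentation with the data is a small machine-readable record stored in the same folder. The sketch below writes the items mentioned above (capture method, metadata, software and file formats) as JSON. The field names and the function name are assumptions made for illustration, not a required schema; a repository or funder may specify its own.

```python
import json


def build_data_record(capture_method, metadata_standard, software, file_formats):
    """Return a JSON string documenting how a dataset was captured and stored.

    Raises ValueError if any of the documentation items is left empty.
    """
    record = {
        "capture_method": capture_method,
        "metadata_standard": metadata_standard,
        "software": software,
        "file_formats": file_formats,
    }
    missing = [key for key, value in record.items() if not value]
    if missing:
        raise ValueError("Missing documentation for: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)
```

The returned text can be saved as, for example, a `README.json` file next to the data files it describes.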
Good retention practice
The following practices will ensure your data is preserved and available to future researchers.
The UK Digital Curation Centre also provides guidance on retention, what to keep and what to delete.
Disposal of Digital Research Data
Data disposal (also called destruction or disposition) is the process of rendering your data unreadable. You may need to dispose of your data once your project is complete or has reached its retention period to ensure privacy and security, or comply with government or institutional regulations.
The methods below are best practices for data destruction. However, you may wish to contract a professional IT asset disposal company to ensure the destruction is completed. The contracted company should be compliant with relevant information privacy laws and provide a certificate of destruction.
Disposal of Physical Research Data
Placing physical research data items in a physical bin or shredding them does not ensure that they are adequately disposed of.
Monash Health has secure document disposal bins that can be used to dispose of paper-based research data appropriately.