5 Research Data Storage Problems (and Tips) in Research Data Management

At first glance, storing your research data should not be difficult given all the options at our disposal: USB sticks, external hard disks, cloud drives, websites and institutional repositories. However, the most convenient way to store research data is not always the most appropriate one. Below are the main problems that arise when research data is stored incorrectly.

1. Difficulty in finding data

You’ve done the hard work of producing your research. Your results are captured in data. What a waste it would be if your work could not be found by fellow researchers, right? Every day, researchers both inside and outside your community, domain and country may be searching for research data that could help their own work. Perhaps research and development departments in the private sector are also looking for your work. If your data sits on offline devices, in a siloed institutional repository or even on a webpage that’s not search engine optimised, then it will remain hidden and will not be used to its maximum potential.

To solve this, you need a research data storage solution that allows others to find the data easily. Your solution should let you describe your research data with human- and machine-readable metadata, assign it a globally unique and persistent identifier, and register that metadata so that it is searchable.
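To make the idea of machine-readable metadata a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names, the build_metadata_record and matches helpers, and the use of a random UUID as identifier. A UUID is globally unique but is not a registered persistent identifier, so treat it as a stand-in for whatever your storage solution actually assigns.

```python
import json
import uuid


def build_metadata_record(title, creators, keywords, identifier=None):
    """Return a small, machine-readable description of a dataset.

    The field names are hypothetical; a real repository would follow
    its own metadata schema.
    """
    return {
        # Stand-in identifier: unique, but not a registered PID.
        "identifier": identifier or f"urn:uuid:{uuid.uuid4()}",
        "title": title,
        "creators": list(creators),
        "keywords": sorted(set(keywords)),
    }


def matches(record, term):
    """Case-insensitive search over the title and keywords of a record."""
    term = term.lower()
    haystack = [record["title"], *record["keywords"]]
    return any(term in text.lower() for text in haystack)


record = build_metadata_record(
    title="Soil moisture measurements 2020",
    creators=["A. Researcher"],
    keywords=["soil", "moisture", "hydrology"],
)
print(json.dumps(record, indent=2))  # the same record, readable by humans
print(matches(record, "hydrology"))
```

Because the record is plain structured data, a search service can index it without ever opening the data files themselves.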

2. Data loss or unauthorised access

Incorrectly stored research data can be susceptible to data loss or unauthorised access. Corrupted laptop or PC hardware, lost USBs and hard drives, spilled coffee, hackers, ransomware, viruses, social engineering - you never intend for any of these to happen, of course, but none of them is impossible. All of them can compromise your data and can result in lost time and effort, lost resources, and even a damaged reputation - think of the ramifications for sensitive data or the effects on future funding.

A research data storage solution should be secured against unauthorised access as well as accidents or disasters. Your solution should allow your research data to be stored based on policies and enable open access through a federated identity system. For research data that’s still evolving, you need a solution that’s flexible enough to let you change or update the data. If the data is already finalised and stable, you need a data archiving system that lets you restore it if your primary storage goes down or an accident occurs.
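One small building block of protection against loss is checking that a backup copy is identical to the original. The sketch below does this with a SHA-256 checksum from Python's standard library. The helper names sha256_of and backup_with_check are made up for this example, and a real archiving system would do far more (multiple sites, scheduled checks, access control).

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_check(source, backup_dir):
    """Copy a file into backup_dir and verify the copy by checksum."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} does not match the original")
    return target


# Demonstration in a temporary folder, so nothing on disk is touched.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "results.csv"
    source.write_text("sample,value\na,1\n")
    copy = backup_with_check(source, Path(workdir) / "backup")
    print(copy.name, sha256_of(copy)[:12])
```

Storing the checksum next to the data also lets you detect silent corruption later, by recomputing it and comparing.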

3. Barrier for interoperability

Combining and integrating your research data with other research data will generate further knowledge. Correct storage enables this. Research data should be machine-readable without the need for specialised or ad hoc algorithms, translators, or mappings. If your storage system cannot support this, you may be limiting how far your research data can be used.

For research data to be interoperable, your research data storage system should support widely adopted controlled vocabularies, ontologies and thesauri, resolvable globally unique and persistent identifiers, and a good data model.
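As a rough illustration of what a controlled vocabulary buys you, the Python sketch below checks the units in a few records against an agreed list. The vocabulary, the record layout and the function name find_vocabulary_problems are all hypothetical; in practice the terms would come from a standard used by your community.

```python
# Hypothetical controlled vocabulary; a real one would come from a
# community standard rather than being hard-coded.
UNIT_VOCABULARY = {"kelvin", "degree_celsius", "pascal"}


def find_vocabulary_problems(rows, vocabulary=UNIT_VOCABULARY):
    """Return (row index, unit) pairs whose unit is not in the vocabulary."""
    problems = []
    for index, row in enumerate(rows):
        unit = row.get("unit")
        if unit not in vocabulary:
            problems.append((index, unit))
    return problems


rows = [
    {"variable": "air_temperature", "value": 281.4, "unit": "kelvin"},
    {"variable": "air_temperature", "value": 8.2, "unit": "deg C"},
]
print(find_vocabulary_problems(rows))  # [(1, 'deg C')]
```

The second record is flagged because "deg C" is a free-text label. A machine combining datasets cannot know that it means the same thing as "degree_celsius" unless someone writes an ad hoc mapping.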

4. Lack of data persistence

Files often need to be moved from one storage space to another, whether they are research data or personal files. The difference is that other people are interested in finding and making use of your research data. If the data is moved, links break and your research data can no longer be found. That's a lost opportunity for your research data to be reused. What solves this is data persistence: even if you move a file to another place, its reference, or Persistent Identifier (PID), will still point to the same research data.

To achieve persistence for your research data, you need a research data storage solution that lets you share your research data through PIDs. Even the most popular cloud storage systems today do not offer PIDs as a feature. Make sure that your storage solution supports the assignment of PIDs.
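The principle behind a PID is a level of indirection: people cite the identifier, and a registry keeps track of where the data currently lives. The toy Python class below sketches that idea only. The PidRegistry class, the PID string and the example.org URLs are invented for illustration and do not represent how any particular PID service is implemented.

```python
class PidRegistry:
    """Toy lookup table mapping a PID to the current location of the data."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, location):
        """Assign a PID to a location; a PID is never reused."""
        if pid in self._locations:
            raise ValueError(f"PID already registered: {pid}")
        self._locations[pid] = location

    def move(self, pid, new_location):
        """Update the location behind an existing PID."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid):
        """Return the location the PID currently points to."""
        return self._locations[pid]


registry = PidRegistry()
pid = "example-prefix/dataset-0001"  # made-up identifier
registry.register(pid, "https://old-storage.example.org/data/0001")

# The data moves to a new system, but the citation does not change.
registry.move(pid, "https://new-storage.example.org/archive/0001")
print(registry.resolve(pid))
```

Anyone who cited the PID before the move still reaches the data afterwards, which is exactly what an ordinary link to a storage location cannot promise.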

5. Long-term preservation is lacking

Once research data is created, it must remain available in the future. This is not just so that future researchers can make use of your outputs. It could also be a “policy compliance” matter. Depending on your institutional, community or national (ministry) policies, you may be required to implement long-term data preservation and ensure that your research data is stored in an accessible place for 10 years or even longer!

To ensure long-term preservation of your research data, you need a data management policy that takes best practices into account, together with a compatible data archiving system. This gives you the tools to follow procedures and helps ensure that your data is safe and will be available for future reference.
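A retention rule such as "keep for 10 years" is easy to encode, so an archiving workflow can check it automatically. The Python sketch below assumes a fixed period counted from the deposit date. The function names and the 10-year default are illustrative only; your own policy may count from a different date or require a longer period.

```python
from datetime import date


def retention_end(deposit_date, years=10):
    """Return the first date on which the retention period has passed."""
    try:
        return deposit_date.replace(year=deposit_date.year + years)
    except ValueError:
        # 29 February with a non-leap target year: use 28 February.
        return deposit_date.replace(year=deposit_date.year + years, day=28)


def may_delete(deposit_date, today, years=10):
    """True only once the full retention period has elapsed."""
    return today >= retention_end(deposit_date, years)


deposited = date(2021, 5, 1)
print(retention_end(deposited))                 # 2031-05-01
print(may_delete(deposited, date(2030, 1, 1)))  # False
```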

Conclusion

Research data management done right means that stored research data must be easy to find, share and access when needed. It must be reusable by others and protected against loss. All this starts with a good data management policy and the right tools (storage features), which should be compatible with that policy and reflect it in practice.

Did you know that the EUDAT Collaborative Data Infrastructure (EUDAT CDI)
has services that cover every step of the research data lifecycle?

Check out our 30-minute webinar and slides,
Introduction to the EUDAT CDI and its Services.