Research Data Services, Expertise & Technology Solutions

Report on Dynamic Data Sets

Traditional scientific publication has principally taken the form of peer-reviewed articles. Data sets were described and acknowledged by reference in such articles. Today, data sets are becoming ever more important as significant contributions to science in themselves. These data sets must be citable, both to encourage transparent, reproducible science and to provide clear metrics for assessing the impact of research, which in turn drives funding choices. For example, linguistics research produces large volumes of information that change as they are reprocessed. In the transcription of a song, or the OCR correction of newspaper corpora, material is frequently added and reworked. In the future, massive crowdsourcing will become more important, as will studies of the state of knowledge as it was known at a particular instant. A second class of examples arises from permanently operating observatories, which supply nearly continuous streams of data samples.

In both cases, data is published in real time, so data requesters see the effect of changes almost immediately. Often these data sets are “works in progress” in two ways: they are still growing, as new data arrives, and they are revised as missing data is recovered, or as new calibration values are applied. We call these “dynamic” data sets, DDS. In referring to a DDS the question arises – what exactly are we talking about? Is it the state of the data set at the time we saw it, or the time values were first recorded, or the time “now” for some later reader/visitor?
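The ambiguity between these points in time can be made concrete with a small sketch. The Python below is purely illustrative and is not taken from the report: the `Revision` record, its field names, and the `state_as_of` helper are hypothetical. It assumes that every revision keeps both the time the value was observed and the time the revision entered the data set, so that the state "as we saw it" can be reconstructed later.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    """One stored value; all names here are hypothetical, for illustration only."""
    observed_at: datetime  # time the value was first recorded (e.g. by an instrument)
    stored_at: datetime    # time this revision entered the data set
    value: float


def state_as_of(revisions, as_of):
    """Return {observed_at: value} as the data set looked at time `as_of`.

    Revisions stored after `as_of` are ignored; among the rest, the most
    recently stored revision for each observation time wins.
    """
    latest = {}
    for rev in sorted(revisions, key=lambda r: r.stored_at):
        if rev.stored_at <= as_of:
            latest[rev.observed_at] = rev.value
    return latest


def day(n):
    """Shorthand for a date in an arbitrary example month."""
    return datetime(2014, 1, n)


revisions = [
    Revision(observed_at=day(1), stored_at=day(1), value=10.0),
    Revision(observed_at=day(2), stored_at=day(2), value=11.0),
    # A later recalibration revises the value observed on day 1.
    Revision(observed_at=day(1), stored_at=day(5), value=10.4),
]

seen_on_day_3 = state_as_of(revisions, day(3))  # what a visitor saw on day 3
seen_on_day_6 = state_as_of(revisions, day(6))  # what a later reader sees
```

In this sketch, a citation that records the access time alongside the identifier would let a later reader recover `seen_on_day_3` even after the recalibration has changed the "current" view.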

The Dynamic Data Working Group (WG) was tasked with considering common services and policies through which EUDAT might benefit the research communities that use and maintain DDS. This broad mandate can be divided into two problem areas:

1. Assuming a server can accommodate requests for them, what sort of persistent identifiers are needed to support the citation goals, and

2. How can a data centre operator (a server) support these requests today and in the future?

Read more in the report.

Author(s) : 
J. Buck, S. Drude, P. Evans, J. Heikkinen, S. Johansson, M. Kemps-Snijder, A. Michelini, J. Misutka