European Commission Vice-President for the Digital Agenda Neelie Kroes recently emphasised that “data is the currency of modern science”.

If we extend the metaphor of “data as currency” with the concept of “data objects as money”, then trusted data repositories can be considered “banks” and thus guardians of the “currency”. This requires solid agreements that bind all actors involved in these “currency” transactions, so as to guarantee a trustworthy environment for exchange. In such a context, trusted relations between the various actors must be established, and context-sensitive trust measures, which are indispensable for the global market of knowledge exchange, need to be put in place.

In order to maintain the “value of data” the following aspects need to be considered by trusted data repositories:

  • Researchers’ data need to be stored, preserved and curated in a safe environment which prevents changes of immutable objects and stores changes to mutable ones. This ensures that users will indeed get the object they are searching for, or can understand what happened to it.
  • Stored data need to be made discoverable and referable through unique identifiers and metadata encouraging reproducible science.
  • The means to foster data sharing within individual scientific communities (and even across disciplinary and geographical boundaries) need to be provided. This type of sharing will facilitate the exchange of thoughts and ideas, and thus improve the quality and impact of the outcome of scientific research.
  • Stored data need to be open and publicly accessible so that interested citizens can also make use of such data.
  • Legal and ethical rules that govern access to data need to be adhered to, so as to prevent inappropriate manipulation and misuse of data.

 

Trust is Key

As spelled out in the “Riding the wave” report on scientific data, “trust”, which characterizes a relationship between two parties, is a key concept for data infrastructures. Traditionally, a creator and a consumer of scientific data established a personal relationship in order to make use of each other’s data. There are still domains where this kind of direct, mutual personal trust works, but the trend is shifting towards indirect relations. In addition, new players are increasingly entering the scene in the form of data managers or curators, which adds to the complexity of relations between data creators and consumers. The increased use of automated operations that create, manage and process ever-growing data volumes also requires new types of organisational solutions.

Personal trust needs to be extended with new mechanisms and policy-based organisational trust.

  • Data creators need to trust that data managers will take care of proper data lifecycle management over a long period of time and that appropriate credit is given to the creators of the stored data.
  • Data managers need to trust that the creators have produced the data according to appropriate scientific specifications, with usable metadata to ensure interpretability and to enable re-use if there are no legal or ethical restrictions.
  • The data consumer needs to rely on the data manager supporting data integrity, authenticity, accessibility and citability for a long period of time.

Due to the importance of trust, a number of factors come into play when considering the cross-disciplinary and cross-country data management situations being tackled by EUDAT and other scientific data initiatives.

  • In Europe, there are many different countries and traditions and, as a consequence of the different cultures, there are variations in ethical rules and legal systems. These legal differences make it harder to establish pan-European trust relations. In general, it would be much easier to establish trust in a coherent cultural area, but nevertheless there is much effort to overcome these difficulties in a unified Europe.
  • Even within one cultural area (such as a particular country) there are grades of trust which may have to do with differences in capabilities, community affinity or sensitivity towards researchers.
  • Another important aspect of trust is the persistence of any offer to researchers. In Europe the landscape is quite heterogeneous, with different types of organisations offering data storage and management facilities: (1) research organisation centres which, in general, have long-term funding, (2) national data centres with long-term funding, (3) European-level research centres with a long-term offer for discipline data, and (4) traditional institutions such as libraries, museums and archives which are extending their traditional mandate to offer digital data facilities.
  • The adoption of widely accepted standards or best practices will increase trust and reduce costs.
  • Certification of repositories is another indispensable means to increase trust in the future.

 

The EUDAT Approach

EUDAT anticipated this heterogeneous landscape as the project was initially inspired by the “Riding the wave” report from the European Commission’s High Level Expert Group on Scientific Data. There are multiple stakeholders with different specializations, service offers and business interests in the areas of data management, storage, sharing, (data and metadata) harvesting, access provisioning (as well as data mining), and big data analytics. This landscape will change over time due to better understanding of how data can be modelled, treated and used, and as a consequence of new emerging technologies and a converging process of understanding. Therefore, basic elements of trust in the EUDAT project have been established and include different types of services according to different stakeholder involvement, for example:

  • B2SAFE (safe replication in its different flavours) relates community data managers (acting on behalf of the creators) with EUDAT data managers.
  • B2SHARE (simple store) is a trusted repository that relates researchers as depositors (probably also as the creators) and consumers with EUDAT data (repository) managers.
  • B2NOTE (semantic annotation) relates EUDAT software managers with community data managers and arbitrary users.

Furthermore, EUDAT is based on an open network of sustainable centres with strong national or organisational roles that provide data services on a long-term basis and which are compliant with organisational and technical agreements. EUDAT will require all participating (data) service providers to comply with specific, auditable Service Level Agreements (SLAs). EUDAT will cooperate with community data centres to offer complementary services to ensure mutual benefit. Many of the EUDAT centres have already established liaisons with certain research communities. By making such primary agreements between a centre of choice and a community, a special trust relationship is intensified and a solid sustainability basis is created. Adoption of open and widely-used procedures and standards will also improve transparent collaboration and guarantee long-term sustainability.

“Whatever you're trying to do these days, the answer may well lie in data. Whether you want to unlock the human genome – or open up government. Whether you're trying to predict the economic future – or decrypt a foreign website. Whether you're trying to locate a traffic jam or a Higgs boson: big data tools will be helping you. This data revolution, moving to a data-driven economy, won't happen by itself. It needs a helping hand, and the right framework. Data needs to be freely available for use and re-use. It needs to be easy to transport and inter-operate – without different rules and standards for every country and dataset. And it needs the framework that safeguards privacy and builds trust.”

Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda, “The big data revolution”