Sustainability – having the resources and the policy framework to deliver services and support users into the future – is at the heart of EUDAT’s mission of designing, implementing and offering common data services and infrastructure for research.
Future proofing is a key issue for all of Europe’s research infrastructures and e-infrastructures. A plenary session on sustainability at EUDAT’s Third Conference in Amsterdam on 24 – 25 September 2014 brought together experts from across Europe to discuss and debate what sustainability should encompass and how it can be achieved.
Three particular features of modern science make it imperative to create a shared vision of sustainability. One is the dramatically increasing volume of data; the second is the requirement to collaborate across disciplines and across geographies to extract the true value from this data; the third is that all scientific knowledge, along with the scientific literature, is now produced in a digital format that must be archived in computer storage systems and made accessible for the future.
The sustainability panel discussion was preceded by a presentation from David Rosenthal, who started the LOCKSS (Lots Of Copies Keep Stuff Safe) program at Stanford University Libraries in California, US. The LOCKSS program is an open-source, library-led digital preservation system that applies the traditional purchase-and-own library model to electronic materials. The LOCKSS system enables librarians at each institution to take custody of and preserve access to the e-content to which they subscribe, restoring the print purchase model with which librarians are familiar. Rosenthal gave a fascinating talk about some of the challenges associated with long-term data preservation and argued that people's expectations are often far out of line with reality. "It isn't possible to preserve nearly as much as people assume is already being preserved, nearly as reliably as they assume it is already being done," he writes on his blog. "This mismatch is going to increase. People don't expect more resources yet they do expect a lot more data. They expect that the technology will get a lot cheaper but the experts no longer believe it will."
Opening the discussion, Rob Baxter, of the Edinburgh Parallel Computing Centre and a EUDAT member, outlined the three-fold approach that EUDAT has taken in considering sustainability. The first covers organisational aspects of ensuring the continuation of EUDAT’s services. Working with more than 30 scientific communities, EUDAT has implemented five services that together provide the research community with the means to manage its data via the EUDAT e-infrastructure, which is managed by a network of European, national and institutional centres. As Baxter noted, while this is very well-organised, it is also informal. “The question is, how can we maintain that sort of flexibility while formalising things to increase strength and redundancy,” he said.
The same considerations apply to the second aspect of sustainability, which is at the technical level. EUDAT links together disparate centres and infrastructures, and to ensure coherent operations, open source standards must be adopted. “This has to fit into existing working practices; there needs to be a pluggable framework,” Baxter said. The third pillar of sustainability, ensuring financial security, requires the development of future business models. The introduction of charges for services would have to be balanced against the need to maintain trust. And while EUDAT is funded by the European Union, the problem of financial sustainability must obviously be addressed within the context that 90 per cent of research funding comes from Member States, said Baxter.
The CRISP project – Cluster of Research Infrastructures for Synergy in Physics – brings together eleven European generators of scientific data including CERN. “It’s a broad project, focussed on real physical research infrastructures,” noted Laurence Field, CRISP Data Management leader. While CRISP focusses on the needs of the eleven physics institutes, these organisations also provide services to people in other disciplines, meaning CRISP has to address the challenges faced by the whole research community. Most of the external researchers are not physicists; they could be chemists or biologists, for example. This interaction with users from different disciplines has enabled CRISP to build a picture of the shared challenges that cut across scientific domains, which resulted in a collective paper from a series of cluster projects, Field said. Through its interaction with the EIROforum, CRISP scoped some possible funding models, ranging from sponsored peer review to pay-as-you-go. “The EIROforum paper, published 6 months ago, where four of the eleven CRISP RIs are also in this forum, outlines a vision set out for a European e-infrastructure for the 21st century. This paper could provide inspiration for sustainability for others,” Field suggested.
Anton Ellenbroek of the UN Food and Agriculture Organisation and Board Secretary of iMarine, a large-scale project that is putting in place an international infrastructure and associated tools to underpin an ecosystem-level approach to fisheries and the management of living natural resources, reminded delegates of the cost of not taking a sustainable, coordinated approach. Fisheries generate turnover of €274 billion per annum and employ more than 200 million people, often in marginal economies. “We need stock to be well-managed; there is a huge economic need and a huge social need,” Ellenbroek said. iMarine has set up a technology backbone and established data collaborations, offering online tools and connectivity that are particularly important for developing countries in monitoring stocks and ensuring fisheries are well-managed. Sustainability of fish stocks requires sustainability of this infrastructure, and Ellenbroek highlighted two particular challenges. One is getting people to use iMarine’s resources and tools to build value-added services, given that as things stand, iMarine has no guarantee of future funding. The second is keeping track of people and keeping them engaged when they move from one project to another, or change jobs. iMarine, too, is looking to develop a sustainable business model, in which it hopes to secure public funding to guarantee the physical infrastructure and then use this platform to develop and deliver paid-for commercial services. One example would be to develop a traceability service for fish products.
It is critical that there is not a one-size-fits-all approach to sustainability: research infrastructures need to maintain the focus on their area of expertise, whilst working together in a sustainable way, as Zhiming Zhao of the ENVRI project (Common operations of environmental research infrastructures) emphasised.
Environmental research infrastructures, covering the earth from its mantle to the upper reaches of its atmosphere, are very disparate in terms of their outputs. They also generate huge volumes of data that must be shared and analysed by researchers across a range of disciplines, from geology to climate science, to build a holistic view of the earth and its environment. “The goal is to coordinate and let researchers collaborate together at a systems level,” Zhao said. How to maintain the interoperability services ENVRI has put in place once the project ends in October 2014 has been addressed in a sustainability plan. There are three elements to this. At a policy and organisational level, the ENVRI Stakeholder Advisory Board that met during the project will continue to hold regular meetings. The second strand will be to promote the adoption of the ENVRI Reference Model, which provides a common ontological framework and underpins the development of common mechanisms for data discovery, data access and data processing across the community. Thirdly, the ENVRI sustainability plan looks at future interactions between the environmental research infrastructures and e-infrastructures such as EUDAT, EGI and PRACE.
Like ENVRI, the DASISH project is developing a common architecture to allow five of Europe’s large social sciences and humanities research infrastructures to find common, sustainable answers to shared problems, as Hans Jørgen Marker, Director of the Swedish National Data Service and DASISH Coordinator, described. Three years in, the project is making good progress in defining the reference architecture and in providing tools, for example, for language translation and harmonised coding for occupations. DASISH has also assessed current rules for preservation and curation of data across the various infrastructures, and work is well-advanced on defining services for depositing and managing data across social sciences and humanities. “Solutions have been produced fairly smoothly. Some can go on the shelf; others reach into the future and need maintenance in some way,” Marker said. All members of DASISH are on the ESFRI Roadmap and one key lesson has been the difference between them in terms of what a research infrastructure looks like. This highlights how requirements for sustainability vary, but also points to the central role of the culture of individual disciplines and organisations. “The people, the traditions, the knowledge, the ways of working: that kind of infrastructure is not easily built. You need the same organisations functioning for the five years it takes to build the organisation in the first place,” said Marker.
BioMedBridges is another ambitious collaborative project, in which 12 biomedical sciences research infrastructures, covering fields stretching from biobanks and clinical trials to mouse models and contagious agents, are working to build shared e-infrastructure to allow data integration. This will underpin future sustainability and also preserve the value that is inherent in the data, by allowing for its re-use, combination and analysis in many different contexts, now and in the future.
As Stephanie Suhr of the European Bioinformatics Institute and Project Manager of BioMedBridges noted, “sustainability embraces political, social, technical and financial elements, and in the case of sensitive or personally identifiable data, such as in biomedical research or social sciences, also legal and ethical requirements. Data sharing must be achieved in the face of the explosive growth of life sciences data and the increasing cost of storage. It is also necessary to accommodate the fact that generation of a large part of biomedical data, for example of medical images, will be increasingly widely dispersed as imaging technology develops and more sophisticated machines become available, for example at hospitals. Necessary expertise to curate the data may not be available locally while the transfer speed can be many times lower than the speed with which the data is generated, making connectivity and network capacity a further bottleneck,” Suhr said.
The explosion in the data generation capacity of scientific equipment and sensors is creating a new class of researchers who make different demands in terms of their use of computing power and of how and where their data is stored, said Alison Kennedy, Executive director of the Edinburgh Parallel Computing Centre, who is also a board member of PRACE and EUDAT. “Traditionally users needed us to develop tools to generate data – for modelling and simulations, which had to be kept to compare with other models. Now we have a completely different set of users who want to analyse data generated elsewhere,” Kennedy said. These users tend not to have such a strong background in computing and Kennedy said it is really important to understand their requirements, in particular, how the data will be used, preserved and stored over the longer horizon.
In his role in business development in DANTE and GÉANT, Roberto Sabatino works with international scientific users to help them utilise the scientific computing infrastructure of GÉANT – currently connecting over 50 million users at 10,000 institutions across Europe at speeds of up to 500 Gbps – to its full potential. Sabatino agreed that the rise of big data is changing the practice of research and creating a new type of user with different requirements. Given this, ensuring the sustainability of research data into the future requires greater interchange between e-infrastructures and research infrastructures. “We need stronger cooperation between the infrastructures to understand users’ needs,” Sabatino said.
While the e-Infrastructures need to respond to their users, users also need help and advice on what data to store and how, believes Yannick Legré, director of the European Grid Infrastructure, which provides computing and data services to support the single European Research Area. “We have to work with and support users and scientists, to inform and define with them what data to store, what needs to be made sustainable, and agree on what can be reproduced,” Legré said. This points to another aspect of sustainability, which is improving the energy efficiency of computing systems and reducing their environmental impact. “We must be sustainable in terms of energy use. We are trying in computation to move the processing power to the data set, rather than moving the data to the processor,” Legré said.
It’s clear that whatever technical, policy and organisational elements must be factored in, the core of sustainability is trust. The whole of research is in a time of flux, in which computing is changing the practice of science and re-writing the scientific method. Many of these issues are exemplified in the OpenAIRE and OpenAIREplus projects, which underpin the move to Open Access to all scientific data generated by the European Union’s publicly-funded research programmes. Paolo Manghi of ISTI CNR and OpenAIRE said OpenAIRE is moving on from the pilot phase to providing a service for researchers to deposit Framework Programme 7 and ERA funded research in Open Access Repositories. “We’ve got to convince researchers there is a need to interlink information,” he said. While the traditional system for scientific publication has well-known and well-oiled workflows, the same is not true for data sets, which differ from one discipline to another. For Open Access to be sustainable, “We should develop ways to interrogate whatever data is generated,” Manghi said. This must apply not only to the initial use of data, but also to allowing data to be re-used in the future. People collecting data know what they want it for. To re-use data it is necessary to understand this context.
The overall message from the panel was clear: current levels of generation and storage of data are unsustainable. To plot the path to a sustainable future requires a cooperative effort across Europe’s research infrastructures and e-infrastructures. It is critical not to re-invent the wheel – some answers that are already out there point the way towards a sustainable future in which researchers are smarter at generating data and more selective in storing it.