Michael Lautenschlager is the Head of the Department for Data Management at the German Climate Computing Centre (DKRZ) in Hamburg. His department supports the whole data life cycle of climate model data. Its work has a special focus on supporting virtual research environments, supporting both the editorial process and quality control for the publication of climate data, and operating a long-term archive for climate data. This long-term archive is certified as the World Data Center for Climate (WDCC), of which Michael is the Director. Michael is also involved with the European Network for Earth System Modelling (ENES), where he is a member of the ENES Data Task Force, and with the Earth System Grid Federation (ESGF).
Hello Michael, would you start by telling us what ENES is?
Yes, let me answer that by taking a step back and explaining the why of ENES first. As we are well aware, the climate of this planet is changing in ways that affect all of us who live on the Earth, and it will change more in the future. The Kyoto Protocol has called on the nations of the world to take remedial action to mitigate the impact of anthropogenic climate change (that is, change resulting from human activities); however, there are various challenges to overcome in motivating enough people to genuinely commit to such actions. To convince enough people of the dangers we face if we carry on as we have been, we urgently need improved assessments of climate change and its societal impact, particularly at the regional level. However, to obtain reliable information at that level of specificity, we require an improved understanding of the climate system and of its interactions with our socio-economic systems.
Earth system models are the only analytical tools we have for predicting how our climate will evolve, whether under "natural" conditions or under the influence of humankind. The development and use of realistic climate models requires a sophisticated software infrastructure, along with access to the most powerful supercomputers and data handling systems.
The European Network for Earth System Modelling (ENES) is a network of scientific institutions, universities, governmental organisations and industrial partners in Europe that have developed world-class expertise in different aspects of Earth system modelling. The ENES partners have come together to help develop, evaluate and improve state-of-the-art climate and Earth system models, and to work together to establish a European infrastructure of high performance computing (HPC) facilities dedicated to high-resolution Earth system models, particularly for ensembles in which multiple Earth system models are integrated with each other.
ENES has approximately 50 partners from academia, industry and research organisations, and it also works with climate service institutes and research communities from various academic disciplines (such as oceanography, meteorology, glaciology and geochemistry, to name but a few).
ENES was established in 2001, so, in terms of information technology (IT), it is a very mature community. However, given the highly complex scientific questions we are working on and the methods that ENES and its partners have used over their long history, the IT infrastructures we use are continually being pushed to their limits. This holds for HPC, network infrastructures and data management in general, and has led to several IT projects (such as the Programme for Integrated Earth System Modelling, or PRISM, and more recently the Infrastructure for the European Network for Earth System Modelling, or IS-ENES). The collaboration goes beyond national borders and is not limited to Europe: the most widely used infrastructure at present is the Earth System Grid Federation (ESGF), which is the result of a well-established cooperation with US partners.
Thanks for that background on ENES, Michael. In what ways is ENES using or considering using EUDAT services?
As you can imagine, climate data products are of interest in a wide range of scientific disciplines these days. There is therefore substantial interest in ENES making its data available in a structured, interdisciplinary context, so that there is a mechanism through which other researchers can search for and find our data. We are already using B2FIND to integrate metadata from the World Data Center for Climate (WDCC), which includes some ESGF data. We have an ongoing project in which we are integrating more of the ENES metadata providers so that their data can be found via B2FIND.
Within ENES, several partners have already established infrastructures for the long-term preservation of data, especially large, well-structured data collections. However, we also need to preserve long-tail data (that is, large numbers of small data collections) within ENES. DKRZ, where I work, is interested in running an instance of B2SHARE for ENES users, and we will be evaluating the potential uptake of this service during the autumn.
In terms of the other services, the Centre for Environmental Data Archival (CEDA) could join B2SAFE to replicate data across sites for load balancing and security. If we go ahead with this, the Science and Technology Facilities Council (STFC) would be the ENES partner taking part in that collaboration.
CEDA actually has two main sites and is interested in moving many large data objects between them. Once the CEDA data is archived in B2SAFE, we will be in a position to evaluate whether such bulk data movement to HPC sites makes sense for ENES. Should we proceed, the primary candidate would be STFC's JASMIN facility, a "super-data-cluster" (half supercomputer centre and half data centre) that provides an infrastructure for data analysis.
I understand that ENES is also considering some of the newer EUDAT services that are under discussion and in the development pipeline at present?
Yes. ENES is in the process of implementing persistent identifier (PID) support in its major e-infrastructure, so we need a conceptual framework for PID-related management workflows and services. EUDAT should be developing tools to accommodate changes in the location, ownership and status of data on a massive scale, and broadening the acceptance of PID services by showing how useful they are to everyone who works with data collections. In particular, EUDAT needs to develop exemplary end-user services that highlight this added value. As soon as those tools and services are available, you can be sure that ENES partners will adopt them and integrate them into their environments.
We are also interested in the Dynamic Data/Workflow and Semantic Annotation services that EUDAT is discussing; however, it is not yet clear when these services may be available, so ENES does not yet have any definite plans to adopt them within the next few years.
You mentioned the ongoing project to integrate more of the ENES metadata providers with B2FIND, and also DKRZ's work with B2SHARE. Could you tell us more about the challenges you are facing in these projects?
Integrating established workflows with new components is not always straightforward. Especially in the metadata area, it is absolutely vital to have a deep understanding of the models and schemas being used. For B2SHARE, there are not only technological questions to be answered but also trust issues to be addressed. So, for both the ENES-B2FIND and DKRZ-B2SHARE projects, we are taking the opportunity to address all the necessary aspects on a schedule that allows sufficient time to resolve all the issues.
So essentially ENES is forging ahead at a safe speed with the uptake of the currently relevant and appropriate EUDAT services?
Yes, indeed, and we really appreciate the chance that EUDAT offers to learn about how other research communities address the IT challenges ahead. Cooperation in wider contexts is beneficial for all of us.