What is the issue?
Well-described, documented scientific workflows that can be executed to produce new results are becoming increasingly important in all scientific disciplines, both to cope appropriately with the growing amount of data and to increase the reproducibility of scientific results. This is true for raw data that is generated by sensors and software systems and processed in regular ways, and also for many areas of derived data (the long-tail data). As we move towards “data fabric solutions”, workflow support for manipulating data will be essential. Data infrastructure initiatives such as EUDAT and DataONE are already working on workflow systems and building up expertise, while large institutions such as LANL and SDSC are also looking into such systems in order to offer services for scientists. It is not yet fully clear which environment will be offered in these cases, or exactly what types of services data infrastructures should offer. EUDAT will continue to work with community experts to test service concepts that allow users to execute workflows on data stored in the EUDAT data domain.
What objectives have been set?
Based on input from experts in different fields, a set of objectives has been defined together with a two-phase work plan. For the upcoming period, the working group will focus in particular on:
- Providing EUDAT service APIs for use within workflows
- Exploring solutions for EUDAT workflow provenance service(s), building on PNNL WPFS and on consolidated experience
- Designing and implementing a workflow repository and registry service in which communities can provide content about workflow execution engines
What are the achievements to-date?
The WG is still in the discussion phase, so current work consists of discussing workflow-related issues.
Highlights of the Workflows session at the 3rd EUDAT Conference (Amsterdam, September 2014) include:
- Pavel Kamoski presented the legal aspects and explained the concept of data privacy. Surprisingly, it seems that anonymising data may in itself already be an illegal action
- Erhard Hinrichs described the workflow plans in EUDAT 2020 WP8, especially in relation to Dynamic Data
- Emanual Dima explained why and how the generic execution framework (GEF) could be shielded from the deployment environment, including the use of software called “Docker”
- Yann le Franc talked about aggregating data from multiple neuroscience repositories using semantic web technologies
Workflows session at the 2nd EUDAT Conference: The new services track at the 2nd EUDAT Conference in Rome took the Workflows discussion forward. Workflows are a joint research activity in EUDAT in which communities (e.g. ENES and CLARIN) are assessing how community workflows can make use of the EUDAT services. The results from the working group workshop were presented, followed by the ENES and CLARIN use cases on workflows and a proposal for a generic execution framework (GEF).
Barcelona Workshop: The goal of the EUDAT Workshop on Workflows was to understand the needs of community experts regarding common services, how to orchestrate data processing, and how scientific workflows can make use of EUDAT services. The 20 international experts in the field of “Scientific workflows” who were present intensively discussed support for workflow provenance, as well as services to register and describe workflow components, to make them discoverable and referable (e.g. by assigning PIDs to components), and to capture best practices. They shared their insights and experiences on the need for common services that EUDAT might be able to provide, and concluded that it is very important to describe the functionality of a workflow component, its input and output data formats, and the test data used to certify the functionality of the component. The consensus of the workshop participants on potential ‘common workflow services elements’ is reflected in four recommendations and corresponding actions for EUDAT that require more elaborate exploration.
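To make the workshop conclusion more concrete, the following is a minimal sketch of what a registry entry for a workflow component could look like, assuming a simple in-memory registry. It is not an EUDAT service or API: the class names, field names and the "example-prefix" PID scheme are all hypothetical and serve only to illustrate the elements the experts named (functionality description, input and output formats, test data, and a PID that makes the component referable).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CertificationCase:
    """Hypothetical test-data record used to certify a component."""
    input_ref: str            # reference to the test input data
    expected_output_ref: str  # reference to the expected output data


@dataclass
class WorkflowComponent:
    """Hypothetical description of one workflow component."""
    name: str
    description: str          # what the component does
    input_formats: List[str]
    output_formats: List[str]
    test_cases: List[CertificationCase] = field(default_factory=list)
    pid: Optional[str] = None  # assigned on registration


class ComponentRegistry:
    """Toy in-memory registry; a real service would persist its entries."""

    def __init__(self) -> None:
        self._by_pid: Dict[str, WorkflowComponent] = {}
        self._counter = 0

    def register(self, component: WorkflowComponent) -> str:
        # Refuse entries that lack the descriptive elements named above.
        if not component.description.strip():
            raise ValueError("a functional description is required")
        if not component.input_formats or not component.output_formats:
            raise ValueError("input and output formats are required")
        if not component.test_cases:
            raise ValueError("at least one test case is required")
        # Placeholder identifier, not a real persistent-identifier scheme.
        self._counter += 1
        pid = f"example-prefix/{self._counter:04d}"
        component.pid = pid
        self._by_pid[pid] = component
        return pid

    def resolve(self, pid: str) -> WorkflowComponent:
        """Look up a component by its identifier (referability)."""
        return self._by_pid[pid]

    def find_by_input_format(self, fmt: str) -> List[WorkflowComponent]:
        """Find components that accept a given format (discoverability)."""
        return [c for c in self._by_pid.values() if fmt in c.input_formats]
```

In this sketch a component cannot be registered without a description, formats and test data, which mirrors the idea that these elements are what make a component certifiable and reusable by others.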
How does the WG work?
Regular teleconferences take place to ensure continuity and to update the work plan. Side events are organized at several EUDAT workshops and conferences.
Who is involved?
The Workflows Working Group currently has 84 members and is co-chaired by:
- Christian Pagé, CERFACS, France - ENES
- Erhard Hinrichs, Tübingen University, Germany - CLARIN
- Reinhard Budich, MPI-M, Germany - ENES
- Morris Riedel, Juelich, Germany
Useful Links / Documents
Workflows session at the 3rd EUDAT Conference (Sept 2014)
Barcelona Workflows Workshop (Sept 2013)