CompBioMed is a European Commission H2020-funded Centre of Excellence (CoE) focused on the use and development of computational methods for biomedical applications. The CompBioMed CoE seeks to exploit the third pillar of science in order to render predictive models of health and disease more relevant to clinical practice by providing a personalized aspect to treatment. The project's data-intensive workflows and internationally distributed partners call for proper data management solutions.
The Scientific Challenge
One of the major challenges in the biomedical community is its ever-increasing demand for data storage, together with the transfer, management and longer-term preservation of this data. Frequently, large data sets need to be moved closer to High Performance Computing (HPC) services before computational work can begin. Once the computational work is done, the resulting data is either moved elsewhere or kept close to the HPC services for post-processing. This use case addresses the need for safe data replication and large data transfer within a system that can support a FAIR data cycle, an important data requirement within this international community.
Who benefits and how?
One major impact for the community is alignment with international standards and approaches for FAIR data management. The platform we are building promotes FAIR data by enabling sharing and reuse of data within the community. Generated data will be stored and archived for long-term preservation and published in open data repositories to ensure its findability and reusability. This will benefit the CompBioMed researchers and the wider community.
Within the community we are working to promote EUDAT services for data management. We are implementing two workflows for data replication and data publication.
For the data replication workflow and long-term preservation of the data, we are using the EUDAT B2SAFE and B2HANDLE services. We plan to federate the High Performance Computing (HPC) centers involved in the pilot (BSC, SURF and LRZ) in order to enable sharing and exchange of large data sets among the institutions that use those HPC facilities.
In the data publication workflow, the aim is to make it possible to publish simulation results or final data in an open data repository, so that they remain findable and accessible to the wider community in the long term. A dedicated instance of B2SHARE will be deployed at UCL, and the data generated in CompBioMed will be published there.
We plan to create a B2SAFE federation of the HPC centers involved in CompBioMed so that large data sets can be transferred, replicated and stored for long-term preservation. SURF (Netherlands) runs B2SAFE as a production service, connected to a tape archive in the backend for long-term preservation. This instance is also federated with the B2SAFE instance at LRZ (Germany). We are also deploying B2SAFE at BSC (Spain) and plan to federate it with the other centers.
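As an illustration of what safe replication means in practice, the sketch below copies a file to a second location and confirms that the replica matches the original by comparing checksums. It is a generic, minimal example in Python and does not use the B2SAFE or B2HANDLE interfaces; the function names `sha256_of_file` and `replicate_and_verify`, the choice of SHA-256, and the local file copy standing in for a transfer between centers are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_and_verify(source: Path, replica: Path) -> str:
    """Copy source to replica and raise OSError if the checksums differ.

    A local copy stands in for the transfer between sites (assumption).
    Returns the verified SHA-256 hex digest of the replica.
    """
    expected = sha256_of_file(source)
    replica.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, replica)
    actual = sha256_of_file(replica)
    if actual != expected:
        raise OSError(f"checksum mismatch for {replica}: {actual} != {expected}")
    return actual
```

A caller would invoke `replicate_and_verify(Path("results.dat"), Path("/archive/results.dat"))` (hypothetical paths) and record the returned digest alongside the replica, so that later integrity checks can be made against the same value.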
We are also in the process of deploying the B2SHARE data repository at UCL. A technical design and an initial setup have been identified.