Documentation about the usage of the EUDAT metadata service B2FIND.
Modified: 25 October 2017
Scientific work largely depends on the availability of published research data for further analysis and re-use. Therefore, the process of finding and easily retrieving relevant data and their metadata has become increasingly important. This demands user-friendly technologies and practices for searching and retrieving data collections within an overarching framework of inter-disciplinary and international archives.
EUDAT provides the B2FIND service which allows users to find data collections within a pan-European and inter-disciplinary scope. In order to achieve this, EUDAT has developed a comprehensive metadata catalogue and a user-friendly discovery Portal, which exposes the CKAN functionality and API.
The B2FIND repository collects diverse metadata from heterogeneous sources inside EUDAT and presents them in a consistent form. The homogenisation of community-specific data models and vocabularies enables not only the uniform presentation of these datasets as tables of field-value pairs, but also faceted search in the B2FIND metadata portal or via an easy-to-use command line tool.
The service provides advanced search functionalities, such as filtering by location and time, or selecting a specific author or discipline. Furthermore, the service provides transparent access to the scientific data objects through the references given in the retrieved datasets.
How to find and access data using B2FIND
Figure 1. The B2FIND metadata search service.
From the entry page (see Figure 1) you can list all communities that provide metadata to B2FIND by clicking 'COMMUNITIES'. By entering text in the provided free text field 'Search Your Data', you can perform a full text search over the whole catalogue directly from the entry page. This leads you to a result page where all retrieved datasets are listed on the panel on the right.
Figure 2. The communities integrated with B2FIND.
By clicking the 'Faceted Search' button, you get directly to a page listing all datasets of the metadata catalogue (see Figure 3). From here you can use the more advanced search functionalities provided in the left navigation bar.
Figure 3. Metadata search results page.
There are several ways to find metadata records in the service, and powerful options to browse and filter search results are provided. These include the following:
- Free text search over the full text body ('Search Your Data', Figure 1)
- Faceted search, i.e. selecting values of a metadata field or property ('Faceted Search')
- Filter by location or time, i.e. search for all datasets that cover a chosen region or time period (visible on the search results page, Figure 3)
These search requests can be combined and executed in one go by using the advanced search options.
Most of the navigation and filter functionalities are self-explanatory, but an extensive search guide is available by clicking the question mark in the 'Faceted Search' page (Figure 3) or directly from the B2FIND search guide.
Clicking on one of the listed datasets displays its metadata (see Figure 4).
Figure 4. Example of a dataset view.
The page has the following format:
- Spatial extent is displayed in the top left corner
- Title and description are shown in the upper range
- All mapped textual B2FIND fields and their values are displayed in the "Additional Info" table. Among them, the following two references provide access to data resources:
  - Source: link to the data object the metadata refer to
  - MetaDataAccess: link to the original harvested metadata XML record (via the OAI GetRecord request)
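The MetaDataAccess link is described above as an OAI GetRecord request. As a rough sketch of what such a request looks like, the snippet below assembles a GetRecord URL from the three arguments defined by the OAI-PMH protocol (verb, identifier, metadataPrefix). The base URL and the record identifier are hypothetical placeholders, not real B2FIND values, and the helper name get_record_url is introduced only for illustration.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Assemble an OAI-PMH GetRecord request URL."""
    params = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{params}"


# Hypothetical endpoint and identifier, for illustration only.
record_url = get_record_url(
    "http://repository.example.org/oai",
    "oai:example.org:record-1",
)
print(record_url)
```

In practice the MetaDataAccess field already contains the complete link, so there is normally no need to construct it by hand.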
Access data resources
The link given in the 'Source' field points to the URL of the data resource behind the metadata. This link may lead to the data resource itself or to a "landing page"; in some cases it may not be resolvable, e.g. if it is only machine-readable.
Figure 5. Example of data access via the link given in the "Source" field.
The B2FIND Portal is based on the CKAN platform. B2FIND exposes the full CKAN API, version 2.2. For full documentation see the CKAN website. EUDAT has built a client tool using this API (see below). Feel free to use the B2FIND API to develop your own clients to access the B2FIND service.
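As a starting point for your own client, the following minimal sketch builds a request URL for the CKAN action package_search, which takes the search expression in the q parameter and the number of results in rows. It is an illustration under stated assumptions, not the implementation used by EUDAT: the helper name build_search_url, the http scheme and the /api/3/action path are assumptions, and joining the terms with AND is just one way of combining several field:value conditions.

```python
from urllib.parse import urlencode


def build_search_url(terms, host="b2find.eudat.eu", rows=10, scheme="http"):
    """Return a CKAN package_search URL for a list of field:value terms."""
    # Combine all terms into a single search expression.
    query = " AND ".join(terms)
    params = urlencode({"q": query, "rows": rows})
    # Assumption: the portal serves the CKAN action API under /api/3/action.
    return f"{scheme}://{host}/api/3/action/package_search?{params}"


url = build_search_url(["tags:LEP", 'author:"Jones*"'], rows=5)
print(url)
```

The resulting URL can be fetched with any HTTP client; CKAN answers with a JSON document that holds the matching datasets.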
The Python script searchB2FIND.py, available from the B2FIND git repository, uses the B2FIND API to allow submission of search requests directly from the command line.
The usage of the script is shown by calling ./searchB2FIND.py -h, as follows:
usage: searchB2FIND.py [-h] [--ckan IP/URL] [--output STRING]
                       [--community STRING] [--ids [IDS [IDS ...]]]
                       [PATTERN [PATTERN ...]]

Description: List identifiers of datasets that fulfill given search criteria

  PATTERN               B2FIND search pattern, i.e. (a list of) field:value terms.
  -h, --help            show this help message and exit
  --ckan IP/URL         CKAN portal address, to which search requests are submitted (default is b2find.eudat.eu)
  --output STRING, -o STRING
                        Output file name and format. Format is given by the extention, supported are 'txt' (plain ascii file) or 'hd5' file.
  --community STRING, -c STRING
                        Community where you want to search in
  --ids [IDS [IDS ...]], -i [IDS [IDS ...]]
                        Identifiers of found records outputed. Default is 'id'. Additional 'Source','PID' and 'DOI' are supported.
1. >./searchB2FIND.py -c aleph tags:LEP
searches b2find.eudat.eu for all datasets of the community ALEPH with the tag "LEP".
2. >./searchB2FIND.py author:"Jones*" AND Discipline:"Crystal?Structure" --ckan eudat-b1.dkrz.de
searches eudat-b1.dkrz.de for all datasets that have an author whose name starts with "Jones" and that belong to the discipline "Crystal Structure".
3. >./searchB2FIND.py -c narcis DOI:'*' --ids DOI
returns the list of IDs and DOIs for all records in the community "NARCIS" that have a DOI.
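To illustrate how identifiers such as those in the third example could be collected from an API response, the sketch below walks through a CKAN package_search result and picks out the requested keys. This is not the code of searchB2FIND.py. It assumes that B2FIND-specific fields such as DOI or Source are stored as CKAN "extras" key/value pairs, and the sample response is made-up data in the general shape of a CKAN reply.

```python
def extract_ids(response, keys=("id",)):
    """Collect the requested identifiers for every dataset in a search response."""
    rows = []
    for dataset in response["result"]["results"]:
        # Assumption: community-specific fields live in the "extras" list.
        extras = {item["key"]: item["value"] for item in dataset.get("extras", [])}
        # Prefer a top-level dataset field, fall back to the extras.
        rows.append([dataset.get(key, extras.get(key)) for key in keys])
    return rows


# Made-up sample response, for illustration only.
sample = {
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {"id": "abc-123", "extras": [{"key": "DOI", "value": "10.0000/example"}]},
        ],
    },
}

rows = extract_ids(sample, keys=("id", "DOI"))
print(rows)
```

Keys that a record does not provide come back as None, which makes it easy to filter out records without, say, a DOI.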
You can find B2FIND training presentations on the EUDAT website.
You can also find hands-on training material on B2FIND on our github repository; section 5 deals with using the service.
Support for B2FIND is available via the EUDAT ticketing system through the webform.
If you have comments on this page, please submit them through the EUDAT ticketing system.
Heinrich Widmann (DKRZ)
Hans van Piggelen (SURF)
Sara Ramezani (SURF)
Kostas Kavoussanakis (EPCC)