Linking data: European Science Vocabulary
Publication Date/Time
2022-09-14T09:37:52+00:00
Discover an openly accessible tool to classify scientific fields
_In our ‘Linking data’ series, we are presenting EU projects that
use linked open data (LOD)
[https://data.europa.eu/en/datastories/linking-data-what-does-it-mean].
What data is linked in these projects? Why did they decide to use LOD?
What benefits does it bring? Follow the series to find out.
[https://data.europa.eu/en/publications/datastories?keywords=&country=All&year=&sort_by=created&sort_order=DESC&items_per_page=10&keywords=%22Linking+data%3A+%22&merged-select=created&items_per_page=10]_

In this episode, we are presenting the European Science Vocabulary.
Read on to find out what it is and how and why it uses LOD.  

 

CORDIS – EU RESEARCH AND DEVELOPMENT DATABASE

The Community Research and Development Information Service (CORDIS)
[https://cordis.europa.eu/] is a multilingual platform offering access
to data about EU-funded research and innovation projects. Its mission
is to bring research results to professionals in the field, foster
open science, create innovative products and services and stimulate
economic and scientific growth across Europe. 

The platform is made up of several databases and contains information
on all EU-supported research and innovation (R & I) activities,
including funding programmes (such as Horizon 2020
[https://ec.europa.eu/info/research-and-innovation/funding/funding-opportunities/funding-programmes-and-open-calls/horizon-2020_en]),
projects, results and publications. 

The project database is at the heart of CORDIS. It gives access to
public information about EU-funded R & I projects, including
details such as objectives, dates and funding programmes.

 
[https://data.europa.eu/sites/default/files/img/media/image1_3.png]
 

IMPROVING FINDABILITY WITH EUROSCIVOC 

To make projects easier to find in the database, CORDIS developed the
European Science Vocabulary
[https://op.europa.eu/en/web/eu-vocabularies/at-dataset/-/resource/dataset/euroscivoc]
(EuroSciVoc). This is a taxonomy
[https://op.europa.eu/en/web/eu-vocabularies/taxonomies] – a way
of describing data in which all the terms belong to a single
hierarchical structure and have parent/child or broader/narrower
relationships to other terms. The structure is sometimes referred to
as a ‘tree’. EuroSciVoc allows projects to be classified according to
the precise scientific field(s) to which they relate.
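
To make the broader/narrower idea concrete, here is a minimal Python sketch of a taxonomy stored as a mapping from each category to its broader (parent) category. The category names and their hierarchy are hypothetical illustrations, not an excerpt of EuroSciVoc, and the helper functions are not part of any published EuroSciVoc interface.

```python
from typing import Optional

# Hypothetical taxonomy excerpt: each category maps to its broader (parent) category.
# Root categories have no parent (None). Names are illustrative, not from EuroSciVoc.
BROADER: dict[str, Optional[str]] = {
    "engineering and technology": None,
    "civil engineering": "engineering and technology",
    "water engineering": "civil engineering",
    "water supply systems": "water engineering",
}


def narrower(category: str) -> list[str]:
    """Return the direct children (narrower categories) of a category."""
    return [child for child, parent in BROADER.items() if parent == category]


def path_to_root(category: str) -> list[str]:
    """Follow broader links upwards and return the path from the root down to the category."""
    path = []
    current: Optional[str] = category
    while current is not None:
        path.append(current)
        current = BROADER[current]
    return list(reversed(path))


def depth(category: str) -> int:
    """Depth of a category in the tree, counting root categories as level 1."""
    return len(path_to_root(category))


print(" > ".join(path_to_root("water supply systems")))
```

Here `path_to_root("water supply systems")` returns the four categories from the root down, so that category sits at depth 4. Giving every category exactly one broader link matches the single-parent structure of a tree.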

The top of EuroSciVoc’s tree is based on the two levels of the Fields
of Research and Development (FORD) classification
[https://www.oecd-ilibrary.org/science-and-technology/frascati-manual-2015_9789264239012-en],
developed by the Organisation for Economic Co-operation and
Development (OECD). To offer a comprehensive categorisation, the
taxonomy tree was extended with additional branches derived from the
scientific fields identified in the abstracts of projects stored on
the CORDIS platform.

 
[https://data.europa.eu/sites/default/files/img/media/image2_4.png]
 

The taxonomy contains more than 1 000 categories available in six
languages (English, French, German, Italian, Polish and Spanish).
Starting from its seven root categories, the EuroSciVoc classification
can reach a maximum depth of six levels.
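
A reuser showing EuroSciVoc categories in a multilingual interface might look up labels by language code and fall back to English when a label is missing, for instance when a language outside the six is requested. The sketch below assumes a simple in-memory dictionary; the category identifier and the French and German labels are illustrative, not taken from EuroSciVoc.

```python
# Illustrative labels for one category, keyed by ISO 639-1 language code.
# EuroSciVoc covers en, fr, de, it, pl and es; only three are filled in here.
LABELS: dict[str, dict[str, str]] = {
    "water-supply-systems": {
        "en": "water supply systems",
        "fr": "systèmes d'approvisionnement en eau",
        "de": "Wasserversorgungssysteme",
    },
}


def label(category_id: str, lang: str, fallback: str = "en") -> str:
    """Return the category label in the requested language, or the fallback-language label."""
    labels = LABELS[category_id]
    return labels.get(lang, labels[fallback])


print(label("water-supply-systems", "fr"))
print(label("water-supply-systems", "pt"))  # no Portuguese label, so English is returned
```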

 
[https://data.europa.eu/sites/default/files/img/media/image3_2.png]
 

Each category is enriched with one or more relevant keywords, that is,
related alternative terms that are used alongside the category’s main
term to classify projects. The keywords are extracted from the textual
descriptions of the projects. For instance, the keywords for the
category ‘water supply systems’ are ‘water supply network’ and
‘water supply infrastructure’. Conversely, stop-keywords exclude
projects from a category: for example, a project mentioning ‘state of
the art’ is kept out of the category ‘arts’.
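
The keyword and stop-keyword mechanism can be sketched as a simple whole-word matcher. This is only an illustration under stated assumptions: CORDIS classifies projects with natural language processing rather than plain string matching, and the keyword ‘art’ attached to ‘arts’ below is hypothetical. The ‘water supply systems’ keywords come from the example above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Category:
    """A category with its main term, extra keywords and stop-keywords."""
    term: str
    keywords: list[str] = field(default_factory=list)
    stop_keywords: list[str] = field(default_factory=list)


def mentions(text: str, phrase: str) -> bool:
    """True if the phrase occurs in the text as whole words (case-insensitive)."""
    return re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE) is not None


def classify(description: str, categories: list[Category]) -> list[str]:
    """Return the terms of all categories whose term or keywords occur in the description,
    skipping any category for which one of its stop-keywords also occurs."""
    matched = []
    for cat in categories:
        if any(mentions(description, stop) for stop in cat.stop_keywords):
            continue  # a stop-keyword excludes the project from this category
        if any(mentions(description, kw) for kw in [cat.term, *cat.keywords]):
            matched.append(cat.term)
    return matched


CATEGORIES = [
    Category("water supply systems", ["water supply network", "water supply infrastructure"]),
    Category("arts", ["art"], ["state of the art"]),  # 'art' keyword is hypothetical
]
print(classify("We model a state of the art water supply network.", CATEGORIES))
```

With these categories, a description mentioning ‘state of the art’ alongside ‘water supply network’ is classified under ‘water supply systems’ only, because the stop-keyword keeps it out of ‘arts’.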

The major benefit of EuroSciVoc is that it allows users to find
projects belonging to specific domains of science in a standardised
way.

 
[https://data.europa.eu/sites/default/files/img/media/image4_1.png]
 

EUROSCIVOC LIFECYCLE 

EuroSciVoc takes a pragmatic approach that combines artificial
intelligence and human expertise. Artificial intelligence algorithms
extract categories and keywords from project descriptions and suggest
them; human experts then validate those suggestions.

As the project database grows, EuroSciVoc evolves and is continuously
maintained. Its maintenance and update workflow consists of four
phases.

* A dedicated tool combining several natural language processing
algorithms helps to classify CORDIS projects according to the
EuroSciVoc taxonomy.

* Dedicated algorithms detect all newly issued categories and all
modifications to existing categories, and compile them into a list
that serves as the input to the cleansing phase.

* During cleansing, the EuroSciVoc team analyses the list and applies
the necessary modifications, either directly or after discussion.

* Finally, once the list has been verified and the changes have been
made, EuroSciVoc is released. It is made available on the EU
Vocabularies website [https://op.europa.eu/en/web/eu-vocabularies] and
in the integrated CORDIS architecture (the repository of CORDIS
content).
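
The detection phase can be pictured as a comparison between two versions of the taxonomy. The minimal sketch below assumes each version is a mapping from category name to its set of keywords, which is a simplification; the function name and the category ‘hydroinformatics’ are hypothetical.

```python
def detect_changes(old: dict[str, frozenset[str]],
                   new: dict[str, frozenset[str]]) -> list[tuple[str, str]]:
    """List categories that are new in `new`, or whose keyword set changed since `old`."""
    # Categories present only in the new version.
    changes = [("new", cat) for cat in sorted(new.keys() - old.keys())]
    # Categories present in both versions whose keywords differ.
    changes += [
        ("modified", cat)
        for cat in sorted(new.keys() & old.keys())
        if new[cat] != old[cat]
    ]
    return changes


OLD = {
    "water supply systems": frozenset({"water supply network"}),
    "arts": frozenset({"art"}),
}
NEW = {
    "water supply systems": frozenset({"water supply network", "water supply infrastructure"}),
    "arts": frozenset({"art"}),
    "hydroinformatics": frozenset(),  # hypothetical new category
}
print(detect_changes(OLD, NEW))
```

The resulting list is the kind of input the EuroSciVoc team would then review during cleansing.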

 
[https://data.europa.eu/sites/default/files/img/media/image5_1.png]
 

LINKED OPEN DATA APPROACH 

EuroSciVoc is formalised using the Simple Knowledge Organization
System (SKOS) [https://www.w3.org/2004/02/skos/], a common data model
for sharing and linking knowledge organisation systems via the web.
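
To give an idea of what a SKOS formalisation looks like, the sketch below writes one category as a `skos:Concept` in Turtle syntax using only the Python standard library. The namespace `http://example.org/euroscivoc/` is a placeholder rather than EuroSciVoc’s real URI scheme, and modelling keywords as `skos:altLabel` is an assumption about how they could be expressed.

```python
from typing import Optional

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "http://example.org/euroscivoc/"  # placeholder namespace, not the real EuroSciVoc URIs


def literal(text: str, lang: str) -> str:
    """Turtle string literal with a language tag; escapes backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(concept_id: str, pref_labels: dict[str, str],
                      alt_labels: list[tuple[str, str]],
                      broader: Optional[str] = None) -> str:
    """Serialise one category as a skos:Concept in Turtle syntax."""
    statements = [f"    skos:prefLabel {literal(text, lang)}"
                  for lang, text in sorted(pref_labels.items())]
    # Assumption: keywords are modelled as skos:altLabel (alternative labels).
    statements += [f"    skos:altLabel {literal(text, lang)}" for text, lang in alt_labels]
    if broader is not None:
        statements.append(f"    skos:broader <{BASE}{broader}>")
    body = " ;\n".join(statements)
    return f"{SKOS_PREFIX}\n\n<{BASE}{concept_id}> a skos:Concept ;\n{body} .\n"


print(concept_to_turtle(
    "water-supply-systems",
    {"en": "water supply systems"},
    [("water supply network", "en"), ("water supply infrastructure", "en")],
    broader="water-engineering",
))
```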

The main benefit of using this data model is that it makes it possible
to link and align concepts and their labels across different
controlled vocabularies. Thanks to these links, users can compare
resources that are classified using equivalent categories, irrespective
of lexical and semantic differences between the vocabularies.
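
Alignment between vocabularies can be illustrated with a lookup table in the spirit of `skos:exactMatch`: once a category of another vocabulary is declared equivalent to a EuroSciVoc category, resources classified under either can be compared directly. The identifiers `other:W-ENG` and `esv:water-engineering` and the example datasets are hypothetical.

```python
# Hypothetical alignment links from another vocabulary's category codes to EuroSciVoc
# categories, in the spirit of skos:exactMatch. All identifiers are illustrative.
EXACT_MATCH: dict[str, str] = {
    "other:W-ENG": "esv:water-engineering",
}


def canonical(category: str) -> str:
    """Map a category to its EuroSciVoc equivalent if an alignment link exists."""
    return EXACT_MATCH.get(category, category)


def comparable(dataset_a: dict[str, str], dataset_b: dict[str, str]) -> list[tuple[str, str]]:
    """Pair resources from two datasets whose categories are equivalent after alignment.
    Each dataset maps a resource identifier to the category it is classified under."""
    return [
        (res_a, res_b)
        for res_a, cat_a in dataset_a.items()
        for res_b, cat_b in dataset_b.items()
        if canonical(cat_a) == canonical(cat_b)
    ]


PROJECTS = {"project-1": "esv:water-engineering"}
PAPERS = {"paper-9": "other:W-ENG", "paper-10": "other:LING"}
print(comparable(PROJECTS, PAPERS))
```

Here `other:W-ENG` and `esv:water-engineering` share no wording, yet the alignment link is enough to pair the two resources.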

Thanks to LOD, EuroSciVoc can be reused by any other organisation to
classify its own data. As part of the EU controlled vocabularies,
EuroSciVoc is periodically published on the EU Vocabularies website
[https://op.europa.eu/en/web/eu-vocabularies/at-dataset/-/resource/dataset/euroscivoc]
with a persistent uniform resource identifier (URI)
[https://op.europa.eu/en/web/webguide/uris]. The taxonomy is free for
reuse under the CC BY 4.0 licence.

 
[https://data.europa.eu/sites/default/files/img/media/image7.png]
 

REUSING EUROSCIVOC IN YOUR PROJECTS 

EuroSciVoc allows you to classify your data using a taxonomy built on
a corpus of the textual resources of more than 5 000 R & I
projects. It is frequently updated to accommodate new information and
improve its accuracy and scope. Its evolution, while data-driven, is
controlled by human experts, which ensures that it is semantically
consistent. 

Reusability and flexibility are two of the major features of
EuroSciVoc. The taxonomy can represent fields of science in six
languages and can easily be aligned with other controlled vocabularies
that use the Fields of Research and Development classification, since
that classification forms its root. Finally, EuroSciVoc is easy to
reuse thanks to its formalisation in the Simple Knowledge Organization
System
[https://www.w3.org/2004/02/skos/].
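
As a sketch of why a FORD-based root helps alignment, the following function rolls any category up to its ancestor at a given level, assuming (as the description of the root suggests) that the top two levels of the tree mirror the FORD fields. The hierarchy below the FORD field ‘civil engineering’ and the project counts in the usage are hypothetical.

```python
from typing import Optional

# Hypothetical excerpt: each category points to its broader category (None for a root).
BROADER: dict[str, Optional[str]] = {
    "engineering and technology": None,                 # level 1 (FORD major field)
    "civil engineering": "engineering and technology",  # level 2 (FORD field)
    "water engineering": "civil engineering",           # deeper branch (illustrative)
    "water supply systems": "water engineering",
}


def ancestor_at_level(category: str, level: int) -> Optional[str]:
    """Return the ancestor of `category` at the given level (root = 1),
    or None if the category sits above that level."""
    path = []
    current: Optional[str] = category
    while current is not None:
        path.append(current)
        current = BROADER[current]
    path.reverse()  # now ordered from the root down to the category
    return path[level - 1] if len(path) >= level else None


def roll_up(counts: dict[str, int], level: int = 2) -> dict[str, int]:
    """Aggregate per-category project counts to the given level.
    Categories above that level have no ancestor there and are skipped."""
    totals: dict[str, int] = {}
    for cat, n in counts.items():
        key = ancestor_at_level(cat, level)
        if key is not None:
            totals[key] = totals.get(key, 0) + n
    return totals


print(roll_up({"water supply systems": 3, "water engineering": 2}))
```

In this example both counts are aggregated under ‘civil engineering’, a level that can be compared with any dataset classified directly with FORD.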

 

LOOKING FORWARD 

The taxonomy is currently at version 1.3 but, as mentioned above, it
is an ongoing project. Apart from being expanded and aligned with other
relevant taxonomies, EuroSciVoc may evolve into a
thesaurus [https://op.europa.eu/en/web/eu-vocabularies/thesauri] –
a controlled vocabulary with concepts represented by labels, which
extends the hierarchical structure of a taxonomy with associative
relationships.

What does this mean in practice? What’s the added value?  

Thesauri can express not only that one concept is more or less
specific than another, but also that two concepts are related because
they cover aspects of a similar domain. For instance, ‘artificial
intelligence’ is related to ‘computational fluid dynamics’, since the
latter can exploit techniques such as machine learning to study fluid
flows.
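
The difference between hierarchical and associative relations can be sketched with two small structures: broader links for ‘more specific than’ and symmetric related links, mirroring `skos:broader` and `skos:related` in SKOS. The placement of the concepts below is illustrative, not taken from EuroSciVoc.

```python
# Illustrative thesaurus fragment (not taken from EuroSciVoc).
# Hierarchical links: concept -> broader concept.
BROADER: dict[str, str] = {
    "machine learning": "artificial intelligence",
}
# Associative links, stored once per pair; skos:related is symmetric.
RELATED_PAIRS: list[tuple[str, str]] = [
    ("artificial intelligence", "computational fluid dynamics"),
]


def related(concept: str) -> set[str]:
    """Concepts associatively related to `concept`, checking both sides of each pair."""
    result = set()
    for a, b in RELATED_PAIRS:
        if a == concept:
            result.add(b)
        elif b == concept:
            result.add(a)
    return result


def is_narrower(concept: str, ancestor: str) -> bool:
    """True if `ancestor` is reachable from `concept` by following broader links."""
    current = BROADER.get(concept)
    while current is not None:
        if current == ancestor:
            return True
        current = BROADER.get(current)
    return False


print(related("computational fluid dynamics"))
print(is_narrower("computational fluid dynamics", "artificial intelligence"))
```

In this fragment, ‘computational fluid dynamics’ is related to ‘artificial intelligence’ without being narrower than it, which is the kind of link a taxonomy alone cannot express.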

If EuroSciVoc evolves into a thesaurus, it will provide CORDIS users
with richer and more complete information. It will also benefit
reusers of EuroSciVoc, who will have a more extensive reference data
asset at their disposal.

 

_Useful links_

European Science Vocabulary (EuroSciVoc)
[https://op.europa.eu/en/web/eu-vocabularies/euroscivoc]

Community Research and Development Information Service (CORDIS)
[https://cordis.europa.eu/]

Graphics used in this article
[https://gitlab.com/dataeuropa/data-provider-repository/-/tree/master/Data%20stories/Linked%20Open%20Data/LOD_CORDIS_graphics/EuroSciVoc_graphics]
(available for reuse under CC BY 4.0)
