Linking data: data.europa.eu
Publication Date/Time
2022-11-30T07:00:00+00:00
How linked open data standards help to connect public sector data in
Europe
In our ‘Linking data’ series, we present EU projects that
use linked open data
[https://data.europa.eu/en/datastories/linking-data-what-does-it-mean]
(LOD). You may be wondering, what data is linked in their projects?
Why did they decide to use LOD? What benefits does it bring? Follow
the series
[https://data.europa.eu/en/publications/datastories?keywords=&country=All&year=&sort_by=created&sort_order=DESC&items_per_page=10&keywords=%22Linking+data%3A+%22&merged-select=created&items_per_page=10]
to find out.

In this episode, we will take a closer look at data.europa.eu, the
official portal for European data.


CENTRAL POINT OF ACCESS TO PUBLIC SECTOR DATA IN EUROPE

In the EU, the public sector is one of the most data-intensive sectors
[https://digital-strategy.ec.europa.eu/en/policies/open-data]. Public
sector bodies produce, collect and pay for vast amounts of data, known
as public sector information or government data. If those government
data can be freely accessed and reused, we call them open data.

Reuse of public sector information can generate important value for
the economy and society. A prerequisite for such reuse is that the data
is easy to find and access. Therefore, the EU
institutions, together with national open data portals, set up and
manage data.europa.eu, the official portal for European open data.
[https://data.europa.eu/sites/default/files/img/media/DEU_graphics-11.png]
Data.europa.eu is a platform that provides a central point of access
to data made available by the public sector in Europe.

By providing metadata about and linking to data sources, the portal
currently gives access to open data resources from 36 European
countries, EU institutions and agencies, and international
organisations. It allows everyone to easily search, explore, link,
download and reuse the open public sector data for commercial or
non-commercial purposes.
[https://data.europa.eu/sites/default/files/img/media/DEU_graphics-12.png]

HOW IS THE DATA LINKED?

Metadata of datasets on data.europa.eu is encoded as Resource
Description Framework (RDF) triples, using DCAT-AP, a common application
profile of the Data Catalogue Vocabulary
[https://data.europa.eu/en/publications/datastories/linking-data-data-catalogue-vocabulary-application-profile]
(DCAT) web ontology. A triple is a subject–property–object statement:
the subject is the item being described, the property (or attribute) is
the aspect of that item being described, and the object is the value
given for that property. For example, a dataset title is
always encoded as the same type of property, whether it comes from a
local geo-data catalogue or Eurostat [https://ec.europa.eu/eurostat].
Furthermore, titles are made available in all 24 official EU languages
by machine translation, all using the same property.
[https://data.europa.eu/sites/default/files/img/media/DEU_map_diagram.png]
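
To make the triple structure concrete, here is a minimal sketch in plain Python. It assumes the title property is dct:title from Dublin Core Terms (the article does not name the property), and the dataset identifiers and titles are invented for illustration.

```python
# Assumption: dataset titles use the Dublin Core Terms "title" property.
DCT_TITLE = "http://purl.org/dc/terms/title"

# Each triple is (subject, property, object).
# The object here is a literal modelled as (text, language tag).
# The example.org datasets and their titles are hypothetical.
triples = [
    ("https://example.org/dataset/air-quality", DCT_TITLE,
     ("Air quality measurements", "en")),
    ("https://example.org/dataset/air-quality", DCT_TITLE,
     ("Luftqualitätsmessungen", "de")),
    ("https://example.org/dataset/population", DCT_TITLE,
     ("Population by region", "en")),
]


def titles(triples, subject, lang):
    """Return the titles of `subject` in the requested language."""
    return [
        obj[0]
        for subj, prop, obj in triples
        if subj == subject and prop == DCT_TITLE and obj[1] == lang
    ]


print(titles(triples, "https://example.org/dataset/air-quality", "de"))
```

Because every catalogue uses the same property for titles, one lookup function works for all sources and all languages.
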
DCAT-AP uses specific controlled vocabularies, many of which are
maintained by EU reference data
[https://op.europa.eu/en/web/eu-vocabularies/data-catalogue]. Our
previous data story on EU vocabularies
[https://data.europa.eu/en/datastories/linking-data-eu-vocabularies]
shows how reference data promotes LOD. Using controlled vocabularies
ensures that datasets are described in a harmonised way, which, in
turn, helps to explore the datasets and find new connections between
them.

For example, data.europa.eu offers the possibility to browse all the
datasets by topic, according to 13 thematic categories. This is
implemented using a multilingual controlled vocabulary with 13 terms
(one per thematic category), which allows datasets from different
catalogues and in different languages to be automatically linked,
based on the selected thematic category.

How does it work? The term for each category is in fact a uniform
resource identifier [https://op.europa.eu/en/web/webguide/uris] (URI)
taken from the EU vocabulary data theme
[https://op.europa.eu/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/data-theme].
For example, the ‘Health’ category refers to the following URI:
https://publications.europa.eu/resource/authority/data-theme/HEAL
[https://publications.europa.eu/resource/authority/data-theme/HEAL],
whether the thematic category label of the dataset’s metadata says
‘Health’, ‘Υγεία’, ‘Gesundheit’ or
‘Egészségügy’.
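
The idea can be sketched in a few lines of Python. The URI and the four labels come from the text above; the dataset identifiers, the dictionary layout and the function names are hypothetical and only illustrate the principle, not the portal's actual implementation.

```python
HEALTH = "https://publications.europa.eu/resource/authority/data-theme/HEAL"

# One URI, many display labels (language tag -> label).
LABELS = {
    HEALTH: {
        "en": "Health",
        "el": "Υγεία",
        "de": "Gesundheit",
        "hu": "Egészségügy",
    },
}

# Hypothetical datasets from different national catalogues.
# Each one stores the theme URI, not a language-specific label.
datasets = [
    {"id": "gr-001", "theme": HEALTH},
    {"id": "de-042", "theme": HEALTH},
    {"id": "hu-007", "theme": HEALTH},
]


def datasets_for_theme(datasets, theme_uri):
    """Return the ids of all datasets tagged with the given theme URI."""
    return [ds["id"] for ds in datasets if ds["theme"] == theme_uri]


def label(theme_uri, lang):
    """Return the display label of a theme in the requested language."""
    return LABELS[theme_uri][lang]


print(datasets_for_theme(datasets, HEALTH))
print(label(HEALTH, "hu"))
```

Matching on the URI rather than on the label is what lets datasets described in Greek, German and Hungarian end up in the same category.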

The same is true for terms describing the language of a dataset, the
country where its publisher resides, the file format, etc.

Imagine wanting to find a large set of textual data, in two given
languages, to train an artificial intelligence application to create
automatic translations. The DGT translation memory dataset
[https://data.europa.eu/data/datasets/dgt-translation-memory?locale=en]
is one such example. Having terms for file types and languages
indicated as URIs means that this search can be executed automatically
(e.g. using SPARQL queries [https://www.w3.org/TR/rdf-sparql-query/]).
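
As a rough illustration, such a search could be expressed by a query built as follows. This is a sketch under assumptions: it presumes that datasets carry dct:language and their distributions carry dct:format, as DCAT-AP describes, and the language and file-type URIs in the usage lines follow the pattern of the EU Vocabularies authority tables but should be checked before use. The function name build_query is hypothetical.

```python
def build_query(language_uris, format_uri, limit=10):
    """Build a SPARQL query for datasets available in ALL of the given
    languages with at least one distribution in the given file format."""
    # One triple pattern per required language.
    language_patterns = "\n".join(
        f"  ?dataset dct:language <{uri}> ." for uri in language_uris
    )
    return f"""PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?dataset ?title WHERE {{
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dcat:distribution ?distribution .
{language_patterns}
  ?distribution dct:format <{format_uri}> .
}}
LIMIT {int(limit)}"""


# Assumed identifiers, not taken from the article: verify them in EU Vocabularies.
ENGLISH = "http://publications.europa.eu/resource/authority/language/ENG"
FRENCH = "http://publications.europa.eu/resource/authority/language/FRA"
TMX = "http://publications.europa.eu/resource/authority/file-type/TMX"

print(build_query([ENGLISH, FRENCH], TMX, limit=5))
```

The query only names URIs, so it does not depend on how a given catalogue spells 'English' or 'French' in its own language.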

In fact, another principle of LOD is that all (meta)data should be
searchable via SPARQL. Data.europa.eu allows all RDF data to be
searched via the SPARQL endpoint [https://data.europa.eu/sparql].
SPARQL queries can also be submitted via the user interface
[https://data.europa.eu/data/sparql?locale=en] based on Yasgui
[https://triply.cc/docs/yasgui] (a query editor), making it easier for
humans to manually query the data. These pages also include references
to the standards used and sample queries.
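
For machine access, a query can be sent to the endpoint over HTTP. The sketch below uses only the Python standard library and assumes that the endpoint follows the standard SPARQL protocol (a GET request with a query parameter, and JSON results selected through the Accept header). The function names and the sample query are illustrative and not taken from the portal's documentation.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://data.europa.eu/sparql"

# Sample query: the identifiers of three datasets.
QUERY = """PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?dataset WHERE { ?dataset a dcat:Dataset } LIMIT 3"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the list of result bindings."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Example call (requires network access, so it is left commented out):
# for row in run_query(QUERY):
#     print(row["dataset"]["value"])
```

Separating request construction from sending keeps the network call in one place and makes the rest easy to check offline.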

Finally, thanks to DCAT, datasets on the portal can be intentionally
and directly linked to each other. For example, the data provider can
indicate in the dataset’s metadata that dataset X is a version of
dataset Y.
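
Such a link is simply one more triple. The sketch below assumes the relationship is expressed with dct:isVersionOf from Dublin Core Terms. The article does not name the property, so treat that choice as an assumption. The dataset URIs are hypothetical.

```python
# Assumption: the version relationship uses Dublin Core Terms "isVersionOf".
DCT_IS_VERSION_OF = "http://purl.org/dc/terms/isVersionOf"

# (subject, property, object): "dataset X is a version of dataset Y".
# The example.org dataset URIs are invented for this sketch.
triples = [
    ("https://example.org/dataset/budget-2022", DCT_IS_VERSION_OF,
     "https://example.org/dataset/budget"),
    ("https://example.org/dataset/budget-2021", DCT_IS_VERSION_OF,
     "https://example.org/dataset/budget"),
]


def versions_of(triples, dataset):
    """Return all datasets declared to be a version of `dataset`."""
    return sorted(
        subj
        for subj, prop, obj in triples
        if prop == DCT_IS_VERSION_OF and obj == dataset
    )


print(versions_of(triples, "https://example.org/dataset/budget"))
```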

Future enhancements of the portal will be able to use artificial
intelligence to detect such relationships and declare them in a
standardised way.
[https://data.europa.eu/sites/default/files/img/media/image5_3.png]

THE BENEFITS OF THE LOD APPROACH

The most obvious benefit of the LOD approach is that data can ‘talk
to each other’ and that both humans and machines can interpret them
correctly. LOD also enables machine-to-machine access, which is
particularly important because it allows devices to exchange information
and perform actions without human assistance.

All this, in turn, allows everyone to reuse the vast amounts of
available data more easily: to build apps, create data visualisations,
or combine data from different publishers and sources into new
datasets for specific needs.

The more data-driven solutions and resources are out there, the easier
it is to communicate complex information, make informed decisions and
solve problems effectively.


_Useful links_

DCAT-AP
[https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/dcat-ap]

Data.europa.eu SPARQL endpoint: query editor
[https://data.europa.eu/data/sparql?locale=en], machine access
[https://data.europa.eu/sparql]

Data.europa.eu application programming interface
[https://data.europa.eu/api/hub/search/]

Linked data solutions: pilot catalogue on EU knowledge graph
[https://query.linkedopendata.eu/index.html#PREFIX%20wdt%3A%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fprop%2Fdirect%2F%3E%0APREFIX%20wd%3A%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fentity%2F%3E%0A%0ASELECT%20DISTINCT%20%3Fsolution%20%3FsolutionLabel%20%3Fdescription%0A%0A%7B%3Fsolution%20wdt%3AP35%20wd%3AQ2839858%20.%0A%20%20%20%20OPTIONAL%20%7B%3Fsolution%20schema%3Adescription%20%3Fdescription%20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20FILTER%20%28LANG%28%3Fdescription%29%20%3D%20%22en%22%29%7D%0A%0A%20%20%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%20%20%20%20%7D%0A%0AORDER%20BY%20DESC%28%3Fdescription%29]

Graphics used in this article (available for reuse under CC-BY-4.0)
[https://gitlab.com/dataeuropa/data-provider-repository/-/tree/master/Data%20stories/Linked%20Open%20Data/LOD_DEU_graphics]
