Sustainability of (Open) Data Portal Infrastructures reports pt. 6
Publication Date/Time
2021-04-14T07:00:00+00:00
Country
Europe
This is the sixth and final piece in a series about the
“Sustainability of (Open) Data Portal Infrastructures” reports.
In this highlight, the focus is on “A Distributed Version Control
Approach to Creating Portals for Reuse”.
Since the summer of 2020, the European Data Portal (EDP) team has been
summarising the six reports in the “Sustainability of (Open) Data
Portal Infrastructures” series as featured highlights. This highlight
focuses on the sixth report, “A Distributed
Version Control Approach to Creating Portals for Re-use
[/sites/default/files/sustainability-data-portal-infrastructure_6_distributed-version-control.pdf]”.

This report was prepared by the University of Southampton
[https://www.southampton.ac.uk/]. It examines the two most common
issues open data portals face and suggests how to overcome them.

The authors’ analysis of European Data Portal traffic in 2019
suggests that nearly half of the users reached the portal via a
commercial search engine. This creates a two-fold problem. Firstly,
sources, portals and meta-portals compete with one another for
traffic. Secondly, if users depend on the dataset search functionality
of big suppliers to find datasets, the discovery dimension of portals
is at risk of becoming obsolete. It is in the co-location of tools and
the promotion of data re-use that current portals struggle the most.

To overcome these difficulties, portals need to satisfy information
needs beyond merely finding a specific dataset and to strengthen their
co-location dimension. To do so, the authors suggest community
dataspaces (CDSs) as environments that facilitate collaboration and
engagement between re-users.

Community dataspaces are “virtual environments that co-locate
technical and social tools that can be used to create communities
around single or related datasets, share or co-develop derived
datasets and source code of re-uses and establish links with other
datasets or other communities”. These spaces can be built around
datasets from certain themes or combine datasets from various sources
with re-use cases with the aim of finding the commonalities among
them. Community members can host these environments as open-source
packages. 

Community dataspaces are a viable solution to increase re-use and carve
out the unique benefit of data portals as they:

 	* Allow communities to discover and fix errors in the datasets,
thereby increasing the quality of the data, and to find connections
between datasets held by different organisations.
 	* Help create a network effect where the outcomes of the
community’s work, processes, and discussion are stored on the
dataset for the benefit of all, ultimately attracting a wider audience
and increasing engagement.
 	* Offer a more advanced set of tools to data owners, supporting
their expertise and ability to handle the interaction with re-users,
as well as fostering the integration (or rejection) of changes and
derived datasets suggested by communities.

A tool commonly used by Open Source Software communities that embodies
this idea is the Distributed Version Control (DVC) system, a prototype
of which is suggested in the report. Such a system empowers re-users
to develop and share work, improve the quality of datasets, and
increase measurability and search capabilities.
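
To make the idea more concrete, the following is a minimal sketch of
how distributed version control could apply to a dataset. It is not
the prototype described in the report: the DatasetRepo class, its
methods, the owner names and the sample rows are all hypothetical and
serve only to illustrate a re-user copying a dataset, fixing an error,
and the data owner integrating (or rejecting) the change.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def version_id(rows, parent):
    """Identify a version by hashing its rows together with its parent id."""
    payload = json.dumps({"rows": rows, "parent": parent}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


@dataclass
class DatasetRepo:
    """Hypothetical model: one copy of a dataset plus its version history."""
    owner: str
    versions: dict = field(default_factory=dict)  # version id -> record
    head: Optional[str] = None                    # id of the latest version

    def commit(self, rows, message):
        """Record a new version on top of the current head."""
        vid = version_id(rows, self.head)
        self.versions[vid] = {"rows": rows, "parent": self.head, "message": message}
        self.head = vid
        return vid

    def fork(self, new_owner):
        """Give a re-user an independent copy that keeps the shared history."""
        return DatasetRepo(new_owner, dict(self.versions), self.head)

    def rows(self):
        return self.versions[self.head]["rows"]

    def builds_on(self, ancestor):
        """Check whether this copy's head descends from the given version."""
        current = self.head
        while current is not None:
            if current == ancestor:
                return True
            current = self.versions[current]["parent"]
        return False

    def integrate(self, proposal, accept):
        """Owner decision: adopt the proposal's history or leave the data as is.

        Only proposals that build directly on the owner's head are adopted
        (a simple fast-forward); anything else is refused in this sketch.
        """
        if not accept or not proposal.builds_on(self.head):
            return False
        self.versions.update(proposal.versions)
        self.head = proposal.head
        return True


# Illustrative data only: -999.0 stands in for an erroneous placeholder value.
portal = DatasetRepo(owner="data-owner")
portal.commit([{"station": "A", "reading": 12.5},
               {"station": "B", "reading": -999.0}], "initial publication")

reuser = portal.fork("community-member")
corrected = [dict(row) for row in reuser.rows()]  # copy rows before editing
corrected[1]["reading"] = None                    # mark the value as missing
reuser.commit(corrected, "mark placeholder reading as missing")

accepted = portal.integrate(reuser, accept=True)
print(accepted, len(portal.versions), portal.rows()[1])
```

Because every version records its parent, the earlier state of the
dataset remains available after the correction is integrated, and the
commit messages keep a record of what the community changed and why.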

For more information, explore the full report on the EDP website
[/sites/default/files/sustainability-data-portal-infrastructure_6_distributed-version-control.pdf].

This report closes the series on the sustainability of open data
portal infrastructures. Find all reports in the series on the EDP
website
[/en/impact-studies/studies?keywords=&section=7&source=EDP&country=All&year=&items_per_page=10&page=1].
Soon, you will find this content on the new portal: data.europa.eu.
Stay tuned!
