This is the sixth and final piece in a series about the “Sustainability of (Open) Data Portal Infrastructures” reports. In this highlight, the focus is on “A Distributed Version Control Approach to Creating Portals for Reuse”.
Since the summer of 2020, the European Data Portal (EDP) team has been summarising the six reports in the “Sustainability of (Open) Data Portal Infrastructures” series as featured highlights. This highlight focuses on the sixth report, “A Distributed Version Control Approach to Creating Portals for Re-use”.
Prepared by the University of Southampton, the report delves into the two most common issues open data portals face and suggests how to overcome them.
The authors' analysis of European Data Portal traffic in 2019 suggests that nearly half of the users reached the portal via a commercial search engine. This creates a two-fold problem. Firstly, competition for traffic arises among sources, portals and meta-portals. Secondly, if users depend on the dataset search functionality of big suppliers to find datasets, the discovery dimension of portals is at risk of becoming obsolete. That leaves the co-location of tools and the promotion of data re-use, and it is here that current portals struggle the most.
To overcome these difficulties, portals need to satisfy information needs that go beyond merely finding a specific dataset, and to strengthen their co-location dimension. To do so, the authors suggest community dataspaces (CDSs) as environments that facilitate collaboration and engagement between re-users.
Community dataspaces are “virtual environments that co-locate technical and social tools that can be used to create communities around single or related datasets, share or co-develop derived datasets and source code of re-uses and establish links with other datasets or other communities”. These spaces can be built around datasets on a particular theme, or they can combine datasets from various sources with re-use cases, with the aim of finding the commonalities among them. Community members can host these environments as open-source packages.
Community dataspaces are a viable solution to increase re-use and carve out the unique benefit of data portals, as they:
- Allow communities to discover and fix errors in the datasets, thereby increasing the quality of the data, and to find connections between datasets held by different organisations.
- Help create a network effect where the outcomes of the community’s work, processes, and discussion are stored with the dataset for the benefit of all, ultimately attracting a wider audience and increasing engagement.
- Offer a more advanced set of tools to data owners, supporting their expertise and ability to handle the interaction with re-users, as well as fostering the integration (or rejection) of changes and derived datasets suggested by communities.
A tool commonly used by open source software communities and built on this idea is the distributed version control (DVC) system, and the report suggests a prototype based on it. Such a system is an advantageous resource that empowers re-users to develop and share work, improve the quality of datasets, and increase measurability and search capabilities.
For more information, explore the full report on the EDP website.
This report closes the series on the sustainability of open data portal infrastructures. Find all reports of the series on the EDP website. Soon, you will find this content on the new portal: data.europa.eu. Stay tuned!