Sustainability of (Open) Data Portals Infrastructures reports pt. 3
Publication Date/Time
2020-09-01T07:00:00+00:00
This is the third piece in a series about the “Sustainability of
(Open) Data Portal Infrastructures” reports. In this highlight, the
focus is on “Data Reuse: A Method for Transforming Principles into
Practice”.
Over the next few months, all six reports included in the
“Sustainability of (Open) Data Portals Infrastructure”
[/en/impact-studies/studies?section=7&source=EDP] will be summarised
using featured highlights. This particular article will focus on the
third report: “Data Reuse: A Method for Transforming Principles into
Practice”
[/sites/default/files/sustainability-data-portal-infrastructure_3_dataset-reuse.pdf].
This report presents a new approach to assessing the re-use of data
automatically. It works through an example that guides portals through
the crucial aspects of an automated assessment of data re-use and of
increasing engagement from users.

THE CHALLENGE OF AN AUTOMATED ASSESSMENT OF DATA RE-USE

An increasing amount of data is published openly on the web with the
aim to foster re-use. Despite numerous efforts, portal owners and data
publishers do not measure re-use routinely. Yet re-usability is one of
the four FAIR principles, a compilation of high-level best practices
for making data findable, accessible, interoperable and re-usable.
While the FAIR metrics provide exemplary metrics for the FAIR
principles, measuring FAIRness is not an established practice. There
are a variety of best practices
and guidelines (thoroughly explained in the report) detailing data
sharing and re-use principles. However, the automated assessment of
re-use remains a substantial challenge.

The first part of the report “Measuring Use and Impacts of Portals
[/en/highlights/sustainability-open-data-portals-infrastructures-reports-pt-1]”
suggests several solutions to track and assess data re-use
automatically, including pixel tracking, dataset citations and
enforcing log-ins. However, each of these methods has its own
limitations. An alternative assessment approach is therefore needed,
one that focuses more on the re-use side of open data than on the
publishing side and that supports automation. This third part of the
report presents such an approach and introduces a method that helps
portal owners understand what makes a dataset re-usable, using
engagement data they can track themselves.

METHOD & RESULTS

The method consists of the following steps, to be carried out by teams
managing open data portals:

 	* SCOPE the assessment exercise.
 	* DEFINE RE-USE METRICS. These depend on the capabilities of your
portal and the underlying technical infrastructure.
 	* COLLECT REUSE METRICS (or proxies). For this, you need technical
capabilities which may be built into the publishing software being
used, or aggregated metrics derived from lower-level system logs.
 	* DEFINE REUSE INDICATORS. These need to be measurable and will be
used as features in the prediction model.
 	* ANALYSE THEIR DISTRIBUTION FOR THE TOP-REUSED GROUP OF DATASETS.
 	* USE A COMBINATION OF THOSE FEATURES TO BUILD A STATISTICAL MODEL
to predict re-usability.
 	* DERIVE RECOMMENDATIONS for datasets and publishing processes.
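
The collection and analysis steps in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the report's implementation: the log events, the dataset identifiers, the indicator names (has_documentation, machine_readable) and the helper functions are all hypothetical, and a real portal would read its events from its own publishing software or system logs.

```python
from collections import Counter

# Hypothetical engagement events taken from portal logs: (dataset_id, event_type).
LOG_EVENTS = [
    ("ds-a", "download"), ("ds-a", "download"), ("ds-a", "view"),
    ("ds-b", "download"), ("ds-c", "view"), ("ds-a", "download"),
    ("ds-d", "download"), ("ds-d", "download"),
]

# Hypothetical, measurable re-use indicators per dataset.
INDICATORS = {
    "ds-a": {"has_documentation": True, "machine_readable": True},
    "ds-b": {"has_documentation": False, "machine_readable": True},
    "ds-c": {"has_documentation": False, "machine_readable": False},
    "ds-d": {"has_documentation": True, "machine_readable": True},
}


def collect_reuse_metric(events, event_type="download"):
    """Count events of one type per dataset, used here as a proxy for re-use."""
    return Counter(ds for ds, ev in events if ev == event_type)


def top_reused(metric, dataset_ids, fraction=0.5):
    """Return the most re-used share of datasets, ranked by the metric."""
    ranked = sorted(dataset_ids, key=lambda ds: metric.get(ds, 0), reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k])


def indicator_share(dataset_ids, indicators, name):
    """Share of the given datasets for which a boolean indicator is true."""
    ids = list(dataset_ids)
    if not ids:
        return 0.0
    return sum(indicators[ds][name] for ds in ids) / len(ids)


downloads = collect_reuse_metric(LOG_EVENTS)
top = top_reused(downloads, INDICATORS.keys())
rest = set(INDICATORS) - top

# Compare how each indicator is distributed in the top-reused group and the rest.
for name in ("has_documentation", "machine_readable"):
    print(name, indicator_share(top, INDICATORS, name),
          indicator_share(rest, INDICATORS, name))
```

Using download counts as the re-use proxy and a fixed fraction as the cut-off for the top group are simplifications chosen for this sketch; which proxy and threshold are appropriate depends on the scope defined in the first step.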

In the report, an extensive example is provided on how to apply the
method, showing that it is possible to identify a basket of engagement
metrics and predict the re-usability of a dataset based on attributes
such as its structure, the way it was published and its documentation.
In addition to the example, the report provides recommendations for
portal owners to augment their publishing and portal design practice
to support and enhance those features of a dataset that are
quantifiably linked to higher engagement from users.
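
To make the prediction step more concrete, the following Python sketch fits a small logistic regression that estimates whether a dataset belongs to the top-reused group from three binary attributes. Everything in it is an assumption made for illustration: the feature names, the toy training data and the choice of logistic regression are not taken from the report, which may use a different statistical model and different features.

```python
import math

# Hypothetical binary features: [has_documentation, machine_readable, has_licence].
FEATURES = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],
]
# Hypothetical labels: 1 if the dataset was in the top-reused group, else 0.
TOP_REUSED = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated probability that a dataset with features x is highly re-used."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y  # gradient of the log-loss
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = train_logistic(FEATURES, TOP_REUSED)

# Score a well-described and a poorly described (hypothetical) dataset.
print("documented, machine-readable, licensed:", predict(weights, bias, [1, 1, 1]))
print("none of the three attributes:", predict(weights, bias, [0, 0, 0]))
```

In a sketch like this, the sign and size of each fitted weight give a first indication of which attributes are associated with higher engagement, which is the kind of evidence the recommendations for publishers and portal owners build on. With real portal data, a held-out test set would be needed before drawing any conclusion.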

Even with current technologies, this approach can be valuable to
inform:

 	* System designers on building functionalities to capture
information automatically.
 	* Publishers in supplying certain information as metadata.
 	* User experience designers on how to build the interaction process
between dataset re-users and the interface of a data portal.
 	* Portal owners on their portal development.
 	* Open data users in the wider ecosystem to help them identify the
datasets that may be most useful to work with.

As stated, this article focused on a few key findings of the report.
For more information on developing a method for automated assessment
of your data reuse, explore the full report “Data Re-use: A Method
for Transforming Principles into Practice
[/sites/default/files/sustainability-data-portal-infrastructure_3_dataset-reuse.pdf]”
on the EDP website. Moreover, keep an eye out for the EDP team’s
fourth featured highlight on 30 SEPTEMBER 2020, which will
focus on “Funding Portals: A Business Case Approach to Funding Model
Longevity
[/sites/default/files/sustainability-data-portal-infrastructure_4_funding-portals.pdf]”.

For more information or examples on open data, explore the European
Data Portal’s (EDP) news archive
[/en/news-events/news] and featured highlight section
[/en/news-events/news?type=highlights]. Aware of open data examples or
stories? Share them with us via mail [/en/feedback/form?type=4],
and follow us on Twitter
[https://twitter.com/EU_DataPortal], Facebook
[http://www.facebook.com/EuropeanDataPortal] or LinkedIn
[https://www.linkedin.com/company/10478056/] to stay up to date!
