Data homogenisation guidelines for high-value datasets
In today's data-driven landscape, the harmonisation of high-value
datasets stands as a pivotal strategy, stimulating data usability and
efficiency. Homogenisation, the process of standardising and aligning
diverse datasets, is fundamental to improving data consistency and
interoperability across various domains. According to the Open Data
Directive
[https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1561563110433&uri=CELEX:32019L1024],
high-value datasets (HVDs) are explicitly associated with significant
benefits for society and the economy.  

This approach, encompassing the standardisation of formats and quality
enhancement techniques, fosters seamless integration, sharing, and
analysis among diverse stakeholders – from companies, entrepreneurs,
and researchers to policymakers and citizens. The resulting
standardised datasets not only fuel innovation but also pave the way
for collaborative efforts and informed decision-making across sectors
such as healthcare, finance, environmental sciences, and more.

The ‘Report on Data Homogenisation for High-value Datasets’
[https://data.europa.eu/en/doc/report-data-homogenisation-high-value-datasets]
offers a methodological blueprint, delving into challenges and
recommendations crucial for understanding the concept and significance
of HVDs. It navigates the intricate landscape of homogenisation
between existing standards and DCAT-AP
[https://semiceu.github.io/DCAT-AP/releases/3.0.0/], emphasising the
necessity of recognised licenses, application programming interfaces
(APIs), and the adoption of controlled vocabularies and ontologies to
describe datasets effectively.

In proposing methodologies for both data and metadata homogenisation,
the report suggests using established data specifications from INSPIRE
[https://knowledge-base.inspire.ec.europa.eu/index_en], Eurostat
classifications for statistics
[https://op.europa.eu/en/web/eu-vocabularies/eurostat], and SEMIC Core
Business Vocabulary
[https://joinup.ec.europa.eu/collection/registered-organization-vocabulary/solution/core-business-vocabulary]
for companies and company ownership. It highlights the importance of
consulting the controlled vocabularies
[https://op.europa.eu/en/web/eu-vocabularies/controlled-vocabularies]
available via the Publications Office of the European Union
[https://op.europa.eu/en/home] for comprehensive standardisation.
Additionally, improving API interoperability for bulk downloads and
adhering to specific licensing terms like Creative Commons BY 4.0
[https://creativecommons.org/share-your-work/cclicenses/] are
recommended for better accessibility and usage.
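
As a rough illustration of what metadata homogenisation can look like in
practice, the sketch below builds a dataset description using DCAT and
Dublin Core terms and checks it against a small checklist. It is not
taken from the report: the function names, the example.org URLs, and the
checklist itself are assumptions made for illustration, and the
checklist is not the official set of DCAT-AP mandatory properties.

```python
import json

# Namespaces for the DCAT and Dublin Core Terms vocabularies.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Canonical URL of the Creative Commons Attribution 4.0 licence.
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"

# Illustrative checklist only (an assumption, not the DCAT-AP specification).
DATASET_CHECKLIST = ("dct:title", "dct:description", "dcat:theme", "dcat:distribution")
DISTRIBUTION_CHECKLIST = ("dcat:downloadURL", "dct:license")


def build_dataset_record(title, description, download_url, theme_uri, licence=CC_BY_4):
    """Return a JSON-LD style dict describing one dataset with one distribution."""
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        # In practice this would be a URI from a controlled vocabulary.
        "dcat:theme": {"@id": theme_uri},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dct:license": {"@id": licence},
            }
        ],
    }


def missing_properties(record):
    """List checklist properties that are absent or empty in the record."""
    missing = [p for p in DATASET_CHECKLIST if not record.get(p)]
    for i, dist in enumerate(record.get("dcat:distribution", [])):
        missing += [
            f"dcat:distribution[{i}].{p}"
            for p in DISTRIBUTION_CHECKLIST
            if not dist.get(p)
        ]
    return missing


if __name__ == "__main__":
    # Hypothetical dataset; all example.org URLs are placeholders.
    record = build_dataset_record(
        title="Air quality measurements",
        description="Hourly air quality readings from monitoring stations.",
        download_url="https://example.org/data/air-quality.csv",
        theme_uri="https://example.org/themes/environment",
    )
    print(json.dumps(record, indent=2))
    print("Missing:", missing_properties(record))
```

Describing every dataset with the same property names and the same
licence and theme URIs is what lets a catalogue compare and aggregate
records from different publishers; a check like `missing_properties`
could run before publication to catch incomplete descriptions.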

The path to data homogenisation faces various challenges stemming from
the diverse origins and formats of datasets. For those aiming to take
a leading role in this journey, the full report
[https://data.europa.eu/en/doc/report-data-homogenisation-high-value-datasets]
provides an insightful guide to navigating the complexities of
homogenising high-value datasets.

Publication Date/Time
2023-12-15T09:00:00+00:00
Read the latest report on data homogenisation for high-value datasets
to ensure interoperability and consistency