Data homogenisation guidelines for high-value datasets

15 December 2023

Read the latest report on data homogenisation for high-value datasets to ensure interoperability and consistency

In today's data-driven landscape, harmonising high-value datasets is a pivotal strategy for improving data usability and efficiency. Homogenisation, the process of standardising and aligning diverse datasets, is fundamental to improving data consistency and interoperability across domains. The Open Data Directive explicitly associates high-value datasets (HVDs) with significant benefits for society and the economy.

This approach, which covers both format standardisation and quality enhancement techniques, enables seamless integration, sharing, and analysis among diverse stakeholders, from companies, entrepreneurs, and researchers to policymakers and citizens. The resulting standardised datasets not only fuel innovation but also pave the way for collaboration and informed decision-making in sectors such as healthcare, finance, and the environmental sciences.

The ‘Report on Data Homogenisation for High-value Datasets’ offers a methodological blueprint, examining the challenges and recommendations that are crucial for understanding the concept and significance of HVDs. It explores how existing standards can be homogenised with DCAT-AP, emphasising the need for recognised licences, application programming interfaces (APIs), and controlled vocabularies and ontologies to describe datasets effectively.

In proposing methodologies for both data and metadata homogenisation, the report suggests using established data specifications from INSPIRE, Eurostat classifications for statistics, and the SEMIC Core Business Vocabulary for companies and company ownership. It highlights the importance of using the controlled vocabularies available via the Publications Office of the European Union to achieve comprehensive standardisation. It also recommends improving API interoperability for bulk downloads and adhering to specific licensing terms, such as Creative Commons BY 4.0, to improve accessibility and re-use.

The path to data homogenisation faces various challenges stemming from the diverse origins and formats of datasets. For those aiming to take a leading role in this journey, the full report provides an insightful guide to navigating the complexities of homogenising high-value datasets.

For more news and events, follow us on Twitter, Facebook and LinkedIn, or subscribe to our newsletter.