Linking data: Data Catalogue Vocabulary Application Profile
In our ‘Linking data’ series, we present EU projects that use linked open data (LOD). You may be wondering, what data is linked in their projects? Why did they decide to use LOD? What benefits does it bring? Follow the series to find out.
In this episode, we present the Data Catalogue Vocabulary (DCAT) and its application profile for data portals in Europe. Read on to find out what DCAT is, and how and why it uses LOD.
All about metadata
What is metadata? In simple words, metadata is data about data. Just like a label on a soda can provides metadata about the drink, and a library catalogue card provides metadata about a book, a dataset description provides metadata about the dataset.
A book’s metadata is useful if readers want to find out more about the book. But aggregated metadata – a set of catalogue cards – allows readers to choose between multiple books, browse by topic or author, and quickly get a glimpse of what is available in the library. The bigger the set, the more choice and information is offered to the readers.
This is also true for data in general, and open data especially. Aggregating metadata of open data from many sources allows users to get a good overview of existing open datasets that can be reused. This is exactly the role of open data portals – aggregating metadata of datasets from various sources to improve their discoverability and reusability.
A standard way of describing data
Just as all catalogue cards in a library have the same content structure, descriptions of datasets from various sources must also be standardised to be published in a data portal.
DCAT was created to enable data publishers to describe datasets in a standardised manner. It provides a standard data model and vocabulary for describing data. To find out more about vocabularies and data models in general and in the EU context, read our article on reference data.
The base structure of DCAT is made up of several building blocks (classes) that define which broad aspect of data is being described, for example:
- ‘catalogue’ indicates a description of a collection of datasets;
- ‘dataset’ indicates a description of a dataset – in other words, structured data typically curated by a single agent;
- ‘distribution’ indicates a description of how and where the actual file containing the data is accessible.
Each of the classes can be described with multiple properties, such as title, publisher, language and release date.
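To make the three building blocks more concrete, here is a minimal sketch of how a catalogue, a dataset and a distribution could be written down as JSON-LD using DCAT terms. The class and property names (dcat:Catalog, dcat:Dataset, dcat:Distribution, dct:title and so on) come from the DCAT and Dublin Core vocabularies; every example.org identifier, title and description is a made-up placeholder, not a real resource.

```python
import json

# Namespace prefixes for DCAT and Dublin Core terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# 'distribution': how and where the actual file is accessible (hypothetical URL).
distribution = {
    "@id": "https://example.org/distribution/air-quality-csv",
    "@type": "dcat:Distribution",
    "dcat:accessURL": {"@id": "https://example.org/files/air-quality.csv"},
}

# 'dataset': a description of the data itself, pointing to its distribution.
dataset = {
    "@id": "https://example.org/dataset/air-quality",
    "@type": "dcat:Dataset",
    "dct:title": "Air quality measurements",
    "dct:description": "Hourly air quality readings (illustrative only).",
    "dcat:distribution": [distribution],
}

# 'catalogue': a collection of dataset descriptions.
catalogue = {
    "@context": CONTEXT,
    "@id": "https://example.org/catalogue",
    "@type": "dcat:Catalog",
    "dct:title": "Example catalogue",
    "dct:description": "A hypothetical catalogue holding one dataset.",
    "dct:publisher": {"@id": "https://example.org/org/example-agency"},
    "dcat:dataset": [dataset],
}

print(json.dumps(catalogue, indent=2))
```

The nesting mirrors the structure described above: the catalogue lists its datasets, and each dataset lists the distributions through which its data can be obtained.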
DCAT Application Profile for data portals in Europe
To further harmonise descriptions of European public sector data, a working group of experts from EU institutions and Member States created the DCAT Application Profile (DCAT-AP).
An application profile describes how a standard is to be applied in a particular domain or application. In this case, DCAT-AP specifies how DCAT is to be applied when describing metadata of public sector datasets in Europe.
What are the specific features of DCAT-AP?
- DCAT-AP indicates classes and properties that are mandatory.
  Mandatory classes and properties must be provided in the metadata description and every data portal in Europe should be able to read them. The mandatory classes are ‘catalogue’, ‘dataset’ and ‘distribution’. The mandatory properties are:
  - title, description, publisher and dataset for ‘catalogue’;
  - title and description for ‘dataset’;
  - access URL for ‘distribution’.
  Other DCAT classes and properties are defined as either recommended or optional. Those metadata elements might not be available for all catalogues and datasets and may not be essential for the functionality of a portal.
- When describing properties, DCAT-AP requires the use of several EU controlled vocabularies.
  For example, the ‘File type’ vocabulary is used to describe the format property of the distribution class. When providing information on the format in which a given dataset is available, the data provider can use only file type names defined in this vocabulary.
- DCAT-AP defines domains and ranges for properties. For example, the publisher property applies to a dataset (domain), and its value must belong to the class ‘agent’ (range).
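The first two features can be illustrated with a small check. The sketch below is not the official DCAT-AP validator: it uses plain Python dictionaries with simplified, hypothetical property names (for example access_url), takes the mandatory properties from the list above, and stands in for the ‘File type’ vocabulary with a tiny illustrative subset of values.

```python
# Mandatory properties per class, as listed in the article.
MANDATORY = {
    "catalogue": ["title", "description", "publisher", "dataset"],
    "dataset": ["title", "description"],
    "distribution": ["access_url"],
}

# Illustrative subset standing in for the EU 'File type' controlled vocabulary.
ALLOWED_FILE_TYPES = {"CSV", "JSON", "XML"}


def missing_mandatory(kind, record):
    """Return the mandatory properties that are absent or empty in record."""
    return [name for name in MANDATORY[kind] if not record.get(name)]


def format_is_allowed(distribution):
    """Format is not in the mandatory list; if given, it must be in the vocabulary."""
    fmt = distribution.get("format")
    return fmt is None or fmt in ALLOWED_FILE_TYPES


# Hypothetical records: the dataset lacks a description, and the
# distribution uses a free-text format instead of a vocabulary term.
dataset = {"title": "Air quality measurements"}
distribution = {
    "access_url": "https://example.org/files/air-quality.csv",
    "format": "csv file",
}

print(missing_mandatory("dataset", dataset))            # ['description']
print(missing_mandatory("distribution", distribution))  # []
print(format_is_allowed(distribution))                  # False
```

A portal that harvests metadata could run checks of this kind before accepting a record, so that every published description carries at least the mandatory properties and uses terms that other portals can interpret.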
Data linking with DCAT-AP
Besides being used to describe data, the DCAT-AP structure supports LOD.
First, the DCAT-AP structure offers the possibility to link local datasets to parent catalogue(s) when describing a dataset. This allows data providers to link their own datasets locally and show connections between the datasets.
Second, using DCAT-AP ensures that datasets are described in a harmonised way according to specific vocabularies. This allows the datasets to be linked with datasets from other sources that are described using the same data model.
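Why does a shared vocabulary make linking possible? A minimal sketch, assuming two hypothetical portals that both tag their datasets with URIs from the EU ‘Data theme’ vocabulary: because both portals use exactly the same identifier for ‘environment’, their records can be grouped without any manual mapping. The dataset identifiers and the simplified record layout are invented for illustration.

```python
from collections import defaultdict

# URIs from the EU 'Data theme' controlled vocabulary.
ENVI = "http://publications.europa.eu/resource/authority/data-theme/ENVI"
TRAN = "http://publications.europa.eu/resource/authority/data-theme/TRAN"

# Hypothetical records harvested from two different portals.
portal_a = [
    {"id": "a/air-quality", "theme": ENVI},
    {"id": "a/bus-stops", "theme": TRAN},
]
portal_b = [
    {"id": "b/river-levels", "theme": ENVI},
]

# Group datasets from both sources by the shared theme URI.
by_theme = defaultdict(list)
for record in portal_a + portal_b:
    by_theme[record["theme"]].append(record["id"])

print(by_theme[ENVI])  # ['a/air-quality', 'b/river-levels']
```

If each portal had used its own free-text label instead (‘Environment’, ‘Umwelt’, ‘env.’), the same grouping would require a translation table for every pair of sources.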
What are the benefits of DCAT-AP?
Using DCAT-AP has multiple benefits both for data providers and for data reusers.
If you are a data provider, using DCAT-AP is an effective way to make your data searchable and accessible. Since metadata created using this model can be easily shared on one or more data portals, it can also significantly reduce the cost of achieving the same reach. All this helps to improve the discoverability and, in turn, the reusability of your data.
If you are a data reuser, the main benefit for you is of course access to European public sector data. Thanks to DCAT-AP, this data can be aggregated on international, national, regional, local and domain-specific data portals, as well as on data.europa.eu. DCAT-AP also allows you to easily retrieve the datasets (e.g. using SPARQL queries). Additionally, DCAT-AP helps you to overcome language barriers and search for data across European countries.
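As an illustration of retrieval with SPARQL, the sketch below builds a query that asks for datasets, their titles and the access URLs of their distributions, and encodes it as a GET request URL following the SPARQL 1.1 Protocol. The endpoint address is an assumption and should be replaced with the SPARQL endpoint of the portal you use; the code only constructs the URL and does not send the request.

```python
from urllib.parse import urlencode

# Assumption: the portal exposes a public SPARQL endpoint at this address.
ENDPOINT = "https://data.europa.eu/sparql"

# Select datasets together with their titles and distribution access URLs.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?dataset ?title ?accessURL
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dcat:distribution ?distribution .
  ?distribution dcat:accessURL ?accessURL .
}
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode the query in the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Because every catalogue that follows DCAT-AP uses the same classes and properties, the same query pattern can in principle be reused against any compliant endpoint.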
Useful links can be found below.
A free training course on DCAT and DCAT-AP: https://data.europa.eu/en/academy/dcat-and-dcat-ap-interoperability-data-model.
Latest and previous releases of DCAT-AP: https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe/releases.
A leaflet explaining DCAT-AP in a nutshell: https://joinup.ec.europa.eu/sites/default/files/inline-files/Leaflet%20DCAT-AP.pdf.
Technical documentation of DCAT-AP: https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe/distribution/dcat-ap-12-docx.
DCAT-AP validator: https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-ap-validator.
Paper on DCAT-AP, ‘Towards an open government data ecosystem in Europe using common standards’: https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/document/towards-open-government-data-ecosystem-europe-using-common-standards.
Graphics used in this article (reusable under CC BY 4.0): https://gitlab.com/dataeuropa/data-provider-repository/-/tree/master/Data%20stories/Linked%20Open%20Data/LOD_DCAT-AP_Graphics.