The datasets stored in the portal need to be of an appropriate quality in terms of:
- DCAT-AP-compliant mapping
- Available distributions
- Usage of machine-readable distribution formats
- Usage of known open licences
The Metadata Quality Assurance (MQA) tool was developed to check the datasets against these quality indicators. The MQA runs periodically, in parallel with the harvesting process that fills CKAN and Virtuoso with metadata. Because CKAN cannot store DCAT-AP-formatted datasets directly, the datasets are mapped to a DCAT-AP-compliant JSON (JavaScript Object Notation) schema. The MQA uses this schema to check each dataset for DCAT-AP compliance: if any compliance issue is detected, for instance a missing mandatory field, the dataset is considered not DCAT-AP compliant.
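The compliance check described above can be pictured as a per-dataset rule check that collects violations and treats any violation as non-compliance. The sketch below is not the MQA's implementation: the field names (`title`, `description`, `distributions`, `access_url`) and the tiny list of mandatory fields are hypothetical stand-ins, loosely modelled on DCAT-AP, where title and description are mandatory for a dataset and an access URL is mandatory for a distribution.

```python
from typing import Any

# Hypothetical, heavily simplified stand-in for the DCAT-AP-compliant JSON schema:
# only a few mandatory properties are checked here.
MANDATORY_DATASET_FIELDS = ("title", "description")
MANDATORY_DISTRIBUTION_FIELDS = ("access_url",)


def find_violations(dataset: dict[str, Any]) -> list[str]:
    """Return violation messages for one mapped dataset; an empty list means compliant."""
    violations = []
    for name in MANDATORY_DATASET_FIELDS:
        if not dataset.get(name):
            violations.append(f"dataset: missing mandatory field '{name}'")
    for i, dist in enumerate(dataset.get("distributions", [])):
        for name in MANDATORY_DISTRIBUTION_FIELDS:
            if not dist.get(name):
                violations.append(f"distribution[{i}]: missing mandatory field '{name}'")
    return violations


def is_dcat_ap_compliant(dataset: dict[str, Any]) -> bool:
    # Any single violation makes the whole dataset non-compliant.
    return not find_violations(dataset)


ds = {"title": "Air quality 2016", "distributions": [{"access_url": "https://example.org/aq.csv"}]}
print(find_violations(ds))  # ["dataset: missing mandatory field 'description'"]
```

Returning the individual messages rather than only a boolean keeps enough detail to count which violations occur most often across the portal.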
The MQA presents its results in two views.
- The landing page, or ‘Global Dashboard’, which shows aggregated results for the entire service, i.e. the quality details for all catalogues.
- The ‘Catalogue Dashboard’, which lets you select a specific catalogue and display its quality details.
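The difference between the two views is essentially the level of aggregation. A minimal sketch, assuming the per-dataset check yields hypothetical `(catalogue_id, is_compliant)` pairs, computes the share of compliant datasets once across all catalogues and once per catalogue:

```python
from collections import defaultdict
from typing import Iterable


def compliance_by_catalogue(results: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Catalogue-level view: share of compliant datasets in each catalogue."""
    compliant: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for catalogue, ok in results:
        total[catalogue] += 1
        compliant[catalogue] += int(ok)
    return {cat: compliant[cat] / total[cat] for cat in total}


def global_compliance(results: Iterable[tuple[str, bool]]) -> float:
    """Global view: share of compliant datasets across all catalogues."""
    results = list(results)
    if not results:
        return 0.0
    return sum(ok for _, ok in results) / len(results)


pairs = [("cat-a", True), ("cat-a", False), ("cat-b", True)]
print(compliance_by_catalogue(pairs))  # {'cat-a': 0.5, 'cat-b': 1.0}
```

Note that the global figure weights every dataset equally, so it is generally not the mean of the per-catalogue ratios: large catalogues dominate it.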
The current quality indicators include the following:
- Distribution statistics:
  - accessible distributions,
  - error status codes,
  - download URL existence,
  - top 20 catalogues with the most accessible distributions,
  - ratio of machine-readable datasets,
  - most-used distribution formats,
  - top 20 catalogues mostly using common machine-readable datasets.
- Dataset compliance statistics:
  - top violation occurrences,
  - compliant datasets,
  - top 20 catalogues with the most DCAT-AP-compliant datasets.
- Dataset licence usage:
  - ratio of known to unknown licences,
  - most-used licences,
  - top 20 catalogues with the most datasets with known licences.
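Most of these indicators reduce to a few counting patterns: a share of datasets satisfying a predicate, a frequency table, and a top-N ranking of catalogues. The sketch below illustrates them under several assumptions that are not taken from the MQA itself: the `Dataset`/`Distribution` data model is hypothetical, a distribution is treated as accessible only if its URL answered with a 2xx status, and `MACHINE_READABLE_FORMATS` and `KNOWN_LICENCES` are illustrative placeholder lists.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence

# Hypothetical reference lists, for illustration only.
MACHINE_READABLE_FORMATS = {"CSV", "JSON", "XML", "RDF"}
KNOWN_LICENCES = {"CC-BY-4.0", "CC0-1.0", "ODC-BY-1.0"}


@dataclass
class Distribution:
    format: str
    status_code: Optional[int]  # HTTP status from the last URL check; None if unchecked


@dataclass
class Dataset:
    catalogue: str
    licence: Optional[str]
    distributions: list[Distribution] = field(default_factory=list)


def is_accessible(dist: Distribution) -> bool:
    # Assumption: only 2xx responses count as accessible.
    return dist.status_code is not None and 200 <= dist.status_code < 300


def accessible_count(ds: Dataset) -> int:
    return sum(is_accessible(d) for d in ds.distributions)


def is_machine_readable(ds: Dataset) -> bool:
    # A dataset counts as machine-readable if any distribution uses such a format.
    return any(d.format.upper() in MACHINE_READABLE_FORMATS for d in ds.distributions)


def has_known_licence(ds: Dataset) -> bool:
    return ds.licence in KNOWN_LICENCES


def ratio(datasets: Sequence[Dataset], predicate: Callable[[Dataset], bool]) -> float:
    """Share of datasets satisfying the predicate (0.0 for empty input)."""
    if not datasets:
        return 0.0
    return sum(predicate(ds) for ds in datasets) / len(datasets)


def top_catalogues(datasets: Sequence[Dataset], score: Callable[[Dataset], int],
                   n: int = 20) -> list[tuple[str, int]]:
    """Rank catalogues by summed per-dataset score, dropping catalogues that score 0."""
    totals = Counter()
    for ds in datasets:
        totals[ds.catalogue] += score(ds)
    return [(cat, s) for cat, s in totals.most_common(n) if s > 0]


def error_status_codes(datasets: Sequence[Dataset]) -> Counter:
    """Frequency of non-2xx status codes among checked distributions."""
    return Counter(d.status_code for ds in datasets for d in ds.distributions
                   if d.status_code is not None and not is_accessible(d))


def most_used_formats(datasets: Sequence[Dataset], n: int = 10) -> list[tuple[str, int]]:
    return Counter(d.format.upper() for ds in datasets for d in ds.distributions).most_common(n)


sample = [
    Dataset("cat-a", "CC-BY-4.0", [Distribution("csv", 200), Distribution("pdf", 404)]),
    Dataset("cat-a", "proprietary", [Distribution("json", 200)]),
    Dataset("cat-b", None, [Distribution("pdf", 500)]),
]
print(top_catalogues(sample, accessible_count))  # [('cat-a', 2)]
```

Passing the scoring function into `top_catalogues` lets the same ranking code serve all three "top 20 catalogues" indicators, e.g. `top_catalogues(sample, has_known_licence)` for licences.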