data.europa.eu provides a machine-readable SPARQL endpoint for querying the RDF descriptions of its datasets.
SPARQL is an RDF query language, i.e. a semantic query language for databases.
The SPARQL search of the portal offers a graphical user interface to enter your SPARQL queries.
For programmatic use, a machine-readable endpoint is available at the following URL: https://data.europa.eu/data/sparql
The following section provides a short introduction to the SPARQL language and some examples that are specific to the context of data.europa.eu.
For complete documentation of the language, see the SPARQL specifications on the W3C website.
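As a minimal sketch of programmatic access, the Python snippet below sends a query as an HTTP GET request, following the W3C SPARQL 1.1 Protocol, and reads the standard SPARQL JSON results format using only the standard library. It assumes that the machine endpoint answers at https://data.europa.eu/sparql (an assumption: the page URL given above may serve the graphical interface instead) and that it honours the `Accept: application/sparql-results+json` header. The helper names `build_query_url`, `parse_bindings` and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: URL of the machine-readable endpoint; adjust if the portal documents another one.
ENDPOINT = "https://data.europa.eu/sparql"


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as the 'query' parameter of a SPARQL 1.1 Protocol GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per result row.

    Variables left unbound (for example by an OPTIONAL pattern) are simply absent.
    """
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 60.0) -> List[Dict[str, str]]:
    """Send the query to the endpoint and return the flattened rows (performs network I/O)."""
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))


# Example (performs a live request):
# rows = run_query("PREFIX dcat: <http://www.w3.org/ns/dcat#> "
#                  "SELECT ?d WHERE { ?d a dcat:Dataset } LIMIT 5")
```

Keeping URL building and result parsing separate from the network call makes both parts easy to check offline; long queries may need an HTTP POST instead, which the same protocol also allows.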
SPARQL Query Language
If you are not familiar with the SPARQL Query Language, the following short summary describes the SPARQL clauses used in the queries below. For further reading, please refer to the W3C Recommendation https://www.w3.org/TR/rdf-sparql-query/ and, for SPARQL 1.1 features such as the GROUP_CONCAT aggregate, to https://www.w3.org/TR/sparql11-query/.
- PREFIX - Prefixes can be defined in a query as shorthands, so that full URIs do not have to be written out. A list of the most common prefixes used in the RDF descriptions of data.europa.eu datasets is given below. The syntax is the following:
PREFIX ${PREFIX_NAME}: <${FULL_URI}>
where:
  - ${PREFIX_NAME} is the shorthand used in the query instead of the full URI;
  - ${FULL_URI} is the namespace URI that the prefix stands for.
- SELECT - The 'SELECT' clause indicates which pieces of information (variables) the query returns. 'SELECT *' is an abbreviation that selects all variables of the query. Multiple variables can be selected. With 'SELECT DISTINCT', duplicate result rows are returned only once.
- WHERE - The 'WHERE' clause specifies a list of patterns that restrict the information retrieved. These patterns are triples in which any of the elements can be replaced by a variable. Multiple patterns are separated by the character '.'; their combination works as a logical 'AND'.
- LIMIT - The 'LIMIT' clause sets a maximum number of results to be returned.
- OPTIONAL - With the 'OPTIONAL' clause, some patterns in the 'WHERE' clause can be made non-mandatory. When a variable appears only in an 'OPTIONAL' clause, the result shows its value when present and leaves it blank when missing.
- FROM - In the triple store (which stores the RDF information), triples are grouped into graphs. A query can be restricted to a specific graph by using FROM <URIofGraph>. In the context of data.europa.eu, each dataset is stored in its own graph. Other named graphs include the metrics of the data quality evaluation, the controlled vocabularies used in DCAT-AP (e.g. <http://publications.europa.eu/resource/authority/data-theme>, <http://publications.europa.eu/resource/authority/access-right>), NUTS codes (e.g. <http://publications.europa.eu/resource/authority/nuts-gisco-links>, <http://publications.europa.eu/resource/authority/nuts>) and more.
- FILTER - The 'FILTER' clause further restricts the solutions of a 'WHERE' clause, for instance by matching strings against a regular expression or by selecting literals in a given language.
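To show how these clauses fit together and in which order they appear, here is a small Python sketch that assembles a SELECT query from its parts. The function `assemble_query` is a hypothetical illustration, not a portal API; it joins mandatory patterns with '.', wraps each optional pattern in its own OPTIONAL block and emits FROM and LIMIT only when requested.

```python
from typing import Optional, Sequence


def assemble_query(
    prefixes: dict,
    variables: Sequence[str],
    patterns: Sequence[str],
    optional: Sequence[str] = (),
    filters: Sequence[str] = (),
    limit: Optional[int] = None,
    distinct: bool = False,
    graph: Optional[str] = None,
) -> str:
    """Assemble PREFIX, SELECT, FROM, WHERE (with OPTIONAL/FILTER) and LIMIT in SPARQL order."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    head = "SELECT DISTINCT" if distinct else "SELECT"
    # No variables given: fall back to 'SELECT *' (all variables of the query).
    lines.append(f"{head} {' '.join(variables) if variables else '*'}")
    if graph is not None:
        lines.append(f"FROM <{graph}>")  # restrict the query to one graph
    lines.append("WHERE {")
    # Mandatory triple patterns are separated by '.', i.e. combined with a logical AND.
    lines.append("  " + " .\n  ".join(patterns) + " .")
    for pattern in optional:
        # Variables bound only here stay blank in rows where the pattern does not match.
        lines.append(f"  OPTIONAL {{ {pattern} }}")
    for expression in filters:
        lines.append(f"  FILTER ({expression})")
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)
```

For example, `assemble_query({"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"}, ["?dataset", "?title"], ["?dataset a dcat:Dataset", "?dataset dct:title ?title"], filters=['lang(?title) = "en"'], limit=10)` yields a query listing ten datasets with English titles.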
A list of prefixes used in data.europa.eu RDF descriptions
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX odp: <http://data.europa.eu/euodp/ontologies/ec-odp#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dcatapop: <http://data.europa.eu/88u/ontology/dcatapop>
PREFIX gsp: <http://www.opengis.net/ont/geosparql#>
PREFIX locn: <http://www.w3.org/ns/locn#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX spdx: <http://spdx.org/rdf/terms#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
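The prefixes listed above can be kept in a lookup table so that a query header contains only the declarations it needs. The Python sketch below (the names `PREFIXES`, `prefix_header` and `expand` are hypothetical) also shows what a prefix does: it expands a prefixed name such as dcat:Dataset into the full URI.

```python
# Prefixes as listed above for data.europa.eu RDF descriptions.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "odp": "http://data.europa.eu/euodp/ontologies/ec-odp#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "adms": "http://www.w3.org/ns/adms#",
    "dcatapop": "http://data.europa.eu/88u/ontology/dcatapop",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locn": "http://www.w3.org/ns/locn#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "prov": "http://www.w3.org/ns/prov#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema": "http://schema.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "spdx": "http://spdx.org/rdf/terms#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def prefix_header(*names: str) -> str:
    """Render PREFIX declarations (URI in angle brackets) for the requested shorthands."""
    return "\n".join(f"PREFIX {name}: <{PREFIXES[name]}>" for name in names)


def expand(prefixed_name: str) -> str:
    """Expand e.g. 'dcat:Dataset' to the full URI that the prefix abbreviates."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local
```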
Sample SPARQL Queries
Get a list of all catalogues on data.europa.eu
This query retrieves a full list of the catalogues published on data.europa.eu. It then uses OPTIONAL clauses to retrieve each catalogue's title, homepage, geographical coverage, etc.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
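Only the first line of this query survives on this page. As a hedged sketch of what such a query could look like, the Python constant below holds a SPARQL query that assumes the DCAT-AP properties dct:title, foaf:homepage and dct:spatial for a catalogue's title, homepage and geographical coverage; the portal's own query may differ. The string can be pasted into the SPARQL search or sent with any SPARQL 1.1 client.

```python
# Sketch: all dcat:Catalog resources, with optional descriptive properties (DCAT-AP assumed).
CATALOGUES_QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?catalogue ?title ?homepage ?spatial
WHERE {
  ?catalogue a dcat:Catalog .
  # Titles restricted to English or untagged literals to avoid one row per language.
  OPTIONAL { ?catalogue dct:title ?title . FILTER (lang(?title) = "en" || lang(?title) = "") }
  OPTIONAL { ?catalogue foaf:homepage ?homepage }
  OPTIONAL { ?catalogue dct:spatial ?spatial }
}
"""
```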
Show all datasets, distributions and data services of the former EU Open Data Portal
When the former EU Open Data Portal was consolidated with the European Data Portal into the data.europa.eu portal, the datasets of each publisher were modelled as individual catalogues that are expressed as part of the EU Open Data Portal. The query uses the URI <http://data.europa.eu/88u/catalogue/european-union-open-data-portal> to identify the EU Open Data Portal and the property dct:hasPart to retrieve all catalogues that are part of it. The OPTIONAL clause used for data services ensures that all datasets and distributions are returned, even those not linked to a data service.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
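The full query is not reproduced on this page. A possible sketch, built in Python around the catalogue URI and the dct:hasPart property named in the description, is shown below. The link from a distribution to its data service via dcat:accessService (DCAT 2) is an assumption, and the LIMIT is added only to keep the result manageable.

```python
EU_ODP = "http://data.europa.eu/88u/catalogue/european-union-open-data-portal"

# Sketch: catalogues that are part of the former EU Open Data Portal, their datasets,
# distributions and (if any) data services.
EU_ODP_QUERY = f"""
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?catalogue ?dataset ?distribution ?dataService
WHERE {{
  <{EU_ODP}> dct:hasPart ?catalogue .
  ?catalogue dcat:dataset ?dataset .
  ?dataset dcat:distribution ?distribution .
  # Assumed property; OPTIONAL keeps distributions that have no data service.
  OPTIONAL {{ ?distribution dcat:accessService ?dataService }}
}}
LIMIT 100
"""
```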
Datasets, their publisher and keywords in the Environment thematic category
Thematic categories of datasets are expressed with the dcat:theme property, using terms from the data-theme controlled vocabulary. Another property used to describe the subject matter of datasets is dcat:keyword, which takes literal values (i.e. free text). The following query retrieves all datasets to which the URI of the concept "Environment" is assigned as a theme, together with the name of the publisher (given as either rdfs:label or skos:prefLabel), the title of the dataset and all keywords used to describe it. Because dataset metadata on the portal is potentially available in all official EU languages, a language filter restricts results to either English or no language tag. The GROUP_CONCAT aggregate combines all keywords of each dataset into a single value, using "|" as a separator.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
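As a hedged sketch of the query described above (the page shows only its first line), the Python constant below uses the data-theme concept <http://publications.europa.eu/resource/authority/data-theme/ENVI> for "Environment", a property path rdfs:label|skos:prefLabel for the publisher name, and GROUP_CONCAT with GROUP BY to collect keywords. Variable names and the LIMIT are illustrative choices.

```python
ENVIRONMENT = "http://publications.europa.eu/resource/authority/data-theme/ENVI"

ENVIRONMENT_QUERY = f"""
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?dataset ?publisherName ?title
       (GROUP_CONCAT(DISTINCT ?keyword; SEPARATOR="|") AS ?keywords)
WHERE {{
  ?dataset a dcat:Dataset ;
           dcat:theme <{ENVIRONMENT}> ;
           dct:title ?title ;
           dct:publisher ?publisher .
  # Publisher name given either as rdfs:label or as skos:prefLabel.
  ?publisher rdfs:label|skos:prefLabel ?publisherName .
  OPTIONAL {{ ?dataset dcat:keyword ?keyword .
             FILTER (lang(?keyword) = "en" || lang(?keyword) = "") }}
  # Keep English or untagged literals only.
  FILTER ((lang(?title) = "en" || lang(?title) = "")
          && (lang(?publisherName) = "en" || lang(?publisherName) = ""))
}}
GROUP BY ?dataset ?publisherName ?title
LIMIT 100
"""
```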
Retrieve datasets for which a Eurovoc term is defined
Eurovoc is the multilingual, multidisciplinary thesaurus curated by the Publications Office of the European Union. It is widely used by both EU and national institutions in a variety of contexts. The following query retrieves datasets to which one of the indicated terms is assigned via the dct:subject or dcat:theme property (using the operator "||" as a logical OR). Being a controlled vocabulary, Eurovoc identifies all terms by URIs. Their labels are available in 24 languages, and the language filter in the query restricts results to labels provided in English.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
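Because the Eurovoc terms of interest change from one search to another, the query lends itself to a small builder. The Python function `eurovoc_query` below is a hypothetical sketch: it turns a list of term URIs into a FILTER joined with "||" and uses the property path dct:subject|dcat:theme. The term URIs in the usage note are placeholders, not real Eurovoc identifiers, and the English labels appear only if they are present in the queried graphs.

```python
from typing import Iterable


def eurovoc_query(term_uris: Iterable[str], limit: int = 100) -> str:
    """Build a query for datasets that carry any of the given Eurovoc concept URIs."""
    terms = list(term_uris)
    if not terms:
        raise ValueError("at least one Eurovoc term URI is required")
    # One equality test per term, combined with '||' (logical OR).
    alternatives = " || ".join(f"?term = <{uri}>" for uri in terms)
    return f"""PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?dataset ?term ?label
WHERE {{
  ?dataset dct:subject|dcat:theme ?term .
  FILTER ({alternatives})
  OPTIONAL {{ ?term skos:prefLabel ?label . FILTER (lang(?label) = "en") }}
}}
LIMIT {limit}"""
```

Usage: `eurovoc_query(["http://example.org/term-a", "http://example.org/term-b"])`, replacing the placeholder URIs with the Eurovoc concept URIs of interest.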
Retrieve dates related to a specific dataset
This query retrieves the dates recorded via the dct:issued and dct:modified properties for a specific dataset. It covers the following classes:
- Catalogue record
- Dataset
- Distribution
For Datasets, an additional property is dct:temporal, of type dct:PeriodOfTime, which records the period of time covered by the dataset. For Distributions, dct:accrualPeriodicity indicates how often the data is updated.
OPTIONAL clauses are used so that results are returned even when some of these values are missing.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
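A hedged sketch of such a query, written as a hypothetical Python builder `dates_query`, is given below. It assumes that a catalogue record points to its dataset via foaf:primaryTopic (as in DCAT) and fixes the dataset with a VALUES clause; every date is wrapped in OPTIONAL so that missing values leave blanks instead of removing the row. Note that combining several independent OPTIONAL blocks can multiply rows when a property has more than one value.

```python
def dates_query(dataset_uri: str) -> str:
    """Issued/modified dates of a dataset, its catalogue record and its distributions."""
    return f"""PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?recordIssued ?recordModified ?datasetIssued ?datasetModified
       ?temporal ?distribution ?distributionIssued ?distributionModified
WHERE {{
  VALUES ?dataset {{ <{dataset_uri}> }}
  OPTIONAL {{
    # Catalogue record describing the dataset (assumed link via foaf:primaryTopic).
    ?record a dcat:CatalogRecord ;
            foaf:primaryTopic ?dataset .
    OPTIONAL {{ ?record dct:issued ?recordIssued }}
    OPTIONAL {{ ?record dct:modified ?recordModified }}
  }}
  OPTIONAL {{ ?dataset dct:issued ?datasetIssued }}
  OPTIONAL {{ ?dataset dct:modified ?datasetModified }}
  OPTIONAL {{ ?dataset dct:temporal ?temporal }}
  OPTIONAL {{
    ?dataset dcat:distribution ?distribution .
    OPTIONAL {{ ?distribution dct:issued ?distributionIssued }}
    OPTIONAL {{ ?distribution dct:modified ?distributionModified }}
  }}
}}"""
```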
Find dataset distributions of a specific file type
File types are assigned to the Distributions of a dataset as IANA media types (MIME types) using dcat:mediaType, a subproperty of dct:format. dct:format further specifies the file types of distributions using terms from the File Type vocabulary of the Publications Office of the European Union.
The following query looks for Distributions with a dcat:accessURL (the URL giving access to the file itself) that are of the XML file type. The file type is expressed with the File Type vocabulary and restricted using a FILTER clause. Because publishers differ in how, and whether, they assign IANA media types, an OPTIONAL clause is used for the media type.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
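The sketch below, a hypothetical Python builder `distributions_by_file_type`, follows the description: a mandatory dcat:accessURL, a FILTER on dct:format against a File Type vocabulary URI (assumed to have the form http://publications.europa.eu/resource/authority/file-type/XML), and an OPTIONAL dcat:mediaType. The `code` parameter is assumed to be a File Type vocabulary code such as "XML".

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type/"


def distributions_by_file_type(code: str = "XML", limit: int = 100) -> str:
    """Distributions with an access URL whose dct:format is the given File Type code."""
    return f"""PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?distribution ?accessURL ?mediaType
WHERE {{
  ?distribution a dcat:Distribution ;
                dcat:accessURL ?accessURL ;
                dct:format ?format .
  FILTER (?format = <{FILE_TYPE}{code}>)
  # IANA media types are not assigned consistently, hence OPTIONAL.
  OPTIONAL {{ ?distribution dcat:mediaType ?mediaType }}
}}
LIMIT {limit}"""
```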
See the status of Distributions of a selected dataset
Values used to define the status of a Distribution are drawn from a controlled vocabulary and can be "Completed", "Deprecated", "Under Development" or "Withdrawn". The following query retrieves all distributions of a specific dataset (identified by the URI <http://data.europa.eu/88u/dataset/eurovoc>) to which the property adms:status is assigned.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
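As a hedged sketch, the hypothetical Python builder below produces a query that lists the distributions of a given dataset together with their adms:status value. Fetching an English skos:prefLabel for the status concept is an added assumption and is kept OPTIONAL, since the label may not be available in the queried graphs.

```python
def distribution_status_query(dataset_uri: str) -> str:
    """Distributions of the dataset that have an adms:status, with an optional English label."""
    return f"""PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?distribution ?status ?statusLabel
WHERE {{
  <{dataset_uri}> dcat:distribution ?distribution .
  ?distribution adms:status ?status .
  OPTIONAL {{ ?status skos:prefLabel ?statusLabel . FILTER (lang(?statusLabel) = "en") }}
}}"""
```

Usage: `distribution_status_query("http://data.europa.eu/88u/dataset/eurovoc")`.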
Publishers by catalogue
This query looks for up to 25 publishers who publish datasets in a defined catalogue (identified by the URI <http://data.europa.eu/88u/catalogue/nkod-opendata-cz>).
PREFIX dcat: <http://www.w3.org/ns/dcat#>
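A possible sketch of this query, as a hypothetical Python builder, follows the catalogue's dcat:dataset links to each dataset's dct:publisher and uses SELECT DISTINCT with LIMIT 25 so that each publisher is listed once.

```python
def publishers_query(catalogue_uri: str, limit: int = 25) -> str:
    """Distinct publishers of the datasets in one catalogue, capped at `limit` rows."""
    return f"""PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?publisher
WHERE {{
  <{catalogue_uri}> dcat:dataset ?dataset .
  ?dataset dct:publisher ?publisher .
}}
LIMIT {limit}"""
```

Usage: `publishers_query("http://data.europa.eu/88u/catalogue/nkod-opendata-cz")`.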