Access to European Union open data

How to use the SPARQL endpoint

The EU Open Data Portal (EU ODP) stores the descriptions of its datasets as RDF triples. A machine-readable SPARQL endpoint allows these descriptions to be queried.

SPARQL is an RDF query language, i.e. a semantic query language for databases.

The ‘Linked data page’ of the portal offers a graphical user interface to enter your SPARQL queries.

For programmatic use, a machine-readable endpoint is available at the following URL: http://data.europa.eu/euodp/sparqlep
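As a minimal sketch of programmatic access, the Python snippet below builds an HTTP GET request for the endpoint without sending it. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (a `query` URL parameter and an `Accept` header asking for JSON results), which this page does not state explicitly. The helper name `build_request` is illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://data.europa.eu/euodp/sparqlep"  # URL given on this page


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a GET request carrying a SPARQL query.

    Assumption: the endpoint accepts the standard `query` parameter and
    can return results as application/sparql-results+json.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and decode the response body, for example with `json.load`.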

The following section provides a short introduction to the SPARQL language and some examples that are specific to the EU ODP context.

For complete documentation of the language, see the SPARQL specifications on the W3C website. The models used to describe datasets catalogued on the EU ODP are described on the ‘Linked data’ page under section ‘Metadata vocabulary’.

PREFIX

To avoid writing full URIs in a query, prefixes can be defined at the start of the query and used as shorthand.

The syntax to use is the following.

PREFIX ${PREFIX_NAME}: <${FULL_URI}>

Where:

  • ${PREFIX_NAME} is the name of the shorthand that will be used in the query instead of the full URI.
  • ${FULL_URI} is the URI that will be replaced by the prefix.

Example 1 below retrieves the list of publishers of the datasets ordered by the publishers’ URIs.

PREFIX dcat: <http://www.w3.org/ns/dcat#>

PREFIX dc: <http://purl.org/dc/terms/>

SELECT distinct ?Publisher WHERE {

?DatasetURI a dcat:Dataset .

?DatasetURI dc:publisher ?Publisher

 }

ORDER BY (?Publisher) LIMIT 100
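To make the effect of a prefix concrete, the following Python sketch expands prefixed names into full URIs using the two prefixes declared in Example 1. The function name `expand` is hypothetical; a SPARQL engine performs this substitution internally.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dc": "http://purl.org/dc/terms/",
}


def expand(prefixed_name: str, prefixes: dict = PREFIXES) -> str:
    """Replace the prefix of a name such as 'dcat:Dataset' by its full URI."""
    prefix, local_name = prefixed_name.split(":", 1)
    return prefixes[prefix] + local_name


print(expand("dcat:Dataset"))  # http://www.w3.org/ns/dcat#Dataset
print(expand("dc:publisher"))  # http://purl.org/dc/terms/publisher
```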

SELECT

The ‘SELECT’ keyword is used to indicate which piece of information (variable) should be returned from the query. The syntax ‘SELECT *’ is an abbreviation to select all the variables of a query.

It is possible to select multiple variables.

Example 2 below retrieves the issue date and the status of a specific dataset.

PREFIX odp: <http://data.europa.eu/euodp/ontologies/ec-odp#>

PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?issued ?status WHERE {

<https://ec.europa.eu/esco> dc:issued ?issued .

<https://ec.europa.eu/esco> odp:datasetStatus ?status

}

The ‘DISTINCT’ keyword can be used to guarantee the uniqueness of the results.

Example 3 below retrieves the list of publishers of datasets (without duplicates).

PREFIX dcat: <http://www.w3.org/ns/dcat#>

PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?Publisher

WHERE {

?DatasetURI a dcat:Dataset .

?DatasetURI dc:publisher ?Publisher

}

ORDER BY (?Publisher)

In the example above, the shorthand ‘a’ is used. It replaces the ‘rdf:type’ property.

The ‘COUNT’ keyword can be used to retrieve the number of results for a variable.

Example 4 below retrieves the number of datasets by publisher, sorted by number of datasets (in descending order).

PREFIX dc: <http://purl.org/dc/terms/>

PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?Publisher (COUNT(?DatasetURI) AS ?DatasetNumber)

WHERE {

?DatasetURI a dcat:Dataset .

?DatasetURI dc:publisher ?Publisher

}

GROUP BY ?Publisher

ORDER BY DESC(?DatasetNumber)

WHERE

The ‘WHERE’ keyword is used to specify a list of patterns to restrict the information for retrieval. Those patterns are triples where any of the elements can be replaced by a variable.

Multiple patterns are separated by the character ‘.’. A result is returned only when all the patterns match, so their combination works as a logical ‘AND’.
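The way patterns combine can be illustrated with a small in-memory matcher in Python. This is a teaching sketch, not how a triple store is implemented: the `ex:` identifiers and the function names `match_pattern` and `solve` are invented for the illustration.

```python
# Toy data: the ex: subjects and publishers are invented for illustration.
TRIPLES = [
    ("ex:d1", "rdf:type", "dcat:Dataset"),
    ("ex:d1", "dc:publisher", "ex:pubA"),
    ("ex:d2", "rdf:type", "dcat:Dataset"),
    ("ex:d3", "dc:publisher", "ex:pubB"),
]


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Terms starting with '?' are variables. Returns the extended binding,
    or None if the triple does not match.
    """
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return result


def solve(patterns, triples=TRIPLES):
    """Return the bindings that satisfy ALL patterns (logical AND)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


print(solve([
    ("?dataset", "rdf:type", "dcat:Dataset"),
    ("?dataset", "dc:publisher", "?publisher"),
]))
# [{'?dataset': 'ex:d1', '?publisher': 'ex:pubA'}]
```

Only `ex:d1` satisfies both patterns: `ex:d2` has no publisher and `ex:d3` is not typed as a dataset, so neither appears in the result.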

Example 5 below selects the publishers that published some datasets of the type ‘NameAuthorityList’ and the number of datasets:

PREFIX dc: <http://purl.org/dc/terms/>

PREFIX dcat: <http://www.w3.org/ns/dcat#>

PREFIX odp: <http://data.europa.eu/euodp/ontologies/ec-odp#>

SELECT ?Publisher (COUNT(?DatasetURI) AS ?DatasetNumber)

WHERE {

?DatasetURI a dcat:Dataset .

?DatasetURI dc:publisher ?Publisher .

?DatasetURI odp:datasetType <http://data.europa.eu/euodp/kos/dataset-type/NameAuthorityList>

}

GROUP BY ?Publisher

LIMIT

The ‘LIMIT’ keyword sets the maximum number of results that will be returned.

See Example 1, above.

OPTIONAL

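With the ‘OPTIONAL’ keyword, some patterns in the ‘WHERE’ clause can be marked as optional: a result is still returned when these patterns do not match, with the corresponding variables left unbound.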

Example 6 below retrieves all datasets having a resource with a format ‘text/csv’ and, if available, retrieves the starting date of the temporal coverage period of the datasets.

PREFIX dc: <http://purl.org/dc/terms/>

PREFIX dcat: <http://www.w3.org/ns/dcat#>

PREFIX odp: <http://data.europa.eu/euodp/ontologies/ec-odp#>

SELECT ?DatasetURI ?period_start WHERE {

?DatasetURI a dcat:Dataset .

?DatasetURI dcat:distribution ?o .

?o odp:distributionFormat "text/csv" .

OPTIONAL {

?DatasetURI dc:temporal ?period .

?period odp:periodStart ?period_start

 }

}
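When results are requested in the W3C SPARQL JSON results format, a variable left unbound by an ‘OPTIONAL’ pattern is simply absent from the corresponding binding. The Python sketch below shows one way to handle this. It assumes the endpoint can return that format, and the sample response is hand-written: its URIs and date are placeholders, not real EU ODP data.

```python
import json

# Hand-written sample in the W3C SPARQL JSON results layout; the URIs and
# the date are placeholders, not real EU ODP data.
SAMPLE = json.loads("""
{
  "head": {"vars": ["DatasetURI", "period_start"]},
  "results": {"bindings": [
    {"DatasetURI": {"type": "uri", "value": "http://example.org/dataset/1"},
     "period_start": {"type": "literal", "value": "2010-01-01"}},
    {"DatasetURI": {"type": "uri", "value": "http://example.org/dataset/2"}}
  ]}
}
""")


def rows(result: dict) -> list:
    """Flatten bindings into dicts, using None for unbound variables."""
    variables = result["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value") for var in variables}
        for binding in result["results"]["bindings"]
    ]


for row in rows(SAMPLE):
    print(row["DatasetURI"], row["period_start"])
```

The second row prints `None` for `period_start`, which mirrors a dataset that has no temporal coverage recorded.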

 

FROM

In the triple store (which stores the RDF information), triples can be grouped into graphs. The ‘FROM’ keyword restricts a query to a specific graph. In the context of the EU ODP, each catalogue record is stored in its own graph.

Example 7 below retrieves all triples of a specific catalogue record.

SELECT * FROM <http://data.europa.eu/euodp/data/dataset/PY7AnlFr46ANQZvz1nAhcg>

WHERE {

 ?s ?p ?o

}

FILTER

The ‘FILTER’ keyword provides a way to further restrict the solutions of a ‘WHERE’ clause. The restrictions can be, for instance, regex for string patterns or language selection.

Example 8 below selects all datasets having a title in Italian.

PREFIX dcat: <http://www.w3.org/ns/dcat#>

PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?DatasetURI ?title WHERE {

?DatasetURI a dcat:Dataset .

?DatasetURI dc:title ?title

FILTER (lang(?title)='it')

}

Example 9 below retrieves all datasets having the keyword ‘animal’ in their title in English.

PREFIX dcat: <http://www.w3.org/ns/dcat#>

PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?DatasetURI ?title WHERE {

?DatasetURI a dcat:Dataset .

?DatasetURI dc:title ?title

FILTER (lang(?title)='en')

FILTER(regex(?title, "animal", "i"))

}

The optional parameter ‘i’ in the ‘regex’ function allows results with different cases (case-insensitive search).
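For readers more familiar with general-purpose languages, the two filters of Example 9 behave roughly like the Python sketch below: an exact comparison on the language tag, followed by an unanchored, case-insensitive regular expression search. The titles are invented sample data and the function name `filter_titles` is hypothetical.

```python
import re

# Invented (title, language tag) pairs standing in for dc:title literals.
TITLES = [
    ("Animal health statistics", "en"),
    ("Statistiche sulla salute degli animali", "it"),
    ("Farm ANIMALS by region", "en"),
    ("Energy prices", "en"),
]


def filter_titles(titles, lang, pattern):
    """Keep titles in `lang` whose text matches `pattern`, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)  # plays the role of the "i" flag
    return [text for text, tag in titles if tag == lang and regex.search(text)]


print(filter_titles(TITLES, "en", "animal"))
# ['Animal health statistics', 'Farm ANIMALS by region']
```

The Italian title also contains the string ‘animal’, but it is excluded by the language test, just as the first ‘FILTER’ excludes it in the query.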

Use case – Retrieve all metadata of all the records

Because data providers can supply metadata beyond what the CKAN model covers, the SPARQL endpoint is the best way to retrieve all of it.

SELECT * WHERE {

GRAPH ?graph {

?s ?p ?o

}

FILTER (regex(str(?graph), "^http://data.europa.eu/euodp/data/dataset/"))

}

This query returns only part of the catalogue, because the triple store limits the number of triples it can return. To work around this limit, the recommended approach is to paginate the results with the keywords ‘LIMIT’ and ‘OFFSET’.
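A minimal Python sketch of such pagination is shown below. It only generates the query strings. The page size of 1000 is an arbitrary value chosen for the illustration, not a documented limit of the endpoint, and the helper name `paginate` is hypothetical.

```python
def paginate(query: str, page_size: int, pages: int):
    """Yield `pages` copies of `query`, each with its own LIMIT/OFFSET window.

    Assumption: `query` has no LIMIT or OFFSET of its own. In practice an
    ORDER BY clause is advisable so that successive pages do not overlap.
    """
    for page in range(pages):
        yield f"{query} LIMIT {page_size} OFFSET {page * page_size}"


BASE_QUERY = "SELECT * WHERE { GRAPH ?graph { ?s ?p ?o } }"

for paged_query in paginate(BASE_QUERY, page_size=1000, pages=3):
    print(paged_query)
```

In a real client, the loop would typically continue until a page comes back with fewer results than the page size, rather than stopping after a fixed number of pages.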