Access to European Union open data

HOW TO USE THE SPARQL ENDPOINT

The EU Open Data Portal stores the descriptions of its datasets as RDF triples. A machine-readable SPARQL endpoint allows you to query these descriptions.

SPARQL is an RDF query language, i.e. a semantic query language for databases.

The Linked data page of the portal offers a graphical user interface to enter your SPARQL queries.

For programmatic use, a machine-readable endpoint is available at the following URL: https://data.europa.eu/euodp/sparqlep
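
As a minimal sketch of programmatic access, the Python snippet below prepares a request for the endpoint above using only the standard library. It assumes that the endpoint follows the standard SPARQL 1.1 Protocol (a `query` parameter on a GET request) and can return the SPARQL JSON results format; neither is stated on this page. The helper names `build_request` and `run_query` are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL taken from the text above.
ENDPOINT = "https://data.europa.eu/euodp/sparqlep"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Prepare a GET request carrying the query in the `query` parameter.

    Assumption: the endpoint implements the SPARQL 1.1 Protocol and honours
    an Accept header asking for JSON results.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it makes it possible to inspect the encoded URL before any network traffic takes place.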

The following section provides a short introduction to the SPARQL language and some examples specific to the EU Open Data Portal context.

For complete documentation of the language, see the SPARQL specifications on the W3C website. The models used to describe datasets catalogued on the EU Open Data Portal are described on the ‘Linked data’ page, chapter ‘Metadata vocabulary’.

PREFIX

Prefixes can be defined in a query as shorthand, to avoid writing full URIs.

The syntax to use is the following:

PREFIX ${PREFIX_NAME}: <${FULL_URI}>

Where:

  • ${PREFIX_NAME} is the name of the shorthand that will be used in the query instead of the full URI.
  • ${FULL_URI} is the URI that the prefix stands for. It is written between angle brackets.
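
To illustrate the syntax, the following Python sketch renders prefix declarations from a mapping of names to full URIs. The helper `prefix_header` is hypothetical; the two namespaces are the ones used in the examples on this page. Note that each full URI is wrapped in angle brackets.

```python
from typing import Mapping


def prefix_header(prefixes: Mapping[str, str]) -> str:
    """Render one `PREFIX name: <uri>` line per entry, in insertion order."""
    return "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items())


header = prefix_header({
    "dcat": "http://www.w3.org/ns/dcat#",
    "dc": "http://purl.org/dc/terms/",
})
print(header)
# PREFIX dcat: <http://www.w3.org/ns/dcat#>
# PREFIX dc: <http://purl.org/dc/terms/>
```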

Example 1: retrieves the list of publishers of the datasets ordered by the publishers’ URIs:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?Publisher WHERE {
  ?DatasetURI a dcat:Dataset .
  ?DatasetURI dc:publisher ?Publisher
}
ORDER BY (?Publisher) LIMIT 100

SELECT

The SELECT keyword indicates which pieces of information (variables) the query should return. The syntax “SELECT *” is an abbreviation that selects all the variables of a query.

It is possible to select multiple variables.

Example 2: retrieves the issue date and the status of a specific dataset:

PREFIX odp: <https://data.europa.eu/euodp/ontologies/ec-odp#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?issued ?status WHERE {
  <https://ec.europa.eu/esco> dc:issued ?issued .
  <https://ec.europa.eu/esco> odp:datasetStatus ?status
}

The DISTINCT keyword can be used to guarantee the uniqueness of the results, i.e. to remove duplicates.

Example 3: retrieves the list of publishers of datasets (without duplicates):

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT DISTINCT ?Publisher
WHERE {
  ?DatasetURI a dcat:Dataset .
  ?DatasetURI dc:publisher ?Publisher
}
ORDER BY (?Publisher)

In the example above, the shorthand “a” is used. It replaces the “rdf:type” property.

The COUNT keyword can be used to retrieve the number of results for a variable. Combined with GROUP BY, it counts the results per group.

Example 4: retrieves the number of datasets per publisher, sorted by number of datasets (descending order):

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?Publisher (COUNT(?DatasetURI) AS ?DatasetNumber)
WHERE {
  ?DatasetURI a dcat:Dataset .
  ?DatasetURI dc:publisher ?Publisher
}
GROUP BY ?Publisher
ORDER BY DESC(?DatasetNumber)

WHERE

The WHERE keyword is used to specify a list of patterns to restrict the information to retrieve. Those patterns are triples where any of the elements can be replaced by a variable.

Multiple patterns are separated by the character “.”. Combined patterns work as a logical AND: a solution must match all of them.

Example 5: selects the publishers that published datasets of type “NameAuthorityList”, with the number of such datasets per publisher:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX odp: <https://data.europa.eu/euodp/ontologies/ec-odp#>

SELECT ?Publisher (COUNT(?DatasetURI) AS ?DatasetNumber)
WHERE {
  ?DatasetURI a dcat:Dataset .
  ?DatasetURI dc:publisher ?Publisher .
  ?DatasetURI odp:datasetType <https://data.europa.eu/euodp/kos/dataset-type/NameAuthorityList>
}
GROUP BY ?Publisher

LIMIT

The LIMIT keyword caps the number of results that will be returned.

See Example 1.

OPTIONAL

With the OPTIONAL keyword, some patterns in the WHERE clause can be made non-mandatory: a solution is still returned when the optional pattern has no match, and the variables of that pattern are left unbound.

Example 6: retrieves all datasets having a resource with the format “text/csv” and, if available, the starting date of the temporal coverage period of each dataset:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX odp: <https://data.europa.eu/euodp/ontologies/ec-odp#>

SELECT ?DatasetURI ?period_start WHERE {
  ?DatasetURI a dcat:Dataset .
  ?DatasetURI dcat:distribution ?o .
  ?o odp:distributionFormat "text/csv" .
  OPTIONAL {
    ?DatasetURI dc:temporal ?period .
    ?period odp:periodStart ?period_start
  }
}
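
When results are requested in the SPARQL JSON results format, a variable left unbound by an OPTIONAL pattern is simply absent from the corresponding binding. The sketch below shows one way to handle this in Python. The function name `rows` and the sample response are made up for illustration and are not real portal data.

```python
from typing import Dict, Iterator, Optional


def rows(results: dict) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per solution, using None for unbound variables."""
    variables = results["head"]["vars"]
    for binding in results["results"]["bindings"]:
        yield {
            var: binding[var]["value"] if var in binding else None
            for var in variables
        }


# Hypothetical response for Example 6: the second dataset has no period start.
sample = {
    "head": {"vars": ["DatasetURI", "period_start"]},
    "results": {
        "bindings": [
            {
                "DatasetURI": {"type": "uri", "value": "http://example.org/dataset/a"},
                "period_start": {"type": "literal", "value": "2010-01-01"},
            },
            {
                "DatasetURI": {"type": "uri", "value": "http://example.org/dataset/b"},
            },
        ]
    },
}

for row in rows(sample):
    print(row["DatasetURI"], row["period_start"])
```

Mapping unbound variables to None keeps every row the same shape, which is convenient when the rows are later written to a table or a CSV file.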

 

FROM

In the triple store (which stores the RDF information), triples can be grouped by graph. The FROM keyword restricts a query to a specific graph. In the context of the EU Open Data Portal, each catalogue record is stored in its own graph.

Example 7: retrieve all triples of a specific catalogue record:

SELECT * FROM <https://data.europa.eu/euodp/data/dataset/PY7AnlFr46ANQZvz1nAhcg>
WHERE {
  ?s ?p ?o
}

FILTER

The FILTER keyword provides a way to further restrict the solutions of a WHERE clause. A restriction can be, for instance, a regular expression applied to a string or a language selection.

Example 8: selects all datasets having a title in Italian:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?DatasetURI ?title WHERE {
  ?DatasetURI a dcat:Dataset .
  ?DatasetURI dc:title ?title .
  FILTER (lang(?title)='it')
}

Example 9: retrieves all datasets having the keyword “animal” in their English title:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?DatasetURI ?title WHERE {
  ?DatasetURI a dcat:Dataset .
  ?DatasetURI dc:title ?title .
  FILTER (lang(?title)='en')
  FILTER (regex(?title, "animal", "i"))
}

The optional flag "i" in the regex function makes the search case-insensitive, so that titles match regardless of letter case.

Use case: retrieve all metadata of all the records.

As data providers can provide additional metadata compared to the CKAN model, the best approach is to use the SPARQL endpoint for this query.

SELECT * WHERE {
  GRAPH ?graph {
    ?s ?p ?o
  }
  FILTER (regex(str(?graph), "^https://data.europa.eu/euodp/data/dataset/"))
}

The str() call converts the graph URI to a string, as regex operates on strings.

This query will return only a part of the catalogue, as the triple store limits the number of triples it can return. To circumvent this limit, the recommended approach is pagination with the keywords LIMIT and OFFSET.
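
The pagination approach can be sketched in Python as follows. The generator yields one query per page by appending LIMIT and OFFSET to a base query. The page size of 1000 and the helper name `paged_queries` are assumptions, since the actual limit of the triple store is not given here. The base query is the use-case query with str() applied to the graph URI and an ORDER BY added: without ORDER BY, SPARQL does not guarantee a stable order across requests, so pages could overlap or skip rows.

```python
from typing import Iterator

BASE_QUERY = """
SELECT * WHERE {
  GRAPH ?graph { ?s ?p ?o }
  FILTER (regex(str(?graph), "^https://data.europa.eu/euodp/data/dataset/"))
}
ORDER BY ?graph ?s ?p ?o
""".strip()


def paged_queries(base_query: str, page_size: int, pages: int) -> Iterator[str]:
    """Yield `pages` queries, each selecting one window of `page_size` rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for page in range(pages):
        yield f"{base_query}\nLIMIT {page_size} OFFSET {page * page_size}"


for query in paged_queries(BASE_QUERY, page_size=1000, pages=3):
    print(query.splitlines()[-1])
# LIMIT 1000 OFFSET 0
# LIMIT 1000 OFFSET 1000
# LIMIT 1000 OFFSET 2000
```

In practice, one would keep requesting pages until a page returns fewer rows than the page size, rather than fixing the number of pages in advance. Sorting a very large result set may be costly for the server, so a smaller page size or a narrower base query may be preferable.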