Access to European Union open data

GLOSSARY

 

ADMS

Asset description metadata schema.

A vocabulary to describe interoperability assets making it possible for ICT developers to explore and search for interoperability assets. ADMS allows public administrations, businesses, standardisation bodies and academia to:

  • describe semantic assets in a common way so that they can be seamlessly cross-queried and discovered by ICT developers from a single access point;
  • search, identify, retrieve and compare semantic assets for reuse through a single point of access, avoiding duplication and expensive design work;
  • keep their own system for documenting and storing semantic assets;
  • improve indexing and visibility of their own assets;
  • link semantic assets to one another in cross-border and cross-sector settings.

Source: https://joinup.ec.europa.eu/asset/adms/description

API  

Application programming interface.

A way for computer programs to talk to one another. It can be understood as the set of instructions a programmer uses to pass requests and data between programs.

Source: http://schoolofdata.org/handbook/appendix/glossary

BULK DOWNLOAD 

A download containing files from multiple collections that can be retrieved at once.

CKAN

Comprehensive Knowledge Archive Network

A data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organisations) working to make their data open and available.

Source: http://ckan.org/

CORDIS

The European Commission’s primary public repository and portal to disseminate information on all EU-funded research projects and their results.

Source: http://cordis.europa.eu/home_en.html

(DATA) CRAWLING 

A crawler is a program that visits websites and reads their pages and other information in order to create entries for a search engine index. All major search engines on the web have such a program, which is also known as a ‘spider’ or a ‘bot’.

Source: http://searchsoa.techtarget.com/definition/crawler

When extracting data from the web, ‘crawling’ is often used interchangeably with ‘data scraping’ or ‘harvesting’. The terms differ, however: crawling refers to working with large datasets, for which someone can develop their own crawlers (or bots) that reach the deepest parts of web pages; data scraping, on the other hand, refers to retrieving information from any source (not necessarily from the web).

Source: https://www.promptcloud.com/blog/data-scraping-vs-data-crawling

CSV

‘Comma separated values’ file format, often used to exchange data between dissimilar applications. The CSV file format is usable by the KSpread, OpenOffice Calc and Microsoft Excel spreadsheet applications. Many other applications support CSV to import or export data.

Source: http://edoceo.com/utilitas/csv-file-format
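
As an illustration of how an application might import CSV, the following minimal sketch reads a small CSV text with Python's standard csv module. The column names and values are invented for the example and do not come from any real dataset.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two records.
CSV_TEXT = "id,label,amount\n1,alpha,10\n2,beta,20\n"

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(CSV_TEXT)
for row in rows:
    # Every value is read as a string; convert explicitly where needed.
    print(row["label"], int(row["amount"]))
```

Note that CSV carries no type information, so numbers arrive as text and have to be converted by the importing application.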

DATASET 

A collection of related sets of data that is composed of separate elements but that can be manipulated as a unit and accessed or downloaded in one or more formats.

DCAT  

Data catalogue vocabulary.

An RDF vocabulary for interoperability of data catalogues.

See also: W3C - http://www.w3.org/TR/vocab-dcat

DCAT-AP  

DCAT application profile.

A common vocabulary for describing datasets hosted in data portals in Europe, based on DCAT.

See also: https://joinup.ec.europa.eu/asset/dcat_application_profile/description

(DATA) DUMP 

A large amount of data transferred from one system or location to another.

Source: http://www.oxforddictionaries.com

DCMI

Dublin core metadata initiative.

An open organisation supporting innovation in metadata design and best practices across the metadata ecology.

Source: http://dublincore.org/

ELI

European legislation identifier.

It makes it possible to uniquely identify and access national and European legislation online, and guarantees easier access, exchange and reuse of legislation for public authorities, professional users, academics and citizens. ELI paves the way for a semantic web of legal gazettes and official journals.

Source: https://en.wikipedia.org/wiki/European_Legislation_Identifier

FOAF

‘Friend of a friend’ is a machine-readable descriptive vocabulary of persons, their activities and their relations to other people and objects. FOAF allows groups of people to describe social networks without the need for a centralised database.

Source: https://en.wikipedia.org/wiki/FOAF_%28ontology%29

JSON

JavaScript object notation is an open-standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is the most common data format used for asynchronous browser/server communication (AJAJ).

Source: https://en.wikipedia.org/wiki/JSON
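
A minimal sketch of the attribute-value idea, using Python's standard json module: a data object is serialised to human-readable text and read back. The record and its field names are invented for the example.

```python
import json

# Hypothetical data object made of attribute-value pairs.
record = {"title": "Example dataset", "formats": ["CSV", "XML"], "open": True}

# Serialise to JSON text (sorted keys make the output reproducible).
text = json.dumps(record, sort_keys=True)
print(text)

# Parse the text back into an equivalent Python object.
restored = json.loads(text)
```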

LINKED DATA 

Linked data describes a method of publishing structured data so that they can be interlinked. It builds upon standard web technologies such as HTTP and URI, but rather than using them to serve web pages for human readers it extends them to share information in a way that can be automatically read by computers.

Source: https://en.wikipedia.org/wiki/Linked_data

LINKED DATA PRINCIPLES

Linked data principles provide a common API for data on the web that is more convenient than many separately and differently designed APIs published by individual data suppliers. Tim Berners-Lee, the inventor of the web and the initiator of the linked data project, proposed the following principles upon which linked data is based:

  • use URIs to name things;
  • use HTTP URIs so that things can be referred to and looked up (dereferenced) by people and user agents;
  • when someone looks up a URI, provide useful information using open web standards such as RDF or SPARQL;
  • include links to other related things using their URIs when publishing on the web.

Source: W3C - http://www.w3.org/TR/ld-glossary/#linked-data-principles

IMMC

Interinstitutional Metadata Maintenance Committee

The committee responsible for the minimum set of metadata elements, the so-called IMMC core metadata, that is to be used in data exchange.

Source: http://publications.europa.eu/mdr/core-metadata/

INTEROPERABILITY 

The ability of systems to exchange information and use the exchanged information.

ISA 

Interoperable Solutions for European Public Administrations.

A European Commission-funded programme that aims to facilitate cross-border and cross-sector transactions between public administrations in Europe.

ISA² is the follow-up programme to ISA, which ran from 2010 to 2015. ISA² runs from 2016 until 2020.

MACHINE READABLE

Machine-readable data are data in a format that can be interpreted by a computer program. There are two types of machine-readable data:

  • Human-readable data that are marked up so that they can also be understood by computers, e.g. microformats, RDFa;
  • Data formats intended principally for computers, e.g. RDF, XML and JSON.

MASHUP 

The combination of multiple datasets from multiple sources to create a new service, visualisation or information.
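
A very small sketch of the idea: two datasets from different sources are joined on a shared key to derive information that neither contains on its own. The region codes, figures and function name are all hypothetical.

```python
# Two hypothetical datasets from different sources, keyed by region code.
population = {"R1": 1000, "R2": 2500}
hospitals = {"R1": 2, "R2": 5, "R3": 1}

def people_per_hospital(pop, hosp):
    """Combine both datasets for the regions that appear in each of them."""
    combined = {}
    for code in pop.keys() & hosp.keys():
        combined[code] = pop[code] / hosp[code]
    return combined

mashup = people_per_hospital(population, hospitals)
```

Region R3 is dropped because it is missing from one source; a real mashup would have to decide how to handle such gaps.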

METADATA 

Metadata is structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage an information resource. Metadata is often referred to as data about data.

Source: NISO - http://www.niso.org/publications/press/UnderstandingMetadata.pdf

METADATA REGISTRY (MDR) 

The Metadata Registry is an important interoperability and standardisation tool. It registers and maintains definition data (metadata elements, named authority lists, schemas, etc.) used by the different European institutions.

Source: http://publications.europa.eu/mdr/index.html

(DATA) MINING 

The practice of examining large pre-existing databases in order to generate new information.

Source: http://www.oxforddictionaries.com

‘For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyse local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.’

Source: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

ONTOLOGY 

A formal model that allows knowledge to be represented for a specific domain. An ontology describes the types of things that exist (classes), the relationships between them (properties) and the logical ways those classes and properties can be used together (axioms).

Source: W3C - http://www.w3.org/TR/ld-glossary/#ontology

OPEN GOVERNMENT DATA 

Data collected, produced or paid for by the public bodies and made freely available for reuse for any purpose.

OPEN STANDARDS 

Generally understood as technical standards that are free from licensing restrictions. They can also be interpreted to mean standards that are developed in a vendor-neutral manner.

Source: http://schoolofdata.org/handbook/appendix/glossary

(DATA) PARSING 

Breaking a data block into smaller chunks by following a set of rules so that it can be more easily interpreted, managed or transmitted by a computer.

Source: http://www.businessdictionary.com/definition/parsing.html
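
To make the definition concrete, here is a minimal sketch that breaks a block of text into smaller chunks by following two simple rules: fields are separated by semicolons, and each field is a key and a value joined by an equals sign. The input format and the function name are assumptions made for the example.

```python
def parse_record(block, field_sep=";", pair_sep="="):
    """Split a text block into key/value fields according to fixed rules."""
    fields = {}
    for chunk in block.split(field_sep):
        chunk = chunk.strip()
        if not chunk:
            continue  # ignore empty chunks, e.g. after a trailing separator
        key, _, value = chunk.partition(pair_sep)
        fields[key.strip()] = value.strip()
    return fields

parsed = parse_record("id=42; format=CSV; licence=open")
```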

PDF

Portable Document Format

A file format used to present and exchange documents independently of software, hardware or operating systems. It is an open standard maintained by the International Organization for Standardization (ISO).

Source: https://acrobat.adobe.com/be/en/products/about-adobe-pdf.html

PSI  

Public Sector Information.

It is the wide range of information that public sector bodies collect, produce, reproduce and disseminate in many areas of activity while accomplishing their institutional tasks.

It can be made available under a variety of licences, not all of which are open.

RAW DATA 

An expression that refers to data in their original state, not having been processed, aggregated or manipulated in any other way. Such data are also referred to as ‘primary data’.

RDF  

Resource description framework.

A family of international standards for data interchange on the web. RDF is based on the idea of identifying things using web identifiers or HTTP URIs and describing resources in terms of simple properties and property values.

Source: W3C - http://www.w3.org/TR/ld-glossary/#rdf

RDFa

Resource description framework in attributes is a W3C recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within web documents.

Source: https://en.wikipedia.org/wiki/RDFa

RESOURCE 

The physical representation of a dataset. Each resource can be a file of any kind, a link to a file elsewhere on the web or a link to an API. For example, if the data is being supplied in multiple formats or split into different areas or time periods, each file is a different ‘resource’ that should be described individually.

SEMANTIC WEB 

An evolution or part of the World Wide Web that consists of machine-readable data in RDF and an ability to query that information in standard ways (e.g. via SPARQL).

Source: W3C - http://www.w3.org/TR/ld-glossary/#semantic-web

(DATA) SCRAPING  

The process of extracting data in machine-readable formats from non-pure data sources, for example webpages or PDF documents. Often prefixed with the source (web scraping, PDF scraping).

Sources: http://en.wikipedia.org/wiki/Data_scraping

http://schoolofdata.org/handbook/appendix/glossary
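
A minimal sketch of web scraping, using only Python's standard html.parser module: the text of each table cell is pulled out of an HTML fragment and returned as rows. The HTML fragment and the class and function names are invented for the example; real pages are usually messier.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this illustration.
PAGE = (
    "<table>"
    "<tr><th>Term</th><th>Meaning</th></tr>"
    "<tr><td>CSV</td><td>Comma separated values</td></tr>"
    "</table>"
)

class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])  # tolerate a cell outside any row
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data

def scrape_table(html):
    """Return the table cells found in an HTML string as a list of rows."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.rows

table = scrape_table(PAGE)
```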

SDMX

Statistical data and metadata exchange, an international initiative that aims at standardising and modernising the mechanisms and processes for the exchange of statistical data and metadata among international organisations and their member countries.

Source: https://en.wikipedia.org/wiki/SDMX

SEO

Search engine optimisation.

The process of positively affecting the visibility of a website or a web page in a search engine’s unpaid results.

Source: https://en.wikipedia.org/wiki/Search_engine_optimization

SOLR

An open source enterprise search platform, whose major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration and rich document (e.g. Word, PDF) handling.

Source: https://en.wikipedia.org/wiki/Apache_Solr

SPARQL 

SPARQL protocol and RDF query language (SPARQL) defines a query language for RDF data, analogous to the structured query language (SQL) for relational databases.

Source: W3C - http://www.w3.org/TR/ld-glossary/#sparql

SPARQL ENDPOINT 

A service that accepts SPARQL queries and returns answers to them as SPARQL result sets. It is a best practice for dataset providers to give the URL of their SPARQL endpoint to allow access to their data programmatically or through a web interface.

Source: W3C - http://www.w3.org/TR/ld-glossary/#sparql-endpoint
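
As a sketch of programmatic access, the example below prepares (but does not send) an HTTP GET request carrying a SPARQL query in the `query` parameter, as the SPARQL protocol allows. The endpoint URL is a placeholder, not a real service, and the query simply asks for any ten triples.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: replace with the URL given by the dataset provider.
ENDPOINT = "https://example.org/sparql"

# Ask for any ten subject-predicate-object triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""

def build_query_url(endpoint, query):
    """Encode the query text into the 'query' parameter of a GET URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = build_query_url(ENDPOINT, QUERY)

# Prepare the request and ask for the result set in JSON form.
# Nothing is sent over the network here.
request = Request(url, headers={"Accept": "application/sparql-results+json"})
print(request.get_full_url())
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out because the endpoint above does not exist.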

STRUCTURED DATA 

Data that reside in fixed fields within a record or file. Relational databases and spreadsheets are examples of structured data. Although data in XML files are not fixed in location like traditional database records, they are nevertheless structured, because the data are tagged and can be accurately identified.

Source: PC Magazine encyclopaedia - http://www.pcmag.com/encyclopedia/term/52162/structured-data

TRIPLE, TRIPLE STORE 

A triple store is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject-predicate-object, like ‘Bob is 35’ or ‘Bob knows Fred’.

Much like a relational database, a triple store holds information that is retrieved via a query language. Unlike a relational database, a triple store is optimised for the storage and retrieval of triples. In addition to queries, triples can usually be imported/exported using RDF and other formats.

Source: Wikipedia.org - http://en.wikipedia.org/wiki/Triplestore
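
The following toy sketch shows the principle with an in-memory set of subject-predicate-object tuples and a pattern-matching lookup. It is an illustration only, not how production triple stores are built; the class name, the predicate names and the extra ‘Fred knows Alice’ triple are assumptions added for the example.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self._triples
            if all(p is None or p == v for p, v in zip(pattern, triple))
        )

store = TripleStore()
store.add("Bob", "age", "35")        # 'Bob is 35'
store.add("Bob", "knows", "Fred")    # 'Bob knows Fred'
store.add("Fred", "knows", "Alice")  # hypothetical extra triple

everything_about_bob = store.match(subject="Bob")
who_knows_whom = store.match(predicate="knows")
```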

URI  

Uniform resource identifier.

A string that uniquely identifies virtually anything, including a physical building or more abstract concepts such as colours. It may or may not be resolvable on the web.

Source: W3C - http://www.w3.org/TR/ld-glossary/#uniform-resource-identifier

URL  

Uniform resource locator.

A global identifier commonly called a ‘web address’. A URL is resolvable on the web. All HTTP URLs are URIs; however, not all URIs are URLs.

Source: W3C - http://www.w3.org/TR/ld-glossary/#uniform-resource-locator
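
The relationship between the two terms can be sketched by inspecting an identifier's scheme with Python's standard urllib.parse module. As a simplifying assumption, the sketch treats only HTTP and HTTPS identifiers as web addresses; the function name and the sample identifiers are chosen for illustration.

```python
from urllib.parse import urlsplit

def classify(identifier):
    """Roughly classify an identifier by its scheme (simplified)."""
    scheme = urlsplit(identifier).scheme
    if scheme in ("http", "https"):
        return "HTTP URL (and therefore also a URI)"
    if scheme:
        return "URI, but not an HTTP URL"
    return "no scheme, so not an absolute URI"

samples = [
    "http://www.w3.org/TR/ld-glossary/",
    "urn:example:colour:blue",
    "glossary",
]
for sample in samples:
    print(sample, "->", classify(sample))
```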

URN

Uniform resource name.

The historical name for a URI that uses the ‘urn’ scheme.

Source: https://en.wikipedia.org/wiki/Uniform_Resource_Name

VOCABULARY 

A collection of terms for a particular purpose. Vocabularies can range from simple, such as the widely used RDF schema, FOAF and DCMI element set, to complex vocabularies with thousands of terms, such as those used in healthcare to describe symptoms, diseases and treatments. Vocabularies play a very important role in linked data, specifically to help with data integration. The use of this term overlaps with that of ‘ontology’.

Source: W3C - http://www.w3.org/TR/ld-glossary/#vocabulary

WEB 1.0  

The first generation of the World Wide Web, characterised by separate static websites rather than continually updated weblogs and social networking tools.

Source: http://en.wiktionary.org/wiki/Web_1.0

WEB 2.0  

A colloquial description of the part of the World Wide Web that implements social networking, blogs, user comments and ratings, and related human-centred activities.

Source: W3C - http://www.w3.org/TR/ld-glossary/#web-2.0

WEB 3.0   

A colloquial description of the part of the World Wide Web that implements machine-readable data and the ability to perform distributed queries and analysis on that data. It is considered synonymous with the phrases ‘semantic web’ and ‘the web of data’.

Source: W3C - http://www.w3.org/TR/ld-glossary/#web-3.0

XML

Extensible markup language.

It is a markup language that defines a set of rules for encoding documents in a format which is both human readable and machine readable.

Source: https://en.wikipedia.org/wiki/XML
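
To illustrate that the same XML text is readable by both people and programs, this minimal sketch parses a small document with Python's standard xml.etree.ElementTree module. The element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are chosen for this illustration.
XML_TEXT = """<glossary>
  <term id="csv"><name>CSV</name><definition>Comma separated values</definition></term>
  <term id="xml"><name>XML</name><definition>Extensible markup language</definition></term>
</glossary>"""

root = ET.fromstring(XML_TEXT)

# Map each term's id attribute to the text of its <name> child element.
terms = {term.get("id"): term.findtext("name") for term in root.findall("term")}
print(terms)
```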