Linking data: what does it mean?
Publication Date/Time
2022-08-01T15:29:15+00:00
Country
Global
The basics of linked (open) data
_This article is the introduction to our new ‘Linking data’
series. It defines linked data and linked open data (LOD). The rest of
the series will present EU projects that use LOD. How and why do they
use it? Follow the series to find out
[https://data.europa.eu/en/publications/datastories?keywords=&country=All&year=&sort_by=created&sort_order=DESC&items_per_page=10&keywords=%22Linking+data%3A+%22&merged-select=created&items_per_page=10]._

Data is everywhere and we are constantly producing more of it. As
individuals, we create data while browsing the internet, booking a
flight or shopping online. Public institutions generate data from
traffic monitoring and weather tracking.

Used correctly, all of this data can bring benefits to our society as
a whole and to each of us individually. It can help to create
personalised medicines, fight floods and wildfires, improve public
transport systems and much more. To fully live up to its potential,
data needs to be accessible and available in a standardised format.
This is where linked (open) data comes in.

WHAT IS LINKED (OPEN) DATA?

There are many data sources out there and each source can have its own
way of encoding and presenting information. To connect the data and
create meaningful networks of information, a set of common design
principles is needed.

This is exactly what linked data is: a set of design principles for
publishing structured machine-readable data so that it can be linked with
other data. When the data is open (free to use and distribute), it is
called linked open data.

FOUR PRINCIPLES OF LINKED DATA

There are several features of linked data that allow it to interlink
with other data. The inventor of the World Wide Web, Tim Berners-Lee
[https://www.w3.org/People/Berners-Lee/Overview.html], outlined four
principles [https://www.w3.org/wiki/LinkedData] to define them. The
four principles are the same whether the data is open or not, so in
this article we use only the term ‘linked data’.

_1. Use URIs as names for things_

The uniform resource identifier (URI)
[https://op.europa.eu/en/web/webguide/uris] is a sequence of
characters which can give a unique name to virtually anything –
digital online content, a real object or an abstract concept. The
inventory number of a chair in your company’s office is its URI in
that specific context. URIs allow us to distinguish things, but also
to recognise things which are the same. For example, a dataset can
have different names in different languages, but its URI stays the
same:
[https://data.europa.eu/sites/default/files/img/media/image3-04.jpg]
To maintain its meaning, a URI must be persistent, in other words
permanently assigned to a particular resource. Imagine that your
company adopts a new inventory system and that a new code is assigned
to the same chair. For the chair’s URI to be persistent, the
organisation will have to map the old inventory number to the new one,
stating that both refer to the same chair.
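
The mapping described above can be sketched in a few lines of Python. This is only an illustration: the inventory codes and the function names are hypothetical, not taken from any real inventory system.

```python
# Hypothetical inventory codes, used only for illustration.
# Each retired code points to the code that replaced it.
REPLACED_BY = {
    "CHAIR-0042": "FURN-2022-0917",  # old system -> new system
}


def resolve(code: str) -> str:
    """Follow 'replaced by' links until the current code is reached."""
    seen = set()
    while code in REPLACED_BY:
        if code in seen:
            raise ValueError(f"circular mapping at {code!r}")
        seen.add(code)
        code = REPLACED_BY[code]
    return code


def same_thing(a: str, b: str) -> bool:
    """Two codes name the same chair if they resolve to the same current code."""
    return resolve(a) == resolve(b)


print(resolve("CHAIR-0042"))                       # FURN-2022-0917
print(same_thing("CHAIR-0042", "FURN-2022-0917"))  # True
```

Because the old code is kept and mapped rather than deleted, anything that still refers to it continues to reach the right chair.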

_2. Use HTTP URIs so that people can look up those names_

Imagine that we create a dataset (in any file format) and give it a
name using a URI (e.g. _my-dataset_). How can we allow machines and
humans to easily look it up? By using hypertext transfer protocol
(HTTP) [https://www.w3schools.com/whatis/whatis_http.asp].

HTTP is a set of rules for transferring data (text, images, sound,
video) over the internet. It is the basis of communication between web
servers (where the data – our file _my-dataset_ – is stored) and
web browsers (where users can ask to access data). Whenever you type a
website address into the browser and press ‘Enter’, your computer
sends an HTTP request to the correct web server. Then, the web server
sends you the requested HTML page – and that’s when you see the
website you wanted.
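
To make the request step more concrete, here is a minimal Python sketch that builds (but does not send) the kind of HTTP request a browser would issue. The address is a hypothetical placeholder on example.org, and the Accept header is just one plausible choice.

```python
from urllib.request import Request

# Hypothetical address; example.org is a domain reserved for documentation.
request = Request(
    "http://example.org/my-dataset",
    headers={"Accept": "text/html"},  # ask the server for an HTML page
)

# The first line of an HTTP request names the method and the resource.
request_line = f"{request.get_method()} {request.selector} HTTP/1.1"

print(request_line)                             # GET /my-dataset HTTP/1.1
print("Host:", request.host)                    # Host: example.org
print("Accept:", request.get_header("Accept"))  # Accept: text/html
```

Nothing is transmitted here. The sketch only shows which pieces of information the request carries: what to do (GET), which resource is wanted, and which server should answer.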

The website address that you type is an HTTP URI (also called ‘URL
[https://www.w3.org/TR/WD-html40-970708/htmlweb.html]’). The URI
‘my-dataset’ tells you ‘This is a unique resource called
_my-dataset_.’ The HTTP URI ‘http://my-dataset’ tells you
‘This is a unique resource called _my-dataset_, which can be
accessed via the web using the HTTP protocol.’
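
The difference between the two names can also be checked mechanically. The following sketch uses Python's standard URL parser; the helper name `can_be_looked_up` is made up for this illustration.

```python
from urllib.parse import urlparse


def can_be_looked_up(name: str) -> bool:
    """True if the name says it can be fetched over HTTP or HTTPS."""
    return urlparse(name).scheme in ("http", "https")


for name in ("my-dataset", "http://my-dataset"):
    parts = urlparse(name)
    print(repr(name), "scheme:", repr(parts.scheme),
          "looked up via HTTP:", can_be_looked_up(name))
```

The bare name has no scheme, so a machine has no instruction on how to retrieve it. The HTTP URI carries that instruction in its first few characters.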

_3. When someone looks up a URI, provide useful information using the
RDF and SPARQL standards_

Linked data is supposed to be machine-readable and easy to interlink
with any other data. To achieve this, it is crucial to use a standard
format to represent the data and a standard query (search) language to
find the data and its metadata (information about the data).

 	* RDF: a standard way to describe data

Resource description framework (RDF) [https://www.w3.org/RDF/] is a
data model, standardised by the W3C, for describing data. It
defines relationships between data objects using ‘triples’, based
on the subject–predicate–object structure familiar from human
language.

Let’s consider the sentence ‘A dataset _(subject)_ was published
by _(predicate)_ Eurostat _(object)_.’ In RDF, all three parts of
the sentence can be expressed as a URI. It looks like this:
[https://data.europa.eu/sites/default/files/img/media/Lod-general-01.png]
In RDF, the subject and the predicate are each expressed as a
URI, while the object can be either a URI or a literal value (e.g. a
number, a date or a piece of text). If the object is expressed as a URI,
then it too can become the subject of a new triple, creating a bigger set
of interconnected information.
[https://data.europa.eu/sites/default/files/img/media/image2.jpg] 
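
As a rough sketch of the triple structure, the Python below stores two triples and prints them in a simplified N-Triples style. The example.org URIs are hypothetical. The two predicates come from the Dublin Core and FOAF vocabularies and are our choice for illustration, not necessarily those used in the figures above.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain value such as a piece of text, as opposed to a URI."""
    value: str


class Triple(NamedTuple):
    subject: str                 # a URI
    predicate: str               # a URI
    obj: Union[str, Literal]     # a URI or a literal value


# Hypothetical resource URIs for illustration.
DATASET = "http://example.org/dataset/my-dataset"
EUROSTAT = "http://example.org/organisation/eurostat"
# Predicates borrowed from existing vocabularies (assumed choice).
PUBLISHER = "http://purl.org/dc/terms/publisher"
NAME = "http://xmlns.com/foaf/0.1/name"

graph = [
    Triple(DATASET, PUBLISHER, EUROSTAT),          # object is a URI
    Triple(EUROSTAT, NAME, Literal("Eurostat")),   # object is a literal
]


def to_ntriples(t: Triple) -> str:
    """Simplified serialisation; does not escape special characters."""
    if isinstance(t.obj, Literal):
        obj = '"' + t.obj.value + '"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


for triple in graph:
    print(to_ntriples(triple))
```

The object of the first triple is the subject of the second, which is how separate statements join up into a larger graph.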

 	* SPARQL: a standard way to search for and find data

In order to use data, people must first be able to find it. For this,
SPARQL Protocol and RDF Query Language (SPARQL)
[https://www.w3.org/TR/rdf-sparql-query/] can be used. It is a
standardised query language to retrieve and manipulate data stored in
RDF format. With SPARQL, you can search in multiple data sources in
one go using ‘SPARQL endpoints’
[https://www.w3.org/wiki/SparqlEndpoints]. The results of SPARQL
queries can be returned in multiple formats, including RDF.
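
To give an idea of what a SPARQL query looks like, the sketch below holds a small query as text and builds the web address that would submit it to an endpoint through the `query` parameter defined by the SPARQL protocol. The endpoint address is a hypothetical placeholder, the predicate is an assumed choice from the Dublin Core vocabulary, and nothing is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the address of a real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Find up to ten datasets together with their publishers.
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  ?dataset dct:publisher ?publisher .
}
LIMIT 10
""".strip()


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the query text into the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The pattern inside `WHERE` has the same subject, predicate and object shape as an RDF triple, with variables (marked by `?`) standing in for the parts to be found.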

_4. Include links to other URIs to discover more things_

As you can see in the example above, there is no practical limit to the
number of links that RDF triples can make between pieces of data. Why is it
worth adding more and more links between pieces of data? It allows us
to discover relationships between different pieces of data, gives data
more context and meaning and ultimately allows us to find more
information.
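
The idea of discovering more things by following links can be sketched as a simple walk over a handful of triples. All URIs below are hypothetical placeholders on example.org, and `discover` is a name made up for this illustration.

```python
from collections import deque

EX = "http://example.org/"  # hypothetical namespace for illustration

TRIPLES = [
    (EX + "dataset/my-dataset", EX + "vocab/publishedBy", EX + "organisation/eurostat"),
    (EX + "organisation/eurostat", EX + "vocab/basedIn", EX + "place/luxembourg"),
    (EX + "place/luxembourg", EX + "vocab/partOf", EX + "place/european-union"),
    (EX + "dataset/other", EX + "vocab/publishedBy", EX + "organisation/other-office"),
]


def discover(start, triples):
    """Return every URI reachable from 'start' by following links, nearest first."""
    found = []
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for subject, _predicate, obj in triples:
            if subject == current and obj not in seen:
                seen.add(obj)
                found.append(obj)
                queue.append(obj)
    return found


for uri in discover(EX + "dataset/my-dataset", TRIPLES):
    print(uri)
```

Starting from a single dataset, the walk reaches the publisher, then the place where the publisher is based, and so on. Each added link extends what can be found from the same starting point.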

ENABLING THE ‘WEB OF DATA’

To sum up, linked data helps to break down information silos and
makes it easier to browse through complex data. The more linked data
is out there, the closer we are to creating the ‘Web of Data
[https://www.w3.org/standards/semanticweb/data]’ – a global
network of interconnected machine-readable information, as opposed to
a vast collection of unconnected datasets. To ensure that everyone can
benefit from the potential of today’s large amounts of data, as much
data as possible should be linked and open.

_Useful links_

Discover the power of linked open data – video series
[https://www.youtube.com/watch?v=9hZPWBNyLac&list=PLT5rARDev_rm1Q_rROsgZM9AyH0TmyBmb]
