Home icon
Data Visualisation Guide

Wide versus long data

5 minutes read

Data structures

To make a line chart of the employment rate by country, many visualisation tools expect your data to look like this:

Country 2017 2018 2019 2020 2021
Belgium 69,8 71 71,8 71,5 71,9
Bulgaria 71,4 72,4 75 73,4 73,2
Czechia 78,4 79,8 80,2 79,6 79,8
Denmark 77,8 78,7 79,4 78,8 79,8

However, you might notice that the data you downloaded from the source is structured differently. It might look like this:

Country Year Value
Belgium 2017 69,8
Belgium 2018 71
Belgium 2019 71,8
Belgium 2020 71,5
Belgium 2021 71,9
Bulgaria 2017 71,4
Bulgaria 2018 72,4
Bulgaria 2019 75
Bulgaria 2020 73,4
Bulgaria 2021 73,2
Czechia 2017 78,4
Czechia 2018 79,8
Czechia 2019 80,2
Czechia 2020 79,6
Czechia 2021 79,8
Denmark 2017 77,8
Denmark 2018 78,7
Denmark 2019 79,4
Denmark 2020 78,8
Denmark 2021 79,8

Both tables contain exactly the same data, but the tables are structured very differently.

The first table is using the wide data format. This is a common structure in spreadsheets, and is very convenient to look up values: you quickly know where to look in the table when you want to know what the employment rate in Bulgaria in 2019 was, for example.

With this format, it is also easy to calculate the average employment rate for each country: you can add a column to the right of the table to calculate it with a spreadsheet formula. In the same manner you can add a row at the bottom of the table to calculate the average employment rate over all countries by year.

The wide format is also efficient: it does not contain cells with the same values.

But the wide format has its limitations. What if you would like to publish employment data for men and for women separately, and maybe also break the data down by age class? With the wide format, you need to create a separate table for each combination of gender and age, which is not handy to work with.

The long data format has repeated values (in the table above each country name is present 5 times, and each year 4 times), is less effective for looking up values visually, and requires some work if you want to calculate averages. But it has a major advantage over the wide format: if the data needs to be broken down by gender and age, you can just add these columns to your data:

Country Year Value Gender Age
Belgium 2017 69,8 M 18-30
Belgium 2018 71 M 18-30
Belgium 2019 71,8 M 18-30
Belgium 2020 71,5 M 18-30
Belgium 2021 71,9 M 18-30
Belgium 2017 71,4 F 18-30
Belgium 2018 72,4 F 18-30
Belgium 2019 75 F 18-30
Belgium 2020 73,4 F 18-30
Belgium 2021 73,2 F 18-30
Belgium 2017 78,4 M 30-50
Belgium 2018 79,8 M 30-50
Belgium 2019 80,2 M 30-50
Belgium 2020 79,6 M 30-50
Belgium 2021 79,8 M 30-50
Belgium 2017 77,8 F 30-50
Belgium 2018 78,7 F 30-50
Belgium 2019 79,4 F 30-50
Belgium 2020 78,8 F 30-50
Belgium 2021 79,8 F 30-50
Bulgaria

This table is of course a little wider than the long table that did not contain the gender and age breakdown. But it is still called a ‘long’ table because each row contains a value for one numerical variable only. In this example: the value for the employment rate. The other cells in each row contain values for the dimensions country, year, gender and age, specifying to which value of the respective dimension the employment rate refers to. These dimensions are categorical variables that can be used to break down the values of the numerical variable.

Compared to the wide format, the long format data has column names that “make sense”: they describe what they contain. In the wide format, the column names “2017”, “2018”, … do not describe what they contain: they are actually part of the data.

The long data format reflects how data is stored in databases. It is a format that is very machine readable, and it allows for easy data processing like filtering and aggregating over multiple dimensions.

(Un)pivoting

Because the first two tables in this section contain the exact same data, they can be transformed into one another. The process of going from long data to wide data is called “pivotting” a table. You choose which one of the two dimensions (Country or Year) goes in the rows and which of them goes in the columns, and Excel (or any other spreadsheet or data analysis tool) will take care of putting the values in the right cells of the wide table. Pivotting a table in Excel is demonstrated in the Working with data: cleaning data in practice T7 LINK video.

Unpivoting data (going from wide to long format) is also possible but is less common and not straight forward in Excel.

Related pages

RAWGraphs

Datawrapper

Flourish

Intro to tidy data

Advantages of tidy data

Tidying data

Data structures