Home icon
Data Visualisation Guide

Missing values

2 minutes read

Pitfalls in data

Missing values in data need to be identified, understood and dealt with before any further processing of the data.

There are numerous ways to store missing values in data. Values may be missing completely (like empty cells in a spreadsheet), they can consist of empty strings (””), special characters (NA, null) or impossible values (like -1 or -9999).

The value of 0 deserves special attention. Does it really mean 0, or is it maybe used as a replacement for missing data? Missing date values might be present in the data as January 1, 1900 or as January 1, 1970, so these dates are suspicious and might be an indication of missing values. Make sure that you know how missing values in your data are stored, and that you are able to identify them.

Missing data can also be “hidden”. Take for example a time series data with monthly values. If the data for August would be missing, the data could look like this:

Month Value
January 123
February 126
March 145
April 136
May 111
June 98
July 128
August
September 154
October 167
November 183
December 186

In the table above, the value for August is not hidden, and explicitly missing. This method of storing missing data is preferred over “hiding” missing data, like in the table below:

Month Value
January 123
February 126
March 145
April 136
May 111
June 98
July 128
September 154
October 167
November 183
December 186

You also need to know why values are missing. Was the data never collected for the records with missing values? Or maybe the data is a result of joining 2 tables together, when not all records were available in both tables? Or maybe the column with missing data is the result of a calculation that generated an error for some rows in the data?

If you know why values are missing, you will have a better idea of how to handle them: should you filter them out, set them to 0 or some other value, or should you try to recover missing values from some additional data sources?

Related pages

Pitfalls with dates

Data type mismatches

Duplicates, aggregates and totals

Clean spreadsheets

Missing values and outliers

File encoding

Pitfalls in data