This course aims to give you an overview of how to apply the Data Visualisation techniques you saw in Introduction to Data Visualisation using the programming language R. Data visualisation can be described as both an art and a science, and producing publication-quality plots involves carefully balancing the two.
This course is not designed to be a fully comprehensive guide to Data Visualisation, but it does function as a sort of ‘recipe book’ for plot creation, aesthetics, good practice, editing and creating a variety of different plot types. Because some of the content is on the drier side, it has been split into core and reference material: the core material is compulsory for the course, while the reference material is optional, for learners to dip in and out of where needed.
As far as this course is concerned:
Chapters 1 through 4 are core material.
Chapters 5 through 7 are reference material.
It is important to note that while this course follows the most recently published Analysis Function (AF) visualisation guidelines, it does not replace the current procedures in place for publishing data. If you are a member of another government department, you should also familiarise yourself with your department’s guidelines to identify any differences, not covered in this course, that need to be applied to your particular project.
2 Packages and Data
2.1 Packages
The main packages we will be using for this course are:
tidyverse (Version 2.0.0), used for data manipulation.
janitor (Version 2.2.0), used for data cleaning.
showtext (Version 0.9.5), used to load additional fonts.
patchwork (Version 1.1.3), used to combine multiple plots.
RColorBrewer (Version 1.1.3), used for colour palettes.
scales (Version 1.2.1), used to automatically determine axis labels and breaks (ticks).
ggrepel (Version 0.9.3), used to repel overlapping text labels away from each other and from data points.
gt (Version 0.10.0), used to create tables.
Note:
The tidyverse collection of packages contains the most important package for this entire course, the ggplot2 package.
Packages are, for the most part, backwards compatible, so it is rare for a newer version than the one listed to conflict with the materials.
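To check your set-up against the versions listed above, a small helper like the one below may be useful. This is a minimal sketch rather than part of the official course materials: `check_course_packages()` and `course_versions` are hypothetical names introduced here for illustration. It reports, for each package, the installed version (or `NA` if the package is missing) and whether that version is at least the one listed for the course.

```r
# Versions of the course packages listed above
course_versions <- c(
  tidyverse    = "2.0.0",
  janitor      = "2.2.0",
  showtext     = "0.9.5",
  patchwork    = "1.1.3",
  RColorBrewer = "1.1.3",
  scales       = "1.2.1",
  ggrepel      = "0.9.3",
  gt           = "0.10.0"
)

# Hypothetical helper: compare installed package versions with the course versions
check_course_packages <- function(versions = course_versions) {
  pkgs <- names(versions)

  # Installed version as a string, or NA if the package is not installed
  installed <- vapply(pkgs, function(pkg) {
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))

  # TRUE only when the package is installed at (at least) the course version
  up_to_date <- vapply(seq_along(pkgs), function(i) {
    !is.na(installed[[i]]) &&
      package_version(installed[[i]]) >= package_version(versions[[i]])
  }, logical(1))

  data.frame(
    package = pkgs,
    course_version = unname(versions),
    installed_version = unname(installed),
    up_to_date = up_to_date,
    row.names = NULL
  )
}

status <- check_course_packages()
status

# Install any missing packages (uncomment to run; this installs the
# latest CRAN versions, not necessarily the exact versions listed above)
# install.packages(status$package[is.na(status$installed_version)])
```

Comparing with `>=` rather than exact equality reflects the note above: a newer version is usually fine, whereas a missing or older package is worth updating before you start.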
2.2 Data
Throughout the course we will use the gapminder data, an excellent dataset for learning Data Visualisation techniques, which contains the following variables:
Country - There are 131 countries included.
Continent - There are 5 continents included.
Year - Data for the years 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002 and 2007.
Life expectancy - Life expectancy at birth, measured in years.
Population - Measured every five years, from 1952 to 2007.
GDP per capita - Gross domestic product per capita in “international dollars”, a hypothetical unit of currency that, in this case, has the same purchasing power parity as the US dollar in 2005.
Infant Mortality - Number of deaths of children under one year of age per 1000 live births.
Fertility - Number of children per woman.
Let’s have a quick look at the dataset.
```r
# Take a peek at gapminder
library(tidyverse)

gapminder <- read_csv("./data/gapminder.csv")

gapminder |>
  glimpse()
```
Before visualising a dataset, we usually need to process it first, using the techniques covered in the Introduction to R course. The tidyverse (and its incredibly useful packages, such as dplyr) will therefore be essential throughout this course.
The code will be commented and will follow The Tidyverse Style Guide, which you should bookmark if you haven’t already. It is a great reference tool, particularly for making sure the code you write is as clean and human-readable as possible.
Details of how the data has been processed will be provided as exercises, as this is an excellent way to review the content covered in the prerequisite course. Any techniques not covered previously will be shown in an example and explained in full. With that said, please make sure you are comfortable with manipulating data (perhaps by reviewing the prerequisite course) before starting this course.
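As an illustration of the kind of processing that typically comes before a plot, the sketch below uses dplyr to reduce a dataset to one row per continent, which is the shape a simple bar chart of continents would need. It is a minimal sketch under stated assumptions: the toy tibble, its values and its lowercase column names (`country`, `continent`, `year`, `life_expectancy`) are hypothetical and are not taken from the course's `gapminder.csv`.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: column names and values are made up for illustration
toy <- tribble(
  ~country, ~continent, ~year, ~life_expectancy,
  "A",      "Europe",   2002,  78,
  "B",      "Europe",   2007,  80,
  "C",      "Africa",   2007,  55,
  "D",      "Africa",   2007,  61,
  "E",      "Asia",     2007,  72
)

# Keep a single year, summarise to one row per continent,
# then order continents from highest to lowest mean life expectancy
life_by_continent <- toy |>
  filter(year == 2007) |>
  group_by(continent) |>
  summarise(
    mean_life_expectancy = mean(life_expectancy),
    n_countries = n(),
    .groups = "drop"
  ) |>
  arrange(desc(mean_life_expectancy))

life_by_continent
```

Doing the filtering and summarising up front keeps the later plotting code focused on the visual mapping rather than on reshaping data.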
4 Visualisation Guidelines
In this course we follow the most up-to-date guidelines set out in the Data Visualisation course run by the Analysis Function team, which we recommend taking before starting this course. Note that the newest guidance may differ slightly from the ONS Style Guide (provided below) and from specific organisational guidelines.
Material used to create this course can be found here:
As previously mentioned, this course is not intended to replace the data visualisation processes already in place within organisations. Where the guidance provided here disagrees with an organisation-specific guideline, always prioritise your organisation’s guidelines, and check them before publishing any visualisation produced using this course.
5 Summary
Excellent job! You have completed Chapter 1 of the Data Visualisation in R course. Now that you understand the structure and setup of the course, it is time to introduce the plotting package and build your first plot (a delicious red velvet cake) layer by layer in Chapter 2.