Chapter 3 - Customising Plots to Analysis Function Guidelines
1 Introduction
The default behaviour of plotting packages often produces outputs that do not meet our style guides or specifications. In this chapter we demonstrate how to adhere to these guidelines when creating plots with ggplot2, allowing us to construct publication-quality, accessible visualisations.
If your department has its own guidance, follow it where possible; in the absence of specific guidelines, adopt those from the Analysis Function.
2 Packages
As always, we first load in the packages we want to use:
# Packages
library(tidyverse) # Contains ggplot2, tidyr, dplyr and readr which we require
library(janitor) # To clean column names
library(scales) # Methods for automatically determining labels for axes and legends
3 The data
Now we read in the dataset and drop the null values to make visualisation easier.
# Importing data using the read_csv function
gapminder <- read_csv("./data/gapminder.csv")

# Cleaning column names using the clean_names function
# Dropping missing values using the drop_na function
gapminder <- clean_names(gapminder) |>
  drop_na()
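To see what these two cleaning steps do without needing the gapminder file, here is a minimal sketch on a made-up data frame. The column names and values are invented purely for illustration.

```r
library(janitor) # clean_names()
library(tidyr)   # drop_na()

# A small, made-up data frame with untidy column names and one missing value
messy <- data.frame(
  "Country" = c("A", "B", "C"),
  "Life Exp" = c(70.1, NA, 65.3),
  "GDP Percap" = c(12000, 8000, 3000),
  check.names = FALSE # keep the spaces so clean_names() has something to fix
)

# clean_names() converts names to snake_case; drop_na() removes rows with any NA
tidy <- clean_names(messy) |>
  drop_na()

names(tidy) # "country" "life_exp" "gdp_percap"
nrow(tidy)  # 2
```

Note that drop_na() with no column arguments removes a row if any column is missing, so it is worth checking how many rows are lost on real data.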
4 Fonts, Labels and Titles
4.1 Accessible Fonts
In this section we’ll talk about fonts in ggplot2, which, like other plot elements, can be set globally to save time. The Analysis Function (AF) recommends using a single accessible sans serif font, as serifed fonts are usually harder to read for dyslexic readers. Some good examples of such fonts are:
Arial
Tahoma
Helvetica
Verdana.
The default font for ggplot2 is the sans serif font of the graphics device (typically Helvetica or Arial), so the default output already uses an accessible typeface. There are also several important guidelines on the styling of the text itself:
Avoid italics and underlining in digital content, as they are generally less accessible.
Use bold sparingly except for headings.
Use one, accessible font in charts and tables.
Text size should be 12pt or larger.
4.2 Changing the font
You can change the font used in a plot in numerous ways:
All of the built-in ggplot themes have a “base_family” argument for setting the overall font family for the plot
element_text() has a family argument for changing fonts on individual plot elements
To make this easier, there is a very useful package we will use called ‘Showtext’, which streamlines text rendering:
Rendering text is done internally without using external software.
Non-standard font rendering is fully considered.
We can set flags for automatically using showtext to render.
We can finely control when showtext is used for specific graphical elements if desired.
Before we can use showtext to render, we need fonts:
Firstly, we need the sysfonts package and its various font importing functions to load them into our session.
showtext streamlines this by loading sysfonts automatically, together with the showtextdb package (which acts like a font repository).
# Load package
library(showtext)
Notice that required packages for fonts are loaded upon loading showtext.
Now that we have a repository of fonts to use, we can add them to our session with the font_add() function, taking the arguments:
family - The name of the font you want to use (string).
regular - The path to where the font is stored (string).
We need to provide a path to the font file for showtext to be able to build the visual representation up from its source file.
Finding font files
In Windows, they are usually stored at ‘C:\Windows\Fonts\font_file’
In Mac OS, they are usually stored at ‘/Library/Fonts/font_file’
You will need to locate the font file and its file extension, which is commonly:
.ttf
.ttc
.otf
To locate the folders and files we do the following:
In Windows:
1. Navigate to your C drive.
2. Open the ‘Windows’ folder in the C drive.
3. Open the ‘Fonts’ folder in the Windows folder.
4. Locate the Arial font you require. Unfortunately, this doesn’t always show file extensions, so go with .ttf for now.
In MacOS:
1. Open the ‘Finder’ application.
2. Click the ‘Go’ option in the top left menu of the Mac (where the Apple logo is).
3. In the resulting dropdown, select ‘computer’ and ‘Macintosh HD’.
4. Select ‘Library’.
5. Select ‘Fonts’.
6. Find your font file.
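As an alternative to browsing folders by hand, the sysfonts package (loaded alongside showtext) can list the font files it finds on your system. The following is a minimal sketch; which fonts appear depends entirely on your machine, so the Arial filter may well return no rows.

```r
library(sysfonts)

# Folders that sysfonts searches for font files
search_folders <- font_paths()
search_folders

# Data frame describing the font files found in those folders
available <- font_files()

# Keep only the rows whose family name is Arial (may be empty on some systems)
arial <- available[which(available$family == "Arial"), ]
arial
```

If a row is returned, the full path to pass to font_add() can be built with file.path(arial$path, arial$file).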
4.2.1.1 Example
Let’s add the Arial font into our session. You will want to uncomment the line of code that works for your system. Note that this may need some exploration into where your fonts are stored if issues arise.
# Bring Arial into the session with font_add()

# Windows
font_add(family = "Arial", regular = "C:\\Windows\\Fonts\\Arial.ttf")

# Mac
# font_add(family = "Arial", regular = "/Library/Fonts/Arial.ttf")
Now that we have added the font of our choosing, we need to initialise showtext with the “showtext_auto()” function so that the font can be used in ggplot2 layer functions:
This ensures all text rendering after running this function calls upon showtext.
This is more accessible than base renders as we can give fonts the name we choose, vs. their often uncommon base file names.
For finer control:
We can use the ‘showtext_begin()’ and ‘showtext_end()’ options to force everything within these function calls to adhere to showtext rendering.
We can then pass to other rendering methods should these be required.
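For the finer control described above, the sketch below wraps a single plot in showtext_begin() and showtext_end(). It assumes a PNG device and a temporary output file (both chosen only for illustration) and uses the built-in mtcars data so that it runs on its own.

```r
library(ggplot2)
library(showtext)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy by weight")

# Hypothetical output location, used only for this illustration
out_file <- file.path(tempdir(), "showtext_demo.png")

png(out_file, width = 800, height = 600, res = 96) # open the device first
showtext_begin() # text drawn from here on is rendered by showtext
print(p)
showtext_end()   # hand text rendering back to the device
invisible(dev.off())
```

The device has to be open before showtext_begin() is called, which is why png() comes first.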
Showtext is an excellent package, but isn’t without its downsides:
We cannot set the size of the font globally using this method (this still needs to be done when each label is added).
We very rarely wish to use the interesting fonts it can render due to them violating Analysis Function guidelines.
We do still have a streamlined solution for sans-serif fonts though and there are more of them to choose from with showtext’s repository added on.
Let’s use showtext from here on.
# Use showtext for font rendering from now on
showtext_auto()
4.2.2 Changing the font for each graph
We can set fonts for the entire graph by using the base_family argument. This is particularly useful if you are only generating one important plot and not a document/script filled with them.
There is also the base_size argument, which controls the size of the text in a theme and therefore in every plot that uses the theme. This overcomes the showtext limitation noted above, so if you are creating one consistent theme layer, it is well worth using these arguments to set the font size and family for all plots that use it.
4.2.2.1 Example
# We specify the base_family font and size
filter(gapminder, year %in% c(1987)) |>
  ggplot() +
  geom_point(
    mapping = aes(x = gdp_percap, y = life_exp),
    colour = "#E09F3E"
  ) +
  theme_classic(
    base_family = "Arial", # NEW - SPECIFYING THE FONT FOR THE WHOLE PLOT
    base_size = 14         # NEW - SPECIFYING THE FONT SIZE
  ) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars",
    y = "Life expectancy at birth in years",
    title = "Graph showing Life Expectancy by GDP Per Capita",
    subtitle = "Data from Gapminder Dataset",
    caption = "Source: www.gapminder.org"
  )
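Because theme settings can also be applied globally, a script containing many plots can set the font family and size once instead of per plot. The sketch below uses ggplot2's theme_set() for this. The "sans" family is a placeholder assumption so that the code runs without any fonts registered; you would swap in "Arial" once it has been added with font_add().

```r
library(ggplot2)

# Set a global theme; theme_set() returns the previous theme so it can be restored
old_theme <- theme_set(
  theme_classic(
    base_family = "sans", # placeholder; use "Arial" once it is registered
    base_size = 14
  )
)

# Plots created from now on pick up the global font family and size
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Base text size of the theme currently in force
current_size <- theme_get()$text$size
current_size # 14

# Restore the previous theme when finished
theme_set(old_theme)
```

Restoring the old theme at the end keeps the change from leaking into unrelated plots later in the session.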
4.2.3 Changing the font with element_text()
For better fine tuning we can use element_text() to alter the font and size at the label level (for each individual text element in the plot).
4.2.3.1 Example
# Modifying font, size and adjustment with element_text
filter(gapminder, year %in% c(1987)) |>
  ggplot() +
  geom_point(
    mapping = aes(x = gdp_percap, y = life_exp),
    colour = "#E09F3E"
  ) +
  theme(
    plot.title = element_text(
      family = "Arial", # Set font family
      size = 16,        # Set font size
      face = "bold",    # Bold typeface
      hjust = 0,        # Horizontal adjustment, left align
      vjust = 1         # Vertical adjustment, raises it slightly
    ),
    plot.subtitle = element_text(
      family = "Arial",
      size = 14,
      hjust = 0         # Horizontal adjustment, this will left align
    ),
    plot.caption = element_text(family = "Arial", size = 9),
    axis.title = element_text(family = "Arial", size = 12),
    axis.text = element_text(family = "Arial", size = 12)
  ) +
  # Setting the labels
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars",
    y = "Life expectancy at birth in years",
    title = "Graph showing Life Expectancy by GDP Per Capita",
    subtitle = "Data from Gapminder Dataset",
    caption = "Source: www.gapminder.org"
  )
From here on in the course, the custom theme will have these elements set as you see below. We will add to this as we learn more about the guidelines in the other sections.
# Updating the theme with the fonts guidance
custom_theme <- theme(
  # Text elements
  plot.title = element_text(
    family = "Arial", # Set font family
    size = 16,        # Set font size
    face = "bold",    # Bold typeface
    hjust = 0,        # Horizontal adjustment, left align
    vjust = 1         # Vertical adjustment, raises it slightly
  ),
  # Subtitle
  plot.subtitle = element_text(
    family = "Arial", # Font family
    size = 14,        # Font size
    hjust = 0         # Horizontal adjustment, this will left align
  ),
  # Caption
  plot.caption = element_text(
    family = "Arial", # Font family
    size = 9          # Font size
  ),
  # Axis titles
  axis.title = element_text(
    family = "Arial", # Font family
    size = 12         # Font size
  ),
  # Axis text
  axis.text = element_text(
    family = "Arial", # Font family
    size = 10,
    colour = "black"
  ),
  panel.background = element_blank(), # removes background
  legend.key = element_blank(),       # make the legend background blank
  axis.line.x = element_line(linewidth = 0.5, colour = "black"),
  axis.line.y = element_line(linewidth = 0.5, colour = "black")
)
Using the visualisation you created in the previous exercise:
Set the font to be “Arial” for the title, x and y axis labels as well as the axis ticks (remember this comes under axis.text).
Set the title size to 16, and the size of the axis titles and axis text to 14.
Left align the title and adjust it vertically by 1.
Do this manually with the ‘element_text()’ options.
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")

# Plotting the data
gapminder_uk |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap, y = fertility)) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars",
    y = "Fertility (number of children per woman)",
    title = "Graph showing Fertility by GDP Per Capita"
  ) +
  theme(
    plot.title = element_text(
      family = "Arial", # Set font family
      size = 16,        # Set font size
      face = "bold",    # Bold typeface
      hjust = 0,        # Horizontal adjustment, left align
      vjust = 1         # Vertical adjustment, raises it slightly
    ),
    axis.title = element_text( # Axis titles
      family = "Arial",
      size = 14
    ),
    axis.text = element_text( # Axis text
      family = "Arial",
      size = 14
    )
  )
4.3 Axes and Ticks
There are two primary arguments that affect the appearance of the ticks on the axes and the keys on the legend: breaks and labels.
Breaks control the position of the ticks, or the values associated with the keys.
Labels control the text label associated with each tick/key.
Each break has an associated label, controlled by the labels argument. If you set labels, you must also set breaks; otherwise, if data changes, the breaks will no longer align with the labels.
Of course, the axes we apply these to can be discrete (categorical variable) or continuous (numeric variable). In each of these two cases, the functions to be used for setting axis ticks are different.
Discrete axes:
scale_x_discrete(name, breaks, labels, limits): for x axis
scale_y_discrete(name, breaks, labels, limits): for y axis
Continuous axes:
scale_x_continuous(name, breaks, labels, limits): for x axis
scale_y_continuous(name, breaks, labels, limits): for y axis
These are all added as a layer to the plot with the + sign.
# Break y axis by a specified value
filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(
    mapping = aes(x = gdp_percap, y = life_exp),
    colour = "#E09F3E"
  ) +
  custom_theme + # defined previously
  scale_y_continuous(
    breaks = seq(from = 0, to = 90, by = 5) # A tick mark is shown on every 5
  ) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars",
    y = "Life expectancy at birth in years",
    title = "Graph showing Life Expectancy by GDP Per Capita",
    subtitle = "Data from Gapminder Dataset",
    caption = "Source: www.gapminder.org"
  )
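The breaks and labels arguments can also be used together. One way to keep them aligned when the data changes is to pass a labelling function rather than a fixed vector of text. The sketch below uses label_comma() from the scales package on a made-up data frame; the values and the 0 to 40,000 limits are assumptions for illustration.

```r
library(ggplot2)
library(scales)

# Breaks every 10,000 on the x axis
x_breaks <- seq(from = 0, to = 40000, by = 10000)

# label_comma() returns a function that adds thousands separators
label_comma()(x_breaks) # "0" "10,000" "20,000" "30,000" "40,000"

# Made-up data, for illustration only
demo <- data.frame(gdp = c(2500, 18000, 36000), life = c(55, 70, 79))

p <- ggplot(demo, aes(x = gdp, y = life)) +
  geom_point() +
  scale_x_continuous(
    breaks = x_breaks,
    labels = label_comma(), # a function, so each label is derived from its break
    limits = c(0, 40000)
  )
```

Because the labels are computed from the breaks, changing the breaks later cannot leave stale labels behind.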
Guidance states that:
For continuous data axes centrally align labels over tick marks.
For categorical data axes labels should be aligned between tick marks.
You can use more tick marks than labels; ticks indicate the scale or level of detail of the data. Label the final tick if there are more ticks than labels and there is space to do so.
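The guidance on using more tick marks than labels can be sketched with a small labelling function. Here label_every_ten() is a hypothetical helper written for this illustration (it is not part of ggplot2): it keeps a tick every 5 but only prints text on multiples of 10.

```r
library(ggplot2)

# Hypothetical helper: label multiples of 10, leave other ticks blank
label_every_ten <- function(x) ifelse(x %% 10 == 0, as.character(x), "")

label_every_ten(seq(40, 60, by = 5)) # "40" "" "50" "" "60"

# Made-up data, for illustration only
demo <- data.frame(x = 1:5, y = c(42, 48, 51, 57, 60))

p <- ggplot(demo, aes(x = x, y = y)) +
  geom_point() +
  scale_y_continuous(
    breaks = seq(from = 0, to = 90, by = 5), # a tick every 5
    labels = label_every_ten                 # text only on every second tick
  )
```

This sketch does not handle the advice about labelling the final tick; that would need checking against the plotted range by hand.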
Using the visualisation you created in the previous exercise:
Break the X axis every 5000 using the breaks and limits arguments.
The plot is given below
gapminder_uk <- gapminder |>filter(country %in%"United Kingdom")# Plotting the datagapminder_uk |>ggplot() +geom_point(mapping =aes(x = gdp_percap, y = fertility)) +labs(x ="Gross Domestic Product Per Capita in International Dollars",y ="Fertility (number of children per woman)",title ="Graph showing Fertility by GDP Per Capita" ) +theme(plot.title =element_text(family ="Arial", # Set font familysize =16, # Set font sizeface ="bold", # Bold typefacehjust =0, # Horizontal adjustment, Left alignvjust =1 ), # Vertical Adjustment, raises it slightlyaxis.title =element_text( # Axis titlesfamily ="Arial",size =14 ), # Font sizeaxis.text =element_text( # Axis textfamily ="Arial", size =14 ) )
gapminder_uk <- gapminder |>filter(country %in%"United Kingdom")# Plotting the datagapminder_uk |>ggplot() +geom_point(mapping =aes(x = gdp_percap, y = fertility)) +labs(x ="Gross Domestic Product Per Capita in International Dollars",y ="Fertility (number of children per woman)",title ="Graph showing Fertility by GDP Per Capita" ) +theme(plot.title =element_text(family ="Arial", # Set font familysize =16, # Set font sizeface ="bold", # Bold typefacehjust =0, # Horizontal adjustment, Left alignvjust =1 ), # Vertical Adjustment, raises it slightlyaxis.title =element_text( # Axis titlesfamily ="Arial", size =14 ), # Font sizeaxis.text =element_text( # Axis textfamily ="Arial", size =14 ) ) +scale_x_continuous(breaks =seq(0, max(gapminder_uk$gdp_percap), 5000),limits =c(0, max(gapminder_uk$gdp_percap)))
4.4 Gridlines
AF guidance on gridlines is to use them sparingly. There should usually be between four and eight gridlines per chart as too many makes the plot too cluttered, but too few makes the visualisation harder to read and thus less useful.
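A quick way to sanity check the four to eight guideline is to count how many breaks fall inside the range being plotted. The sketch below does this in base R; the data range is an assumed value for illustration, and it ignores the small padding ggplot2 adds around the data.

```r
# Candidate breaks for the y axis and a hypothetical data range
y_breaks <- seq(from = 0, to = 90, by = 10)
data_range <- c(40, 80) # assumed for illustration

# Only breaks inside the plotted range produce a visible gridline
visible <- y_breaks[y_breaks >= data_range[1] & y_breaks <= data_range[2]]

length(visible)                              # 5
length(visible) >= 4 && length(visible) <= 8 # TRUE
```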
There are two types of grid lines:
major grid lines indicating the ticks
minor grid lines between the major ones.
Key ggplot2 theme options to modify the plot panel and background are given below, commented out for use if needed.
# key functions to manage grid lines
# panel.grid = element_line(), # All grid lines
# panel.grid.major = element_line(), # Alters the format of all major grid lines
# panel.grid.minor = element_line(), # Alters the format of all minor grid lines
# panel.grid.major.x = element_line(), # Vertical major grid lines
# panel.grid.major.y = element_line(), # Horizontal major grid lines
# panel.grid.minor.x = element_line(), # Vertical minor grid lines
# panel.grid.minor.y = element_line() # Horizontal minor grid lines
Within element_line() we can specify the colour and thickness of the lines. The colour given below is the recommended grey colour from the guidelines.
4.4.1 Changing gridline colour
# Amending grid lines
filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(
    mapping = aes(x = gdp_percap, y = life_exp),
    colour = "#E09F3E"
  ) +
  custom_theme +
  theme(
    panel.grid.major = element_line(color = "#D9D9D9") # Adding major grid lines with colour specified
  ) +
  scale_y_continuous(
    breaks = seq(from = 0, to = 90, by = 10) # A tick mark is shown on every 10
  ) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars", # Setting the labels
    y = "Life expectancy at birth in years",
    title = "Graph showing Life Expectancy by GDP Per Capita",
    subtitle = "Data from Gapminder Dataset",
    caption = "Source: www.gapminder.org"
  )
4.4.2 Removing axes lines
We could also remove the axis lines, which is generally recommended for publication-quality charts. This way the gridlines line up with the ticks and there is no unnecessary line joining the ticks together. We can do this with element_blank(), which we used earlier.
# Removing the axis lines
filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(
    mapping = aes(x = gdp_percap, y = life_exp),
    colour = "#E09F3E"
  ) +
  custom_theme +
  theme(
    panel.grid.major = element_line(color = "#D9D9D9"), # Adding major grid lines with colour specified
    axis.line.x = element_blank(), # Remove axis lines
    axis.line.y = element_blank()
  ) +
  scale_y_continuous(
    breaks = seq(from = 0, to = 90, by = 10) # A tick mark is shown on every 10
  ) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars", # Setting the labels
    y = "Life expectancy at birth in years",
    title = "Graph showing Life Expectancy by GDP Per Capita",
    subtitle = "Data from Gapminder Dataset",
    caption = "Source: www.gapminder.org"
  )
We will now add these to the custom theme.
# Updating theme with gridlines guidance
custom_theme <- theme(
  # Text elements
  plot.title = element_text(
    family = "Arial", # Set font family
    size = 16,        # Set font size
    face = "bold",    # Bold typeface
    hjust = 0,        # Horizontal adjustment, left align
    vjust = 1         # Vertical adjustment, raises it slightly
  ),
  # Subtitle
  plot.subtitle = element_text(
    family = "Arial", # Font family
    size = 14,        # Font size
    hjust = 0         # Horizontal adjustment, this will left align
  ),
  # Caption
  plot.caption = element_text(
    family = "Arial", # Font family
    size = 9          # Font size
  ),
  # Axis titles
  axis.title = element_text(
    family = "Arial", # Font family
    size = 12         # Font size
  ),
  # Axis text
  axis.text = element_text(
    family = "Arial", # Font family
    size = 10,
    colour = "black"
  ),
  panel.background = element_blank(), # removes background
  legend.key = element_blank(),       # make the legend background blank
  panel.grid.major = element_line(color = "#D9D9D9"), # Adding major grid lines with colour specified
  axis.line.x = element_blank(),
  axis.line.y = element_blank()
)
Using the visualisation you created in the previous exercise:
Set gridlines using the ONS colours.
Remove the outer frame (axes).
The plot is given below
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")

# Plotting the data
gapminder_uk |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap, y = fertility)) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars",
    y = "Fertility (number of children per woman)",
    title = "Graph showing Fertility by GDP Per Capita"
  ) +
  theme(
    plot.title = element_text(
      family = "Arial", # Set font family
      size = 16,        # Set font size
      face = "bold",    # Bold typeface
      hjust = 0,        # Horizontal adjustment, left align
      vjust = 1         # Vertical adjustment, raises it slightly
    ),
    axis.title = element_text( # Axis titles
      family = "Arial", # Font family
      size = 14         # Font size
    ),
    axis.text = element_text( # Axis text
      family = "Arial", # Font family
      size = 14
    )
  ) +
  scale_x_continuous(
    breaks = seq(0, max(gapminder_uk$gdp_percap), 5000),
    limits = c(0, max(gapminder_uk$gdp_percap))
  )
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")

# Plotting the data
gapminder_uk |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap, y = fertility)) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars",
    y = "Fertility (number of children per woman)",
    title = "Graph showing Fertility by GDP Per Capita"
  ) +
  theme(
    plot.title = element_text(
      family = "Arial", # Set font family
      size = 16,        # Set font size
      face = "bold",    # Bold typeface
      hjust = 0,        # Horizontal adjustment, left align
      vjust = 1         # Vertical adjustment, raises it slightly
    ),
    axis.title = element_text( # Axis titles
      family = "Arial", # Font family
      size = 14         # Font size
    ),
    axis.text = element_text( # Axis text
      family = "Arial", # Font family
      size = 14
    ),
    panel.grid.major = element_line(color = "#D9D9D9"), # Adding major grid lines with colour specified
    panel.grid.minor = element_blank(), # Removing minor grid lines
    panel.border = element_blank(),     # Remove panel border
    axis.line = element_blank()
  ) +
  scale_x_continuous(
    breaks = seq(0, max(gapminder_uk$gdp_percap), 5000),
    limits = c(0, max(gapminder_uk$gdp_percap))
  )
4.5 Legends
Guidance on legends and keys states that:
A legend or key should not be used, instead label the data directly. If a legend or key is necessary, place it on the chart as close as possible to the data.
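One possible way to label data directly, instead of using a legend, is to add a text layer at the end of each series. The sketch below uses made-up values for two hypothetical series; the nudge and axis limits are assumptions chosen so the labels fit.

```r
library(ggplot2)

# Made-up values for two series, purely for illustration
series <- data.frame(
  year = rep(c(2000, 2005, 2010), times = 2),
  value = c(10, 12, 15, 8, 9, 13),
  group = rep(c("Series A", "Series B"), each = 3)
)

# Label positions: the last observation of each series
ends <- series[series$year == max(series$year), ]

p <- ggplot(series, aes(x = year, y = value, colour = group)) +
  geom_line() +
  geom_text(
    data = ends,
    mapping = aes(label = group),
    hjust = 0,    # left-align the text so it starts at the line end
    nudge_x = 0.2 # small gap between the line and its label
  ) +
  scale_x_continuous(limits = c(2000, 2013)) + # leave room for the labels
  theme(legend.position = "none")              # the legend is now redundant
```

With many overlapping series the labels can collide, in which case a legend placed close to the data may be the better option.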
Previously we looked at setting the legend position by specifying a location using top, right, bottom or left. We can also place it inside the plot area by specifying x and y coordinates.
4.5.1 Example
When thinking about the coordinates, x and y are the coordinates of the legend box. Their values should be between 0 and 1. c(0,0) corresponds to the “bottom left” and c(1,1) corresponds to the “top right” position.
filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(
    mapping = aes(x = gdp_percap, y = life_exp, colour = continent),
    size = 3
  ) +
  scale_colour_manual(
    values = c("#335C67", "#F5D000", "#E09F3E", "#9E2A2B", "#540B0E")
  ) +
  custom_theme +
  theme(
    # A numeric legend.position was deprecated in ggplot2 3.5.0;
    # use "inside" together with legend.position.inside instead
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.4), # x and y coordinates of the legend
    legend.title = element_blank(),
    legend.background = element_blank(),
    legend.key = element_blank()
  ) +
  scale_y_continuous(
    breaks = seq(from = 0, to = 90, by = 10) # A tick mark is shown on every 10
  ) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars", # Setting the labels
    y = "Life expectancy at birth in years",
    title = "Graph showing Life Expectancy by GDP Per Capita",
    subtitle = "Data from Gapminder Dataset",
    caption = "Source: www.gapminder.org"
  )
4.6 Colours
Colours are a fundamental part of our visualisations. Used poorly, they can confuse the user; used well, they enhance and clarify statistical content. The guidance from the Analysis Function is very in-depth and provides everything you would need to know, but we will summarise some key points here.
Guidance on colours suggests the following:
Limit the number of colours you use: Think before you introduce new colours to the graph. Do they make the message clearer?
Use colour consistently: If you have a series of charts, ensure the same colour is assigned to the corresponding variable in each chart.
Consider colour associations: People often link colours to certain things unconsciously (such as green with grass), and colours may have cultural associations as well. Guidance on this from ThoughtCo and ONS can help clarify this.
Use a plain background: Usually white is the way to go.
Consider colour contrast ratios: This is incredibly important for accessibility, as low contrast can make plot elements virtually impossible to see. Guidance from WebAIM and the Accessibility Developer Guide will help massively with this, and the contrast checker from WebAIM is a handy tool.
Consider others: Acknowledge readers with low vision as well as those who are colour blind when making accessible visualisations, as all can be equally impacted by bad chart design.
Greyscale: Check whether your colour palette can be understood in greyscale. This confirms that the chart can be read without colour, which matters because seeing no colour at all is a specific type of colour blindness.
Format: Use the SVG format for plots, as vector graphics stay sharp when resized or zoomed.
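Contrast ratios can also be checked directly in R. The sketch below follows the relative luminance and contrast ratio definitions in WCAG 2; the function names are our own for this illustration, not part of any package.

```r
# Relative luminance of a colour, following the WCAG 2 definition
relative_luminance <- function(colour) {
  channels <- grDevices::col2rgb(colour)[, 1] / 255 # red, green, blue on a 0-1 scale
  linear <- ifelse(
    channels <= 0.03928,
    channels / 12.92,
    ((channels + 0.055) / 1.055)^2.4
  )
  sum(c(0.2126, 0.7152, 0.0722) * linear)
}

# Contrast ratio between two colours: lighter luminance over darker luminance
contrast_ratio <- function(colour_a, colour_b) {
  lum <- sort(
    c(relative_luminance(colour_a), relative_luminance(colour_b)),
    decreasing = TRUE
  )
  (lum[1] + 0.05) / (lum[2] + 0.05)
}

contrast_ratio("#000000", "#FFFFFF") # 21, the maximum possible
contrast_ratio("#E09F3E", "#FFFFFF") # the point colour used earlier, on white
```

As a rough guide, WCAG 2.1 asks for at least 3:1 for graphical objects and 4.5:1 for normal-size text, so results below those values are worth revisiting.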
The colour palettes we use at the ONS, alongside their HEX, RGB and CMYK codes, are in the guidance pages. Implementing these just requires changing the colour argument in the functions we use.
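To use colour consistently across a series of charts, one option is a named vector that ties each category to a fixed colour. In the sketch below the continent names are assumed category labels and the hex codes are simply those used in the legend example, not an official palette.

```r
library(ggplot2)

# Hypothetical mapping from category to colour (names are assumptions)
continent_colours <- c(
  "Africa"   = "#335C67",
  "Americas" = "#F5D000",
  "Asia"     = "#E09F3E",
  "Europe"   = "#9E2A2B",
  "Oceania"  = "#540B0E"
)

# A reusable scale layer that can be added to every chart in the series
colour_scale <- scale_colour_manual(values = continent_colours)

# Made-up data containing only some of the categories
demo <- data.frame(
  x = 1:3,
  y = c(2, 4, 3),
  continent = c("Asia", "Europe", "Africa")
)

p <- ggplot(demo, aes(x = x, y = y, colour = continent)) +
  geom_point(size = 3) +
  colour_scale

# Colours actually assigned to the three points, in data order
assigned <- as.character(layer_data(p, 1)$colour)
assigned
```

Because the vector is named, each category keeps its colour even when a chart shows only a subset of the categories.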
Absolutely fabulous following of the AF guidelines! This is incredibly important for the accessibility of the visualisations we create and ensuring we adhere to the promises we make to have the public good in mind at all times, should they need assistance or not.
Next up is the plot types chapter, where we will see a variety of different Geometric Layers, how to utilise them and how to customise them to AF guidelines. Chapter 4 will be the last chapter of the CORE material for this course.