Chapter 2 - Introducing ggplot2

1 Overview of ggplot

ggplot2 is a powerful data visualisation package that is well suited to exploring data and producing publication-quality figures. As mentioned in Chapter One of the Introduction to R course, the BBC use R to produce their graphics.

The package is based on the theory presented in the book The Grammar of Graphics by Leland Wilkinson, which defines a set of rules for constructing statistical graphics by combining different types of layers. This is why I like the cake analogy for the inner workings of the package: it helps you picture the elements of a plot and how they come together.

As such, the gg in ggplot2 stands for grammar of graphics, which is a way of thinking about plotting as having grammar elements that can be applied in succession to create a plot. This is reproducible in that every graph can be built from the same few components, which themselves can be tweaked for each specific purpose. Comparing this to cakes, much of the time we have the same base ingredients for the sponge, but the ratio of these, the subsequent decoration and embellishments often differ between cake types.

In this section we’ll explore creating a basic plot using ggplot. We’ll do this with just one kind of plot, the scatter plot, which is the perfect place to start for anyone plotting for the first time. By the end of this section we’ll have built our scatter plot to meet Analysis Function (AF) standards and have a good core understanding of how ggplot works.

2 Basic Foundation

Each ggplot has three basic elements:

  • The data, the dataset containing the variables of interest.

  • Geometric layers, the shape of our visualisation (the plot type itself).

  • Aesthetics, which are visual properties of the objects in your plot. They include things like the size, the shape, or the color of your points.

We mentioned that we will be plotting scatter plots to build up knowledge of the package. These have points that show the relationship between two sets of data (if you are familiar with statistics we often use these to diagnose correlations).

2.1 Packages

The first thing we always do at the start of an analysis is load our packages. In this chapter we will use the following:

# Loading Packages

library(tidyverse) # Contains ggplot2, tidyr, dplyr and readr which we require
library(janitor) # To clean column names
library(scales) # Methods for automatically determining labels for axes and legends
library(patchwork) # Used to combine multiple plots
library(RColorBrewer) # Colour palettes

2.2 The Data

First let’s look at our data. It is vital to know the data types within our dataset, as the plot types available to us depend on them.

The data must be in a tidy data frame. This is one of the challenges of performing robust data visualisation: you are required to get the data into a tidy format first. Tidy data frames are described in more detail in the Introduction to R course, which is a prerequisite for this one. Feel free to review the content of Chapter 4 - Working with DataFrames if you need a reminder, but for now, all you need to know is that a tidy data frame has variables in the columns and observations in the rows.
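
To make the idea of tidy data concrete, here is a minimal sketch using a small made-up table (the country labels and values are purely illustrative and are not taken from the gapminder data). It assumes the tidyr package, which is loaded as part of the tidyverse, and uses pivot_longer() to reshape a wide table so that each row is one observation.

```r
library(tidyr)

# A made-up 'wide' table: one column per year (illustrative values only)
wide <- data.frame(country = c("A", "B"),
                   `1987` = c(70.1, 65.3),
                   `2007` = c(74.2, 69.8),
                   check.names = FALSE)

# Reshape so that each row is one observation (a country in a given year)
tidy <- wide |>
        pivot_longer(cols = c(`1987`, `2007`),
                     names_to = "year",
                     values_to = "life_exp")

tidy
```

The result has one column per variable (country, year, life_exp), which is the shape ggplot expects.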

We will use the gapminder dataset throughout this course. It is collected by the Gapminder Foundation and is well known for being an excellent teaching tool.

# Importing data using the read_csv function 

gapminder <- read_csv("./data/gapminder.csv")

# Cleaning column names and dropping missing values

gapminder <- clean_names(gapminder) |>
             drop_na()

gapminder |> glimpse()
Rows: 1,224
Columns: 8
$ country          <chr> "Albania", "Albania", "Albania", "Albania", "Albania"…
$ continent        <chr> "Europe", "Europe", "Europe", "Europe", "Europe", "Eu…
$ year             <dbl> 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002,…
$ life_exp         <dbl> 64.820, 66.220, 67.690, 68.930, 70.420, 72.000, 71.58…
$ pop              <dbl> 1728137, 1984060, 2263554, 2509048, 2780097, 3075321,…
$ gdp_percap       <dbl> 2312.889, 2760.197, 3313.422, 3533.004, 3630.881, 373…
$ infant_mortality <dbl> 106.5, 86.8, 71.0, 58.6, 56.1, 40.8, 32.5, 26.8, 21.0…
$ fertility        <dbl> 5.96, 5.38, 4.81, 4.09, 3.46, 3.13, 2.87, 2.61, 2.20,…

3 The Data Layer

Let’s look at the basic syntax to plot using ggplot:

  • We start with “ggplot()”.

  • Then we specify the data; this is our first layer, known as the data layer.

3.1 Example

Let’s see the relationship between the gdp_percap and life_exp for Brazil. In order to do this we need a new DataFrame with just the data for Brazil, let’s call it “gm_brazil”. We will use the pipe operator introduced in the pre-requisite course to chain commands together.

As a reminder, data manipulation is not explained in this course, but it is covered in the Introduction to R course.

# Filtering the gapminder data

gm_brazil <- gapminder |>
             filter(country == "Brazil")


# Note this will provide an empty graph

ggplot(data = gm_brazil)

This will plot the base of our visualisation, but as we have not specified what kind of graph we want, we only get a blank canvas. The function ggplot() creates a coordinate system that you can add layers to (we have axes, but nothing plotted on them and no ranges assigned to them).

To visualise the relationship between life_exp and gdp_percap, we need to add a geometric layer.

4 Geometric layers

4.1 The syntax of adding layers

In ggplot2 we create graphs by adding layers. Layers can define geometries, compute summary statistics, define what scales to use, or even change styles. To add layers we use the “+” symbol rather than the pipe operator, which you may have expected to see. The pipe chains operations on a data structure, passing the result of each step on to the next. Adding layers to a plot is a process of addition: the data is supplied once, in the data layer, and is not passed along to the later layers, so we use the plus symbol instead.

Geometric layers are the shapes that represent the data, whose names begin with geom_. They follow the naming pattern “geom_X”, where “X” is the name of the geometry (or plot type) we are interested in. For example we have:

  • geom_point (Graph of points or a scatter plot)
  • geom_bar (Bar graph)
  • geom_histogram (A histogram)

and many more that we will see in Chapter 4.

In the console, type geom (don’t run it) and R will attempt to autocomplete with the list of options for you to choose from.

# Type the code below to see the list of geometric layers

geom

We can also see them on the ggplot cheat sheet, which is an incredibly useful reference once you have completed the core content of this course. We highly recommend bookmarking this, as well as the other cheat sheets provided by RStudio.

4.2 Example

Looking back at our code so far, we have the function ggplot() and then we specified our data, gm_brazil; this is our first layer. Now we can add a ‘layer’ to the plot using one of the “+ geom_()” methods to define the shape of the graph.

To see the relationship between life_exp and gdp_percap we can use geom_point() to plot a scatter plot. It is recommended that you start a new line after the plus sign, as you would with the pipe operator, to improve the readability of your code.

# Note this code will give an error

ggplot(data = gm_brazil) + # Plus sign to add another layer
       geom_point()

Other layers we can add to a plot include the plot title, axes labels, and visual themes for the plots. We will look at some of these later in the course. They are stacked in the order that the code is written, which can create subtle differences in the final product dependent on their placement in some cases.

If you run the code above you will get an error:

Error: geom_point requires the following missing aesthetics: x, y

We still haven’t got a graph, as we haven’t specified which parts of the data we would like to plot. This brings us on to aesthetics.

5 Aesthetics

An aesthetic is a visual property of the objects in your plot. Aesthetics include things like x/y position, the size, the shape, or the colour of your points. In this case we first need to specify what we want to be displayed on the x and y axes.

Aesthetic attributes are mapped to variables in the dataset. We do this by adding the aes() function inside our geom_point() function.

5.1 Example

The aes() statement below tells R that we want to map gdp_percap to the x axis and life_exp to the y axis.

# Mapping aesthetics

ggplot(data = gm_brazil) +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp)) # Aesthetics are passed to the 'mapping' argument of the geom

5.1.1 Exercise

  1. Filter data for United Kingdom and call it gapminder_uk.

  2. Make a ggplot scatter plot with:

  • gdp_percap as the x axis

  • fertility as the y axis

ggplot(data = <your data>) +
      <geom_function>(mapping = aes(<mappings>))
# (a)
gapminder_uk <- gapminder |>
                  filter(country == "United Kingdom")

# (b)

ggplot(data = gapminder_uk) +
       geom_point(mapping = aes(x = gdp_percap,
                                y = fertility))

5.2 Aesthetic Mappings

Previously we stated that we can change other elements of the plot like colour and size. In this section we will explore how to do this in practice.

Everything inside aes() will have a scale; if none is provided it will get a default. Different types of aesthetic attributes work better with different types of variables. For example, colour and shape work well with categorical variables (as we get one per category), while size works well for continuous variables (as it is on a numerical scale).

Note the mapping requirements differ with the different geometries, which we will see examples of later in the course.

5.2.1 Colour

ggplot2 allows us to customise the colours of plots using its fill and color arguments. We focus on colour for now and explore fill later on.

The explanatory text will spell colour with the UK spelling. ggplot accepts both the American spelling for the argument: color= and the UK spelling: colour=
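
As a quick check of the spelling point, the sketch below builds two aesthetic mappings, one with each spelling, and compares the names that ggplot2 stores. The column name continent is only a placeholder here, since aes() does not look the column up until a plot is drawn.

```r
library(ggplot2)

# Both spellings are accepted; ggplot2 stores the aesthetic under one name
us_spelling <- aes(color = continent)
uk_spelling <- aes(colour = continent)

names(us_spelling)
names(uk_spelling)
```

Both calls should report the same aesthetic name, which is why the two spellings can be used interchangeably in the examples that follow.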

You can set the aesthetic properties of your geom manually. This applies the chosen value to every point drawn by that geom, rather than varying it with the data.

Let’s look at assigning by name first.

Below is a graph showing the relationship between the gdp_percap variable and life_exp.

Remember that you can pipe data into ggplot (and combine it with data manipulation functions), but within ggplot you need to ADD LAYERS with +.

# Piping data into the graph 

filter(gapminder, year == 1987) |> # Data piped into
       ggplot() + # ggplot function initiating plot
       geom_point(mapping = aes(x = gdp_percap, 
                                y = life_exp))

We can set a colour to geom_point. For example, we can make all of the points in our plot blue by specifying the colour argument of geom_point() to the name of a colour as a character string.

If you want all data points to be the same colour, you would define colour = “blue” outside the aes() function. Placing this inside the aes() function means something different which we will cover later.

# Colour specified using a colour name

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp),
                  colour = "blue") # Colour specified outside aes function

We can see a complete list of the available choices here: Colours in R.

Note that you can also see a complete list of the 657 built-in colour names by typing colours() (or colors()).

# Returns the built-in colour names

colours()
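
Because colours() returns an ordinary character vector, the usual vector tools can be used to explore it. A small sketch using base R only (the search term "olive" is just an example):

```r
# Base R only; no packages needed
all_colours <- colours()

length(all_colours)                      # how many named colours R knows
head(all_colours)                        # the first few names
grep("olive", all_colours, value = TRUE) # names containing "olive"
```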

We can also specify colour by HTML names and hex codes. ggplot will accept most common colour names and HTML colour names; here I’m using colour = “OliveDrab”.

# Colour specified using HTML colour names

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap, 
                                y = life_exp),
                  colour = "OliveDrab") # Colour specified outside aes function

For finer control we can also use hex codes, which can define all colours. A hex code looks like this: #9E2A2B; it is given as a string with a # symbol at the front. This particular code here is a burnt red colour.

You can find hex codes here: HTML colour codes.

# Colour specified using HTML HEX code

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp),
                  colour = "#9E2A2B") # Colour specified outside the aes function

We can also set colours by their RGB value. This is the Red, Green and Blue value, which can create all colours by combining different amounts of each. Each value is on a scale of 0 to 255. As an example, the ONS Blue has the RGB value (0, 61, 89).

A colour can be specified using R’s “rgb()” function, which takes three arguments: red, green, and blue. By default each has a range of [0, 1], so to supply values on the 0 to 255 scale we also set maxColorValue = 255.

# Colour specified using RGB values

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap, 
                                y = life_exp),
                  colour = rgb(0, 61, 89, 
                               maxColorValue = 255)) # Colour specified outside aes function
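
It can help to see what rgb() actually returns. The sketch below uses base R only and the ONS Blue values quoted above; it shows that rgb() produces a hex code string, and that the 0 to 255 and 0 to 1 scales describe the same colour.

```r
# ONS Blue, given on the 0-255 scale
ons_blue <- rgb(0, 61, 89, maxColorValue = 255)
ons_blue # "#003D59"

# The same colour on rgb()'s default 0-1 scale
ons_blue_unit <- rgb(0 / 255, 61 / 255, 89 / 255)
identical(ons_blue, ons_blue_unit)
```

Since the result is just a hex code, it can be passed to colour = in exactly the same way as the hex string in the previous example.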

To use colour effectively with your data, you first need to know whether you are dealing with a categorical or a continuous variable.

5.2.2 Mapping the Colour

You can use the different aesthetics to convey information, by mapping the aesthetics in your plot to the variables in your dataset. To map an aesthetic to a variable, associate the name of the aesthetic to the name of the variable inside the aes() function, like we did with x and y.

ggplot2 will automatically assign a unique level of the aesthetic to each unique value of the variable. It will even add a legend that explains which levels correspond to which values.

5.2.2.1 Example

For example, you can map the colours of your points to the continent.

# Mapping the colour of our data points

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                colour = continent)) # Colour specified within aes

We can also amend these colours manually, using the “scale_color_manual()” function. This is added on as a layer to our plot.

# Here we are specifying our colour values manually

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                color = continent)
                  ) +
      scale_colour_manual(values = c("Africa" = "blue", # Added as a layer
                                     "Americas" = "red",
                                     "Asia" = "green",
                                     "Europe" = "yellow",
                                     "Oceania" = "grey"))

5.2.3 Size

We can set the sizes of the points within our scatter plot.

In the example below, we set every point to the same size of 3, which is measured in mm.

# Changing the size of our points manually

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap, 
                                y = life_exp),
                  color = "red", # Colour specified outside aes function
                  size = 3) # size specified outside aes function

We can also map variables to size in the same way.

# Mapping the size and colour of our data points

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                color = continent,
                                size = pop)) # size and colour specified within aes function

Note that R will display large numbers using scientific notation. In the legend for pop we can see 2.5e+08, which denotes 2.5 × 10^8 = 250,000,000 = 250 million.

You can turn off scientific notation by specifying “scipen” within the “options” functions.

# Turn off scientific notation

options(scipen = 999)

However, if you would rather return to the standard form output, you can reverse this by using the following:

  • options(scipen = 0)

This reverts scipen to its default value of 0.
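
Changing scipen affects the whole R session. As an alternative sketch, the legend labels alone can be formatted using label_comma() from the scales package (loaded at the start of this chapter), passed to the labels argument of scale_size(). The data frame below is made up purely for illustration.

```r
library(ggplot2)
library(scales)

# Made-up data for illustration only
toy <- data.frame(x = c(1, 2, 3),
                  y = c(3, 1, 2),
                  pop = c(5e6, 1e8, 2.5e8))

# label_comma() returns a function that formats numbers with commas
label_comma()(250000000)

# Use it for the size legend only, leaving the global options untouched
p <- ggplot(data = toy) +
     geom_point(mapping = aes(x = x, y = y, size = pop)) +
     scale_size(labels = label_comma())
p
```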

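As with colours, we can also amend these sizes manually, by specifying which sizes we want using the “scale_size_manual” function. It works similarly to “scale_colour_manual” and is also added as a layer. Note that the manual scales expect a categorical (discrete) variable, so this would not suit a continuous variable such as pop.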
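
A minimal sketch of scale_size_manual() is shown below. It uses a made-up data frame with a hypothetical categorical column called group, because manual scales need discrete levels. ggplot2 may warn that using size for a discrete variable is not advised; the plot is still drawn.

```r
library(ggplot2)

# Made-up data for illustration only; 'group' is a hypothetical column
toy <- data.frame(x = c(1, 2, 3, 4),
                  y = c(2, 3, 1, 4),
                  group = c("small", "small", "large", "large"))

p <- ggplot(data = toy) +
     geom_point(mapping = aes(x = x, y = y, size = group)) +
     scale_size_manual(values = c("small" = 2, "large" = 6)) # One size per level
p
```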

5.2.4 Shape

R has built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the colour and fill aesthetics.

  • The hollow shapes (0–14) have a border determined by colour;
  • The solid shapes (15–20) are filled with colour;
  • The filled shapes (21–25) have a border of colour but are filled with fill.

We can see a list of all the available shapes below.

These are also available on the Cheat sheet.

# Changing the shape of the points


filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap, 
                                y = life_exp),
                  colour = "navy", # Colour specified outside aes function
                  size = 3, # Size specified outside aes function
                  shape = 17) # Shape specified outside aes function

We can also map variables to shape in the same way as we did with colour and size.

# Mapping the shape of our data points

filter(gapminder, year %in% c(1987, 2007)) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                shape = continent,
                                colour = as.factor(year)))

You may have noticed a sneaky use of “as.factor()” here. This is because, in the dataset, the year column is numeric. To use it to categorise our colours we must convert it to a categorical variable using the as.factor() function; otherwise ggplot2 would treat year as continuous and use a colour gradient. This is very useful when you have a numeric column (particularly one with a small number of distinct integer values) that you want to map in this way.
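
The effect of as.factor() can be seen without drawing a plot. A small base R sketch, using a short made-up vector in place of the year column:

```r
# A numeric vector of years, standing in for the year column
year <- c(1987, 2007, 1987, 2007)
is.numeric(year)

# Converted to a factor, each distinct year becomes a category (level)
year_f <- as.factor(year)
levels(year_f)
```

Each level then receives its own colour and legend entry when the factor is mapped to colour.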

For more details on factors there is a good tutorial here: Understanding Factors.

Note that ggplot2 will only use six shapes at a time. By default, additional groups will go unplotted when you use the shape aesthetic.

We can also amend these shapes manually, specifying which shapes we want using the “scale_shape_manual()” function as another layer to the plot.

# Specifying shapes using scale shape manual

filter(gapminder, year %in% c(1987, 2007)) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                shape = continent,
                                color = as.factor(year))) +
       scale_shape_manual(values = c(15, 16, 17, 1, 11))

5.2.5 Transparency

We can set the transparency of our points, using the parameter “alpha”. Alpha refers to the opacity of a point, values of which range from 0 to 1, with lower values corresponding to more transparent colours.

# Changing the transparency of the points

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp),
                  color = "navy", # Colour specified outside aes function
                  size = 4, # Size specified outside aes function
                  alpha = 0.4) # Alpha specified outside aes function

We can also map variables to alpha in the same way as before.

# Mapping the alpha of our data points

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                color = continent,
                                alpha = infant_mortality)) +
      scale_colour_manual(values = c("Africa" = "blue",
                                     "Americas" = "red",
                                     "Asia" = "green",
                                     "Europe" = "yellow",
                                     "Oceania" = "grey"))

We can also make use of the “scale_alpha_manual” function to specify the alpha value we want for each level of a categorical variable, like we have with colour above.

5.2.6 Exercises

  1. What happens if you map an aesthetic to something other than a variable name, like aes(color = life_exp > 65)?
# Condition in the mapping?

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = fertility,
                                y = life_exp,
                                colour = life_exp > 65))
  2. Suppose that instead of indicating continent using colour, you wanted all the points in the plot below to be blue. How would you do it?
# Filtering the gapminder data
# Plotting the data

filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent))

  1. What happens if you map an aesthetic to something other than a variable name, like aes(color = life_exp > 65)?
# Mapping color to an expression

filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = fertility,
                           y = life_exp,
                           colour = life_exp > 65))

Aesthetics can be mapped to expressions like “colour = life_exp > 65”. The ggplot() function behaves as if a temporary variable were added to the data, with values equal to the result of the expression. In this case, the result of “life_exp > 65” is a logical variable which takes values of TRUE or FALSE.

  2. Suppose that instead of indicating continent using colour, you wanted all the points in the plot below to be blue. How would you do it?

When you want colour to represent a variable from your dataset, put “colour = <variable>” inside aes(); when you simply want to set the colour of all the points, put “colour = "blue"” (or any other colour) outside the aes() function.

# Setting the color to blue

filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp),
             colour = "blue")
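
A common slip is to put the colour name inside aes(). The sketch below, on a small made-up data frame, contrasts the two placements. layer_data() is used here only to inspect which colour each point ends up with.

```r
library(ggplot2)

# Made-up data for illustration only
toy <- data.frame(x = 1:3, y = c(2, 1, 3))

# Outside aes(): every point is set to blue
p_set <- ggplot(data = toy) +
         geom_point(mapping = aes(x = x, y = y),
                    colour = "blue")

# Inside aes(): "blue" is treated as a one-level variable, so ggplot2
# picks a colour from its default palette and adds a legend
p_mapped <- ggplot(data = toy) +
            geom_point(mapping = aes(x = x, y = y,
                                     colour = "blue"))

unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```

The second plot therefore does not come out blue, even though the word "blue" appears in the code.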

5.3 Colour Palettes

We can choose specific colour palettes, such as those provided by the “RColorBrewer” package for the aesthetics in our plots. These colours have been designed to work well in a wide variety of situations. The package provides palettes for different types of scale (sequential, diverging, qualitative). You will need to install and load this package to use it as it is not part of the tidyverse.

  • sequential - great for low-to-high situations where one extreme is exciting and the other is boring

  • diverging - great for things that range from “extreme and negative” to “extreme and positive”

  • qualitative - great for non-ordered categorical things, such as your typical factor, like country or continent

# Displaying the colour palettes we have

display.brewer.all()
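
To check which type a given palette belongs to without reading the chart, the RColorBrewer package also provides a lookup table, brewer.pal.info, and the brewer.pal() function for extracting the colours themselves. A short sketch (the three palette names are just examples):

```r
library(RColorBrewer)

# brewer.pal.info lists every palette with its category:
# "seq" (sequential), "div" (diverging) or "qual" (qualitative)
brewer.pal.info[c("Blues", "RdYlBu", "Set2"), ]

# brewer.pal() returns the hex codes for n colours from a palette
brewer.pal(n = 5, name = "Set2")
```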

5.3.1 Example

We can add the palette as a layer, as shown in the example below.

# Adding a colour palette, this is added as a layer.


filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                colour = continent),
                  size = 3) +
       scale_color_brewer(type = "div", # Palette added as a layer; type is "seq", "div" or "qual"
                          palette = "RdYlBu")

5.4 Adding titles and labels

Good labels are critical for making your plots accessible to a wider audience. Always ensure the axis and legend labels are fully descriptive. When adding a title and more meaningful labels, it’s always a good idea to replace short variable names with more detailed descriptions, and to include the units. This can be done using the “labs()” function.

Within this function you can add several arguments, e.g. adding in a subtitle and a caption.

  • subtitle - adds additional detail in a smaller font beneath the title

  • caption - adds text at the bottom right of the plot, often used to describe the source of the data

5.4.1 Example 1 - Adding labels

# Using the labs function to rename x and y axes, add title, subtitle and caption

filter(gapminder, year == 1987) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                colour = continent),
                  size = 3) +
       labs(x = "Gross Domestic Product Per Capita in International Dollars", # Labels
            y = "Life expectancy at birth in years",
            title = "Graph showing Life Expectancy by GDP Per Capita",
            subtitle = "Data from Gapminder Dataset",
            caption = "www.gapminder.org")

5.4.2 Example 2 - Change the labels on a legend

We can also change the labels of our legend within the labs() function.

# Changing the legend label in the labs function

filter(gapminder, year %in% 2007) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp,
                                colour = continent),
                  size = 3) +
       labs(x = "Gross Domestic Product Per Capita in International Dollars",
            y = "Life expectancy at birth in years",
            color = "Continent", # Legend Label
            title = "Graph showing Life Expectancy by GDP Per Capita",
            subtitle = "Data from Gapminder Dataset",
            caption = "www.gapminder.org")

5.4.3 Exercise

  1. Using the visualisation you created in the previous exercise:
  • Set an appropriate title and x and y axis labels.
gapminder_uk <- gapminder |>
                  filter(country %in% "United Kingdom")

# Plotting the data

gapminder_uk |> ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = fertility))

gapminder_uk <- gapminder |>
                  filter(country %in% "United Kingdom")

# Plotting the data

gapminder_uk |> ggplot() +
       geom_point(mapping = aes(x = gdp_percap, 
                                y = fertility)) +
       labs() # To add labels and titles
gapminder_uk <- gapminder |>
                  filter(country %in% "United Kingdom")

# Plotting the data

gapminder_uk |> ggplot() +
       geom_point(mapping = aes(x = gdp_percap, 
                                y = fertility)) +
       labs(x = "Gross Domestic Product Per Capita in International Dollars",
            y = "Fertility (number of children per woman)",
            title = "Graph showing Fertility by GDP Per Capita")

5.5 Changing the limits of our axes

There are two reasons you might want to specify limits rather than relying on ggplot to set them for you:

  • You want to shrink the limits to focus on an interesting area of the plot.

  • You want to expand the limits to make multiple plots match up.

The functions we use are xlim() and ylim(), which modify the limits of the axes. These are added as another layer to the plot as follows:

5.5.1 Examples

# Changing the limit of the y axis using the ylim function


filter(gapminder, year %in% 2007) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp),
                  size = 3) +
       labs(x = "Gross Domestic Product Per Capita in International Dollars",
            y = "Life expectancy at birth in years",
            title = "Graph showing Life Expectancy by GDP Per Capita",
            subtitle = "Data from Gapminder Dataset",
            caption = "www.gapminder.org") +
       ylim(0, 85) # Amend axis limit, lower to 0, upper to 85

Note: it is possible to specify only the lower or upper bound of a limit. For instance, try “ylim(0, NA)” and observe the results.

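Alternatively you could use the function coord_cartesian(), which takes the arguments xlim and ylim, allowing you to zoom in on specific regions of the plot. Unlike xlim() and ylim(), which drop any data outside the limits, coord_cartesian() only changes the visible region and keeps all of the data.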

# Changing the limit of the y axis using the coord_cartesian function

filter(gapminder, year %in% 2007) |>
       ggplot() +
       geom_point(mapping = aes(x = gdp_percap,
                                y = life_exp),
                  size = 3) +
       labs( x = "Gross Domestic Product Per Capita in International Dollars",
             y = "Life expectancy at birth in years",
             title = "Graph showing Life Expectancy by GDP Per Capita",
             subtitle = "Data from Gapminder Dataset",
             caption = "www.gapminder.org") +
       coord_cartesian(ylim = c(24, 83)) # Zooming to  24 - 83
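
The difference between the two approaches is easiest to see on a tiny made-up dataset. In the sketch below, layer_data() is used only to inspect the data each plot would draw: with ylim() the out-of-range values are turned into missing values, while coord_cartesian() leaves them in place and simply zooms the view.

```r
library(ggplot2)

# Made-up data for illustration only
toy <- data.frame(x = 1:10, y = 1:10)

# ylim() sets the scale limits: points outside 0-5 become missing
p_ylim <- ggplot(data = toy) +
          geom_point(mapping = aes(x = x, y = y)) +
          ylim(0, 5)

# coord_cartesian() only zooms the view: all points are kept
p_zoom <- ggplot(data = toy) +
          geom_point(mapping = aes(x = x, y = y)) +
          coord_cartesian(ylim = c(0, 5))

sum(is.na(layer_data(p_ylim)$y)) # Points lost with ylim()
sum(is.na(layer_data(p_zoom)$y)) # Points lost with coord_cartesian()
```

For a plain scatter plot the two look much the same, but the distinction matters once a layer computes something from the data, such as a trend line, because dropped points no longer contribute.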

You can also force ggplot to plot the graph starting from the origin using expand_limits(), which is also added as a layer.

# Changing the limit of the y axis using the expand limits function

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp),
             size = 3) +
  labs(
    x = "Gross Domestic Product Per Capita in International Dollars",
    y = "Life expectancy at birth in years",
    title = "Graph showing Life Expectancy by GDP Per Capita",
    subtitle = "Data from Gapminder Dataset",
    caption = "www.gapminder.org") +
  expand_limits(x = 0, y = 0) # Expand limits

It’s important to note that ggplot automatically decides the scale of our axes.

We’ll cover this in more depth when we revisit scatter plots; but compare the two visualisations below. The left uses the automatically calculated axes. From a quick glance the correlations look different, depending on how and where the axis starts. We will explain how to create multiple plots side by side later.

5.6 Annotation

In a prior section we looked at adding labels to our axes. We can also add labels inside the plot itself (labelling each point or line, for example). Most plots will not benefit from adding text to every single observation, but labelling outliers and other points of interest can be really useful.

We do this by adding:

  • geom_text() - which adds label text at the specified x and y positions.

  • geom_label() - draws a rectangle behind the text, making it easier to read

  • annotate() - useful for adding small annotations (such as text labels).

5.6.1 Examples

geom_text() has the most aesthetics of any geom, because there are so many ways to control the appearance of text.

# geom text adds the label

# Data Manipulation
example_data <- gapminder |>
                filter(year == 2007 & life_exp > 82)


# Plotting the data
filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  geom_text(data = example_data, # New element
            mapping = aes(x = gdp_percap,
                          y = life_exp,
                          label = country),
            colour = "black")

Using geom_label() adds a rectangular box behind the text, making the label easier to see.

# geom label adds the label

# Data Manipulation
example_data <- gapminder |>
                filter(year == 2007 & life_exp > 82)


# Plotting the data
filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  geom_label(data = example_data, # New element - Added as a layer
             mapping = aes(x = gdp_percap,
                           y = life_exp,
                           label = country),
             colour = "black")

We can also annotate the graph using annotate(), which is likewise added as a layer.

# We can specify x and y coordinates for the annotation

# plotting the data
filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  annotate(geom = "text", 
           x = 30000,
           y = 65, 
           label = "This is my annotation at x = 30000 and y = 65") # this annotates the graph

We can also change the colour of the text.

# We can change the colour of our annotation

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  annotate(geom = "text", 
           x = 30000, 
           y = 65, 
           label = "These are the Countries with the highest life expectancy",
           colour = "red") # we can change the colour
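
Labels drawn by geom_text() sit directly on top of their points by default, which can make them hard to read. The sketch below shows two common adjustments: nudge_y shifts every label up the y axis, and check_overlap = TRUE drops any label that would overlap an earlier one. It assumes a small made-up data frame (toy_data, with invented values) in place of the course's gapminder data, so treat it as an illustration rather than a finished plot.

```r
library(ggplot2)

# Hypothetical toy data, invented purely for illustration
toy_data <- data.frame(
  gdp_percap = c(1000, 5000, 20000, 40000),
  life_exp   = c(55, 68, 78, 82),
  country    = c("A", "B", "C", "D")
)

label_plot <- ggplot(data = toy_data,
                     mapping = aes(x = gdp_percap,
                                   y = life_exp)) +
  geom_point(size = 3) +
  geom_text(mapping = aes(label = country),
            nudge_y = 1.5,         # move each label 1.5 units up the y axis
            check_overlap = TRUE)  # skip labels that would overlap an earlier one

label_plot
```

Because the x and y mappings are given once inside ggplot(), the text layer only needs to supply the label aesthetic.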

5.7 Adding lines

Horizontal and vertical lines can be added to our plots, allowing us to highlight/group particular regions or highlight a not so obvious pattern. We do this using:

  • geom_hline(yintercept = a) - Horizontal line at the y intercept (value of y) provided.

  • geom_vline(xintercept = b) - Vertical line at the x intercept (value of x) provided.

5.7.1 Example - Marking the mean

This trick is often used to include an average of particular values in a plot, so we can see deviations from the mean, median and so on. To compute the mean we use the “mean()” function, like we did back in Intro to R. Feel free to review the Summary Statistics and Aggregation chapter if you’re stuck.

# Adding a horizontal line

# Data Manipulation
example_data <- gapminder |>
                filter(year == 2007) # year is numeric, so compare it with a number

# Plotting the data
ggplot(data = example_data) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  geom_hline(yintercept = mean(example_data$life_exp), # horizontal line at the mean
             colour = "red")

We can also label the line using annotate to let people know what it means.

# Annotating our line

# Plotting the data
ggplot(data = example_data) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  geom_hline(yintercept = mean(example_data$life_exp),
             colour = "black",
             linewidth = 1) +
  annotate(geom = "text", 
           x = 0, 
           y = 68, 
           label = "Mean") # adds a label to the line
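
The same idea works for vertical lines. As a minimal sketch, the code below marks the median of the x variable with geom_vline() and labels it with annotate(). The toy_data data frame and its values are invented for illustration and stand in for the gapminder data.

```r
library(ggplot2)

# Hypothetical toy data, invented purely for illustration
toy_data <- data.frame(
  gdp_percap = c(1000, 5000, 20000, 40000),
  life_exp   = c(55, 68, 78, 82)
)

# Compute the value once so the line and its label stay in step
median_gdp <- median(toy_data$gdp_percap)

vline_plot <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp),
             size = 3) +
  geom_vline(xintercept = median_gdp, # vertical line at the median
             colour = "red",
             linetype = "dashed") +
  annotate(geom = "text",
           x = median_gdp,
           y = 56,
           label = "Median GDP per capita",
           hjust = -0.05) # start the text just to the right of the line

vline_plot
```

Storing the statistic in a variable first means the annotation can reuse it instead of a hard-coded position.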

5.8 Setting the Theme

5.8.1 Using a Default Theme

We can also modify the overall theme of the plot, which changes the styling, be it colours, fonts, backgrounds and so on. When creating the plot you determine how the data is displayed; after it has been created you can edit every detail of the rendering using the system of themes that is available to us.

By default we get theme_grey(), the signature ggplot2 theme with a light grey background and white grid lines. The theme is designed to put the data forward while supporting comparisons. ggplot2 also includes seven other built-in themes, and you can add more by installing packages, e.g. ggthemes.

The theme alterations are added on as an extra layer with the + sign at the end, known as the theme layer. For example, we can change the default greyish background by adding a new theme. You can see all the options by typing “theme_” and observing the autofill.

You can also create your own personalised themes and assign them to variables, which is useful if you are trying to match a particular corporate style and each plot must adhere to the same guidelines.

# Adding a new theme as a layer

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  theme_bw() # Added new theme

5.8.2 Changing Existing Themes

The existing themes are a great place to start but don’t give you a lot of control. To modify individual elements, you need to use the theme() function to override the default setting for an element with an “element_” function, thus creating our own theme to apply to our plots.

We can amend a large number of things like fonts, font sizes, axis ticks, legend position etc., which go a long way in making our plots more accessible to users. We will see a lot of this in Chapters 3 and 4. This is quite an overwhelming new section, so it is recommended that you explore the documentation and bookmark it for future use, lest we get lost in the sea of different options available to us.

Every single component of a ggplot graph can be customised. For more details on what you can amend, have a look at Modify components of a theme. There are four basic built-in element functions, each of which is used to change a specific type of element:

  • text - element_text(), draws labels and headings. You can control the font family, face, colour, size and justification.
  • lines - element_line(), draws lines, here you can control the colour, linewidth and linetype.
  • rectangles - element_rect(), draws rectangles, mostly used for backgrounds, here you can change the fill colour and the border colour, linewidth and linetype.
  • blank - element_blank(), draws nothing. Use this if you don’t want a specific element to be included in the plot (this provides a useful eraser if the plot theme you are using is close to what you want but has one or two pesky elements you don’t need).

There are around 40 unique elements that control the appearance of the plot. For more information run vignette("ggplot2-specs") in the console.

The ggplot2 Theme Elements

Now for some examples. First, we can customise the plot background, title and margins of our graph:

# Modifying the plot elements
# Theme is added as a layer

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  # Added theme to modify plot
  theme(plot.background = element_rect(fill = "slategray3", # Colour of background 
                                       colour = "black", # Colour and size of background border
                                       linewidth = 2),   
        plot.title = element_text(colour = "red",  # Colour of title
                                  face = "bold"),  # Font of title
        plot.margin = margin(t = 20,    # Margin of plot
                             r = 20,
                             b = 20,
                             l = 20,
                             unit = "pt"))

The example below modifies some of the elements related to the axes.

# Modifying the axis elements
# Theme is added as a layer

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  # Added theme to modify plot
  theme(axis.text = element_text(colour = "red"), # Colour of axis text
        axis.title = element_text(face = "bold", colour = "red"), # Colour of axis title
        axis.ticks = element_line(colour = "green", linewidth = 4),   # Colour and size of axis ticks
        axis.line = element_line(colour = "orange",linewidth = 2))    # Colour of axis line

A legend can display multiple aesthetics (e.g. colour and shape), from multiple layers, and the symbol displayed in a legend varies based on the geom used in the layer.

# Modifying the legend elements
# Theme is added as a layer

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  # Added theme to modify plot
  theme(legend.background = element_rect(fill = "deepskyblue2"), # Change fill of legend with rectangle
        legend.title = element_text(colour = "white", # Change the legend Title
                                    face = "bold"),
        legend.text = element_text(colour = "blue"), # Change the legend text 
        legend.margin = margin(t = 10,             # Legend Margin
                               r = 10,
                               b = 10,
                               l = 10,
                               unit = "pt"))

We can turn off the legend title by adding “legend.title = element_blank()” inside theme(). Make sure the element you are removing isn’t necessary to understand the plot.

# Modifying the legend elements
# Turning off legend title

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  theme(legend.title = element_blank()) # Removing the legend title with blank

Legends can appear in different places, so you need some global way of controlling them. They have considerably more details that can be tweaked:

  • Should they be displayed vertically or horizontally?
  • How many columns?
  • How big should the keys be?

The position and justification of legends are controlled by the theme setting “legend.position”, which takes values “right”, “left”, “top”, “bottom”, or “none” (no legend).

# Modifying the legend elements
# Modifying legend position

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  # Added theme to modify plot
  theme(legend.position = "bottom") # Legend will be placed at bottom

Switching between left/right and top/bottom modifies how the keys in each legend are laid out (horizontally or vertically), and how multiple legends are stacked (horizontally or vertically). If needed, you can adjust those options independently:

  • legend.direction - layout of items in legends (“horizontal” or “vertical”).

  • legend.box - arrangement of multiple legends (“horizontal” or “vertical”).
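
As a minimal sketch of these two settings, the plot below has two legends (colour and size), places them at the bottom, lays out the keys within each legend horizontally and stacks the two legends vertically. The toy_data data frame and its values are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data, invented purely for illustration
toy_data <- data.frame(
  gdp_percap = c(1000, 5000, 20000, 40000),
  life_exp   = c(55, 68, 78, 82),
  continent  = c("Africa", "Asia", "Europe", "Europe"),
  pop        = c(5, 50, 10, 60) # population in millions
)

legend_plot <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent,
                           size = pop)) +
  theme(legend.position = "bottom",      # legends sit below the panel
        legend.direction = "horizontal", # keys within a legend run left to right
        legend.box = "vertical")         # the two legends are stacked one above the other

legend_plot
```

Swapping "horizontal" and "vertical" in the last two settings is a quick way to see what each one controls.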

Finally the panel elements can be modified as below.

# Modifying the panel elements

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  # Added theme to modify plot
  theme(panel.background = element_rect(fill = "lightblue"), # Changing panel background colour
        panel.grid = element_line(colour = "grey60", linewidth = 0.2), # Changing the lines
        panel.border = element_rect(colour = "black", fill = NA))  # Changing the border

If you are struggling with creating a theme, you could use ggThemeAssist, which provides an interactive user interface for creating a theme, or ggeasy, which is a package that makes theme customisation much easier.

Remember that accessibility and organisational guidelines are the priority when creating publication ready plots, so despite the extra complexity at play here with themes, it is so important to get these right. You will see later that once a theme object is created, we can use it for as many plots as we like, simplifying the overall workload significantly.

5.8.3 Exercise

  1. Using the visualisation you created in the previous exercise:
  • Add an appropriate theme of your choice.

gapminder_uk <- gapminder |>
                  filter(country %in% "United Kingdom")

# Plotting the data

gapminder_uk |> ggplot() +
  geom_point(mapping = aes(x = gdp_percap, 
                           y = fertility)) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita")

gapminder_uk <- gapminder |>
                  filter(country %in% "United Kingdom")

# Plotting the data

gapminder_uk |> ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = fertility)) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita") +
  theme_classic()

5.8.4 Amending Default Themes

If we don’t want to define each and every argument, we can also start with an existing theme and alter only some of its arguments.

  • theme_gray() - “the mother of all themes” and fully defined. For example, theme_bw() builds upon theme_gray(), while theme_minimal() in turn builds on theme_bw().

# Amending set themes

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  theme_bw() + # Added theme bw to modify plot
  theme(text = element_text(colour = "red")) # Overriding the text colour

5.8.5 Saving The Theme

We can adapt a default theme and save it to use for other plots. Let’s call it “custom_theme”, assigning it for use in other plots going forward. Establishing a consistency among the plots we create is really important for publication purposes.

# Using preset theme and edit some elements of it.

custom_theme <- theme_bw() +
  # grid elements
  theme(panel.grid.major = element_blank(), # Strip major gridlines
        panel.grid.minor = element_blank(), # Strip minor gridlines
  # Add axis line
        axis.line = element_line(colour = "black", # Colour to black
                                 linewidth = 0.5), # Set thickness

  # Text elements
  # Title
        plot.title = element_text(size = 14, # Set font size
                                  face = "bold", # Bold typeface
                                  hjust = 0, # Left align
                                  vjust = 2), # Raise slightly
  # Subtitle
        plot.subtitle = element_text(size = 12, # Font Size
                                     margin = margin(t = 10) # Margin for plot text
                                       ), 
  # Caption
        plot.caption = element_text(size = 9, # Font size
                                    hjust = 1), # Right align 
  # Axis titles
        axis.title = element_text(size = 10), # Font size
                                
  # Axis text
        axis.text = element_text(size = 10), # Font size
                               
  # Margin for axis text
        axis.text.x = element_text(margin = margin(t = 5,
                                                b = 10)))

Note that since the legend often requires manual tweaking based on the plot we are creating, we will not define it here.

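To control the alignment of labels we use hjust (horizontal adjustment) and vjust (vertical adjustment). Both typically take values between 0 and 1: for hjust, 0 is left-aligned, 0.5 is centred and 1 is right-aligned; for vjust, 0 is bottom-aligned, 0.5 is middle-aligned and 1 is top-aligned.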

All nine combinations of hjust and vjust.
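
To see hjust in action, the sketch below centres the plot title and left-aligns the caption. The toy_data data frame and the label text are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data, invented purely for illustration
toy_data <- data.frame(
  gdp_percap = c(1000, 5000, 20000, 40000),
  life_exp   = c(55, 68, 78, 82)
)

aligned_plot <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp)) +
  labs(title = "A centred title",
       caption = "A left-aligned caption") +
  theme(plot.title = element_text(hjust = 0.5), # 0.5 centres the title
        plot.caption = element_text(hjust = 0)) # 0 moves the caption to the left

aligned_plot
```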

5.8.5.1 Example

Now that we have our custom theme, we can then add it to our graph as a layer, as shown below.

# Adding our custom theme as a layer

filter(gapminder, year %in% 2007) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 3) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Life expectancy at birth in years",
       colour = "Continent",
       title = "Graph showing Life Expectancy by GDP Per Capita",
       subtitle = "Data from Gapminder Dataset",
       caption = "www.gapminder.org") +
  custom_theme # Theme added as a layer

5.8.6 Setting Our Theme

Now that we have created our custom theme, we can set it as the default using “theme_set”. This way of changing the plot design is highly recommended. It allows you to quickly change any element of your plots by changing it once.

# All graphs plotted will use the set theme

theme_set(custom_theme)

Of course, whilst this is fairly universal, there are some examples (such as with pie charts and donut charts) where our theme would need to be tweaked.
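
One way to handle such exceptions is to switch the default temporarily and then restore it. theme_set() invisibly returns the theme that was active before the call, so we can keep hold of it. In this sketch theme_bw() stands in for our custom theme so that the code runs on its own.

```r
library(ggplot2)

# theme_set() returns the previously active theme, so store it
previous_theme <- theme_set(theme_bw()) # theme_bw() is a stand-in for custom_theme

# ... plots created here use the newly set default theme ...

# Restore the earlier default once we are done
theme_set(previous_theme)
```

For a one-off tweak to a single plot, adding a theme() layer to that plot (as in the earlier sections) avoids changing the default at all.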

5.9 Combining Multiple Plots Side by Side

There are several ways to combine plots. We will use the patchwork package in this course, but other options you could use are the gridExtra package or the cowplot package.

Patchwork, like the name suggests, just patches plots together. We don’t actually need to call functions from the package to accomplish these combinations; once we have loaded patchwork, mathematical symbols will perform the operations for us. You define the plots however you want them to be displayed, then assign them to variable names to be used with patchwork.

# Assigning plots names

first_plot <- filter(gapminder, year == 1987) |>
                      ggplot() +
                      geom_point(mapping = aes(x = gdp_percap,
                                               y = life_exp)
                                )

second_plot <- filter(gapminder, year == 1987) |>
                      ggplot() +
                      geom_point(mapping = aes(x = gdp_percap,
                                               y = life_exp),
                                 colour = "red",
                                 size = 2)

third_plot <- filter(gapminder, year == 1987) |>
                      ggplot() +
                      geom_point(mapping = aes(x = gdp_percap,
                                               y = life_exp),
                                 colour = "blue")

5.9.1 Examples

We can show multiple plots side by side using the addition (+) sign.

# Setting 2 plots side by side

first_plot + second_plot

We can also put one plot on top of another. Here I am also adding a title, subtitle and caption.

# Setting plot on top of another plot

first_plot / second_plot + plot_annotation( # Adding title, subtitle and caption
  title = "This is my title",
  subtitle = "This is my subtitle",
  caption = "This is my caption"
)

# Setting 1 plot with 2 plots beside it

first_plot | (second_plot / third_plot)
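
When the operators alone don't give the arrangement you want, patchwork's plot_layout() function offers finer control. The sketch below makes the left-hand plot twice as wide as the right-hand one. The toy_data data frame and the plot names are invented for illustration.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data, invented purely for illustration
toy_data <- data.frame(
  gdp_percap = c(1000, 5000, 20000, 40000),
  life_exp   = c(55, 68, 78, 82)
)

left_plot <- ggplot(data = toy_data,
                    mapping = aes(x = gdp_percap,
                                  y = life_exp)) +
  geom_point()

right_plot <- ggplot(data = toy_data,
                     mapping = aes(x = gdp_percap,
                                   y = life_exp)) +
  geom_line()

# widths sets the relative width of each column of plots
combined_plot <- left_plot + right_plot +
  plot_layout(widths = c(2, 1))

combined_plot
```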

5.10 Geometric Objects

Each plot uses a different visual object to represent the data. In ggplot2 syntax, we say that they use different geoms. People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, box plots use box plot geoms, and so on. Scatter plots break the trend; they use the point geom.

To change the geom in your plot, change the geom function that you add to ggplot(). For instance, to make the plots below, you can use this code:

# Bar chart example

gapminder |>
  ggplot() + # geom_bar() counts the rows in each continent itself, so no grouping is needed
  geom_bar(mapping = aes(x = continent,
                         colour = continent,
                         fill = continent)
           )

# Histogram example

ggplot(data = gapminder) +
  geom_histogram(mapping = aes(x = life_exp,
                               fill = continent),
                 bins = 25
                 )

# Line graph example

gapminder |>
  filter(country %in% c("Colombia", "Chile", "Argentina", "Brazil", "Peru", "Ecuador")) |>
  ggplot() +
  geom_line(mapping = aes(x = year,
                          y = gdp_percap,
                          colour = country)) +
  ylim(0, 15000)

Every geom function in ggplot2 takes a mapping argument. However, not every aesthetic works with every geom. ggplot2 provides over 40 geoms. The best way to get a comprehensive overview is the ggplot2 cheatsheet; for individual geoms, consult the R help documentation.

5.11 Global versus local aesthetic mappings

5.11.1 Global Mapping

You can plot multiple geoms in the same graph. In order to do this, add multiple geom functions to ggplot(), repeating the mapping each time:

5.11.1.1 Example

# Plotting multiple geoms on the same graph

gapminder |>
  filter(country %in% c("Colombia", "Chile", "Argentina", "Brazil", "Peru", "Ecuador")) |>
  ggplot() +
  geom_point(mapping = aes(x = year,
                           y = life_exp)
             ) +
  geom_smooth(mapping = aes(x = year,
                            y = life_exp)
              )

This, however, introduces some duplication in our code. Imagine you wanted to change the y-axis to display infant mortality instead of life expectancy. You’d need to change the variable in two places, and you might forget to update one. You can avoid this type of repetition by passing a set of mappings to ggplot() directly instead of to each geom_*() layer.

ggplot2 will treat these mappings as global mappings that apply to each geom in the graph. In other words, this code will produce the same plot as the previous code:

# Plotting multiple geoms on the same graph
# Mapping argument specified within ggplot function

gapminder |>
  filter(country %in% c("Colombia", "Chile", "Argentina", "Brazil", "Peru", "Ecuador")) |>
  ggplot(mapping = aes(x = year,
                       y = life_exp)
         ) +
  geom_point() +
  geom_smooth()

The choice between global and local mappings is an important one:

  • Global mappings save on coding when you require the same mappings throughout your visualisation.

  • Local mappings give you finer control if you wish to assign different variables in different geometric layers.

  • Note that some layers or functions require particular mappings to function correctly.

5.11.2 Local Mappings

In contrast, if you place mappings in a geom function (like we have thus far), ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.

5.11.2.1 Example

gapminder |>
  filter(country %in% c("Colombia", "Chile", "Argentina", "Brazil", "Peru", "Ecuador")) |>
  ggplot(mapping = aes(x = year,
                       y = life_exp)
         ) +
  geom_point(mapping = aes(colour = country)
             ) +
  geom_smooth(colour = "red")
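
Running this code prints the message `geom_smooth()` using method = 'loess' and formula = 'y ~ x' to the console, which simply reports the smoothing defaults that ggplot2 chose.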

If you only have one layer in the plot, the way you specify aesthetics doesn’t make any difference. However, the distinction is important when you start adding additional layers.
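
A layer can override the data as well as the aesthetics. As a minimal sketch, the second geom_point() layer below inherits the global x and y mappings but is given its own, smaller data frame, so only the highlighted points are redrawn in red. The toy_data data frame and the threshold of 80 are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data, invented purely for illustration
toy_data <- data.frame(
  gdp_percap = c(1000, 5000, 20000, 40000),
  life_exp   = c(55, 68, 78, 82)
)

# Rows we want to draw attention to
highlight_data <- toy_data[toy_data$life_exp > 80, ]

highlight_plot <- ggplot(data = toy_data,
                         mapping = aes(x = gdp_percap, # global mappings,
                                       y = life_exp)) + # shared by both layers
  geom_point(colour = "grey50",
             size = 3) +
  geom_point(data = highlight_data, # local data for this layer only
             colour = "red",
             size = 4)

highlight_plot
```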

5.12 Facets

We can create small subplots side by side, which are called facets. These are an excellent way to avoid overplotting data on one set of axes. It also makes it easier to compare data across groups of categorical variables (for example, we could have one graph per continent and so on).

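There are two main functions for faceting:

  • facet_wrap(): produces a ribbon of subplots, one for each level of a categorical variable, wrapped onto new rows as needed.

  • facet_grid(): lays the subplots out as a grid or matrix, with one variable defining the rows and another the columns.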

Difference between facet_grid() and facet_wrap(): facet_grid() arranges combinations of groups in a grid, while facet_wrap() gives one panel per group.

5.12.1 Facet wrap

Let’s take a look at a similar scatterplot that we created earlier in the course.

# Plotting data

filter(gapminder, year %in% 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 2)

We could break the plot above into separate plots showing a scatter plot for each continent, allowing for better comparisons across the groups.

5.12.1.1 Example

We use the facet_wrap() function, passing it a call to vars() in which we specify which variables to wrap (or group) by. The variable that you pass to facet_wrap() should be discrete (a.k.a. categorical).

By default, all of the small multiples share the same x and y axis scales.

# Using facet wrap and vars()

filter(gapminder, year %in% 1987) |>
  ggplot() +
  geom_point(
    mapping = aes(x = gdp_percap,
                  y = life_exp,
                  colour = continent),
    size = 2) +
  facet_wrap(vars(continent))
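
If the groups cover very different ranges, shared axes can squash some panels. The scales argument of facet_wrap() relaxes this: "free_y" lets each panel choose its own y axis, "free_x" does the same for x, and "free" frees both. A minimal sketch, using an invented toy_data data frame:

```r
library(ggplot2)

# Hypothetical toy data, invented purely for illustration
toy_data <- data.frame(
  gdp_percap = c(1000, 2000, 20000, 40000),
  life_exp   = c(55, 58, 78, 82),
  continent  = c("Africa", "Africa", "Europe", "Europe")
)

free_axis_plot <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 2) +
  facet_wrap(vars(continent),
             scales = "free_y") # each panel gets its own y axis range

free_axis_plot
```

Free scales make patterns within each panel easier to see, at the cost of making comparisons between panels harder, so use them deliberately.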

We can control the number of rows and columns with nrow (number of rows) and ncol (number of columns) arguments.

# Amend the number of columns

filter(gapminder, year %in% 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 2) +
  facet_wrap(vars(continent),
             ncol = 2)

By default, the labels are displayed at the top of each subplot. However, the optional argument “strip.position” lets us place the labels on any of the four sides by setting it to one of “top”, “bottom”, “left” or “right”.

# Amend position of labels using strip.position

filter(gapminder, year %in% 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 2) +
  facet_wrap(vars(continent), 
             strip.position = "bottom")

5.12.2 Facet grid

If we want to create more complex facets/subplots we can use facet_grid(). This enables us to create facets from two categorical variables, allowing for much more complex visualisations. The function has two key arguments:

  • rows - Which variable to set as the rows
  • cols - Which variable to set as the columns

5.12.2.1 Example

Notice here that the column contains vars() just like rows does, but we are using a condition on the “pop” variable, which will generate a second categorical variable, with the boolean values TRUE and FALSE to group by.

# Using facet grid

filter(gapminder, year %in% 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 2) +
  facet_grid(rows = vars(continent), 
             cols = vars(pop > 10000000)
             )

There are lots of other useful parameters this function can take to tweak the output into the form we desire, for example organising by rows or columns. You can read about these in the help documentation.
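
One such parameter is labeller. Strip labels of TRUE and FALSE on their own are hard to interpret, and label_both prints the variable name alongside each value. The sketch below assumes an invented toy_data data frame and a hypothetical helper column, large_pop, created before plotting.

```r
library(ggplot2)

# Hypothetical toy data, invented purely for illustration
toy_data <- data.frame(
  gdp_percap = c(1000, 2000, 20000, 40000),
  life_exp   = c(55, 58, 78, 82),
  continent  = c("Africa", "Africa", "Europe", "Europe"),
  pop        = c(5, 50, 8, 60) # population in millions
)

# Create the TRUE/FALSE grouping as a named column first
toy_data$large_pop <- toy_data$pop > 10

labelled_grid_plot <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 2) +
  facet_grid(rows = vars(continent),
             cols = vars(large_pop),
             labeller = label_both) # strips read "large_pop: TRUE" rather than "TRUE"

labelled_grid_plot
```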

With one variable facet_grid produces similar output to facet_wrap():

# Using facet grid

filter(gapminder, year %in% 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp,
                           colour = continent),
             size = 2) +
  facet_grid(rows = vars(continent)
             )

5.13 Saving Visualisations

5.13.1 Filetypes for Images

Often when we have a finished visualisation, we want to save it for publishing or further use in documentation. ggplot2 can save to a variety of file formats:

  • Image (Raster) Formats, such as
    • png - ‘portable network graphics’
    • jpg /.jpeg - ‘joint photographic experts group’
    • tif /.tiff - ‘tagged image file format’
  • Vector Formats, such as
    • ps /.eps - Postscript/ Encapsulated Postscript
    • pdf - ‘portable document format’
    • svg - ‘scalable vector graphics’

Why may we care about file types?

Comparison of Vector and Raster images when zoomed in

Raster images are the images we’re often most familiar with. They are made up of tiny squares called pixels. Thinking back to the early days of digital cameras, pictures often had that blocky, “squares”-like quality. If we enlarge these images, they often look blurry or pixelated. However, they are often smaller in file size than vector images. Software for editing raster images includes Microsoft Paint (a classic!), Adobe Photoshop and freeware options like GIMP and Paint.NET.

Vector images are instead made up of paths. This gives them the ability to be scaled up or down at will without losing any quality or appearing pixelated. Clip art is a classic example of vector images. A downside is that they often have much larger file sizes, and your audience may not be able to view the files if sent individually (rather than embedded in a Notebook or markdown document).

Editing these is often more complicated and requires special software. Note that we’ve included PDF files because ggplot2 creates them as vectors, but not all PDF files are vectors. Software for editing vector images includes Adobe Illustrator and freeware options like Inkscape.

5.13.2 How to save visualisations

You can save your plots using the ggsave() function, which will save the most recent plot you have created. You can specify the dimension and resolution of your plot by adjusting the appropriate arguments (width, height, and dpi) to create high quality graphics for publication.

It has the following important arguments:

  • The first argument, “path”, specifies the path where the image should be saved. The file extension will be used to automatically select the correct graphics device. ggsave() can produce .eps, .pdf, .svg, .wmf, .png, .jpg, .bmp, and .tiff files.

  • width and height control the output size, specified in inches. If left blank, they’ll use the size of the on-screen graphics device.

  • For raster graphics (i.e. .png, .jpg), the dpi argument controls the resolution of the plot. It defaults to 300, which is appropriate for most printers, but you may want to use 600 for particularly high-resolution output, or 96 for on-screen (e.g., web) display.

If you save through the export button you will typically get a low DPI (72), which gives lines jagged edges (known as aliasing). Exporting with a higher DPI gives a higher-quality appearance.

If you are having anti-aliasing issues on Windows, you can use the Cairo package.

# To save plots

ggsave(filename = "./outputs/output_plot.png", 
       width = 7, 
       height = 5, 
       dpi = 300)

Note that a resolution of 150-300 dpi is suggested for PowerPoint slides.
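
To make the link between size and resolution concrete, here is a small worked sketch in base R. The pixel dimensions of a raster image are its size in inches multiplied by the dpi. The 7 x 5 inch size matches the example above; the variable names are just for illustration.

```r
# Pixel dimensions of a raster image = size in inches x dpi
width_in <- 7
height_in <- 5
dpi_values <- c(96, 150, 300)

pixel_sizes <- data.frame(
  dpi = dpi_values,
  width_px = width_in * dpi_values,
  height_px = height_in * dpi_values
)

pixel_sizes
```

So a 7 x 5 inch plot is 672 x 480 pixels at 96 dpi, 1050 x 750 pixels at 150 dpi and 2100 x 1500 pixels at 300 dpi.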

5.13.3 Exercise

  1. Using the visualisation you created in the previous exercise:
  • Save it in the outputs folder as a .png file at 300 dpi.

  • Then save it as a PDF file and compare the two.

# Create the previous visualisation

gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")

# Plotting the data

gapminder_uk |> ggplot() +
  geom_point(mapping = aes(x = gdp_percap, 
                           y = fertility)
             ) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita") +
  theme_classic()

# Save the plot

gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")

# Plotting the data

my_plot <- ggplot(data = gapminder_uk) +
  geom_point(mapping = aes(x = gdp_percap, 
                           y = fertility)
             ) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita") +
  theme_classic()

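# saves as png (plot = my_plot makes explicit which plot is saved)
ggsave(filename = "./outputs/my_plot.png", 
       plot = my_plot,
       width = 7, 
       height = 5, 
       dpi = 300)

# saves as pdf (dpi is left out as it only affects raster output)
ggsave(filename = "./outputs/my_plot.pdf",
       plot = my_plot,
       width = 7,
       height = 5)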
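
One possible way to start comparing the two files is to look at their sizes on disk. The sketch below is self-contained, so it uses the built-in mtcars data and temporary files with made-up names rather than the exercise files; file.size() is a base R function that returns sizes in bytes.

```r
library(ggplot2)

# Illustrative plot and temporary files (hypothetical names, not the course data)
example_plot <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))

png_file <- file.path(tempdir(), "compare_plot.png")
pdf_file <- file.path(tempdir(), "compare_plot.pdf")

ggsave(filename = png_file, plot = example_plot, width = 7, height = 5, dpi = 300)
ggsave(filename = pdf_file, plot = example_plot, width = 7, height = 5)

# file.size() returns the size of each file in bytes
file_sizes <- file.size(c(png_file, pdf_file))
names(file_sizes) <- c("png", "pdf")
file_sizes
```

The other half of the comparison is visual: open both files and zoom in. The PNG should start to look pixelated, while the PDF should stay sharp.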

6 Summary

Congratulations! You have climbed the immense mountain of Chapter 2 and explored the main functionality of the ggplot2 package, as well as patchwork. This was a very in-depth chapter, and you can now appreciate how much depth there is to plotting and to approaching visualisation problems in general.

Make sure you take a break before moving on to Chapters 3 and 4. You deserve it!

Reuse

Open Government Licence 3.0