# Loading Packages
library(tidyverse) # Contains ggplot2, tidyr, dplyr and readr which we require
library(janitor) # To clean column names
library(scales) # Methods for automatically determining labels for axes and legends
library(patchwork) # Used to combine multiple plots
library(RColorBrewer) # Colour palettes
Chapter 2 - Introducing ggplot2
1 Overview of ggplot
ggplot2 is a very powerful data visualisation tool that is great for exploring data and producing publication-quality figures. As mentioned in Chapter One of the Introduction to R course, the BBC uses R to produce its graphics.
The package is based on the theory presented in the book The Grammar of Graphics by Leland Wilkinson, which defines a set of rules for constructing statistical graphics by combining different types of layers. This is why I like the cake analogy for the inner mechanism of the package: it lets you picture the elements of a plot and how they come together.
As such, the gg in ggplot2 stands for grammar of graphics, which is a way of thinking about plotting as having grammar elements that can be applied in succession to create a plot. This is reproducible in that every graph can be built from the same few components, which themselves can be tweaked for each specific purpose. Comparing this to cakes, much of the time we have the same base ingredients for the sponge, but the ratio of these, the subsequent decoration and embellishments often differ between cake types.
In this section we’ll explore creating a basic plot using ggplot. We’ll do this with just one kind of plot, the perfect place to start for anyone plotting for the first time, the scatter plot. By the end of this section we’ll have built our scatter plot to meet AF standards and have a good core understanding of how ggplot works.
2 Basic Foundation
Each ggplot has three basic elements:
- The data: the dataset containing the variables of interest.
- Geometric layers: the shape of our visualisation (the plot type itself).
- Aesthetics: the visual properties of the objects in your plot, such as the size, shape or colour of your points.
We mentioned that we will be plotting scatter plots to build up knowledge of the package. These have points that show the relationship between two sets of data (if you are familiar with statistics we often use these to diagnose correlations).
2.1 Packages
The first thing we always do at the start of an analysis is load our packages. In this chapter we use the packages loaded at the top of this page: tidyverse, janitor, scales, patchwork and RColorBrewer.
2.2 The Data
First let’s look at our data. It is vital to know the data types within our dataset, as the plot types available to us depend on them.
The data must be in a tidy data frame. Getting the data into a tidy format is one of the challenges of robust data visualisation. Tidy data frames are described in more detail in the Introduction to R course, which is a prerequisite for this one. Feel free to review Chapter 4 - Working with DataFrames if you need a reminder, but for now all you need to know is that a tidy data frame has variables in the columns and observations in the rows.
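As a rough illustration of what "variables in the columns and observations in the rows" means, the sketch below reshapes a small wide table into a tidy one using pivot_longer() from tidyr. The values are made up for illustration and are not taken from the gapminder data.

```r
library(tidyr)

# Illustrative (made-up) values: one row per country, one column per year
wide <- data.frame(
  country = c("Brazil", "Albania"),
  `1987` = c(65.2, 72.0),
  `2007` = c(72.4, 76.4),
  check.names = FALSE
)

# Tidy version: one row per observation (a country in a year),
# one column per variable (country, year, life_exp)
tidy <- pivot_longer(wide,
                     cols = c(`1987`, `2007`),
                     names_to = "year",
                     values_to = "life_exp")
tidy
```

In the tidy version each variable can be handed straight to an aesthetic such as x, y or colour, which is why ggplot2 expects data in this shape.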
We will use the gapminder dataset throughout this course. It is collected by the Gapminder Foundation and is well known for being an excellent teaching tool.
# Importing data using the read_csv function
gapminder <- read_csv("./data/gapminder.csv")
# Cleaning column names and dropping missing values
gapminder <- clean_names(gapminder) |>
  drop_na()
gapminder |> glimpse()
Rows: 1,224
Columns: 8
$ country <chr> "Albania", "Albania", "Albania", "Albania", "Albania"…
$ continent <chr> "Europe", "Europe", "Europe", "Europe", "Europe", "Eu…
$ year <dbl> 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002,…
$ life_exp <dbl> 64.820, 66.220, 67.690, 68.930, 70.420, 72.000, 71.58…
$ pop <dbl> 1728137, 1984060, 2263554, 2509048, 2780097, 3075321,…
$ gdp_percap <dbl> 2312.889, 2760.197, 3313.422, 3533.004, 3630.881, 373…
$ infant_mortality <dbl> 106.5, 86.8, 71.0, 58.6, 56.1, 40.8, 32.5, 26.8, 21.0…
$ fertility <dbl> 5.96, 5.38, 4.81, 4.09, 3.46, 3.13, 2.87, 2.61, 2.20,…
3 The Data Layer
Let’s look at the basic syntax for plotting with ggplot:
- We start with “ggplot()”.
- Then we specify the data. This is our first layer, known as the data layer.
3.1 Example
Let’s see the relationship between gdp_percap and life_exp for Brazil. To do this we need a new data frame with just the data for Brazil; let’s call it “gm_brazil”. We will use the pipe operator introduced in the prerequisite course to chain commands together.
As a reminder, data manipulation is not explained in this course, but is covered in the Introduction to R course.
# Filtering the gapminder data
gm_brazil <- gapminder |>
  filter(country == "Brazil")
# Note this will provide an empty graph
ggplot(data = gm_brazil)
This will plot the base of our visualisation, but as we have not specified what kind of graph we want, we only get a blank canvas. The function ggplot() creates a coordinate system that you can add layers to (we have axes, but nothing plotted on them and no ranges assigned to them).
To visualise the relationship between life_exp and gdp_percap, we need to add a geometric layer.
4 Geometric layers
4.1 The syntax of adding layers
In ggplot2 we create graphs by adding layers. Layers can define geometries, compute summary statistics, define what scales to use, or even change styles. To add layers we use the “+” symbol, as opposed to the pipe operator which you may have expected to see. The pipe chains together operations on a data structure, passing the result of each step to the next. Adding layers to a plot is a process of addition: the data is not passed from layer to layer, it is supplied once in the data layer, so we use the plus symbol instead.
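The idea of building a plot by addition can be sketched as follows. The data frame toy and its values are made up for illustration; the point is only that a plot object can be stored and have layers added to it with +.

```r
library(ggplot2)

# Small illustrative data frame (made-up values, not the gapminder data)
toy <- data.frame(gdp_percap = c(1000, 5000, 20000),
                  life_exp   = c(55, 68, 79))

# Data layer only: a blank canvas
base_plot <- ggplot(data = toy)

# The + sign adds a geometric layer to the existing plot object
with_points <- base_plot +
  geom_point(mapping = aes(x = gdp_percap, y = life_exp))

length(base_plot$layers)    # number of layers on the blank canvas
length(with_points$layers)  # number of layers after adding geom_point()
```

Because base_plot is left unchanged, the same starting point can be reused to try out different geometries.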
Geometric layers are the shapes that represent the data, whose names begin with geom_. They follow the naming pattern “geom_X”, where “X” is the name of the geometry (or plot type) we are interested in. For example we have:
- geom_point (Graph of points or a scatter plot)
- geom_bar (Bar graph)
- geom_histogram (A histogram)
and many more that we will see in Chapter 4.
In the console, type geom (don’t run it) and R will attempt to autocomplete with the list of options for you to choose from.
# Type the code below to see the list of geometric layers
geom
We can also see them on the ggplot2 cheat sheet, which is an incredibly useful reference once you have completed the core content of this course. We highly recommend bookmarking this, as well as the other cheat sheets provided by RStudio.
4.2 Example
Looking back at our code so far we have the function ggplot() and then we specified our data gm_brazil, this is our first layer. Now we can add a ‘layer’ to the plot using one of the “+ geom_()” methods to define the shape of the graph.
To see the relationship between life_exp and gdp_percap we can use geom_point() to plot a scatter plot. It is recommended that you start a new line after the plus sign, as you would with the pipe operator, to improve the readability of your code.
# Note this code will give an error
ggplot(data = gm_brazil) + # Plus sign to add another layer
geom_point()
Other layers we can add to a plot include the plot title, axes labels, and visual themes for the plots. We will look at some of these later in the course. They are stacked in the order that the code is written, which can create subtle differences in the final product dependent on their placement in some cases.
If you run the code above you will get an error:
Error: geom_point requires the following missing aesthetics: x, y
We still haven’t got a graph, as we haven’t specified which parts of the data we would like to plot. This brings us on to aesthetics.
5 Aesthetics
An aesthetic is a visual property of the objects in your plot. Aesthetics include things like x/y position, the size, the shape, or the color of your points. In this case we first need to specify what we want to be displayed on the x and y axis.
Aesthetic attributes are mapped to variables in the dataset. We do this by adding the aes() function inside our geom_point() function.
5.1 Example
In the following example, the aes() statement tells R that we want to map gdp_percap to the x axis and life_exp to the y axis.
# Mapping aesthetics
ggplot(data = gm_brazil) +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp)) # aes() is passed to the 'mapping' argument of the geom
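This course places aes() inside the geom. As a side note, the mapping argument can also be given to ggplot() itself, in which case every layer added afterwards inherits it. The sketch below uses a made-up data frame called toy to show that both forms position the points identically.

```r
library(ggplot2)

# Small illustrative data frame (made-up values)
toy <- data.frame(gdp_percap = c(1000, 5000, 20000),
                  life_exp   = c(55, 68, 79))

# Mapping supplied to the geom: applies to that layer only
p_layer <- ggplot(data = toy) +
  geom_point(mapping = aes(x = gdp_percap, y = life_exp))

# Mapping supplied to ggplot(): inherited by every layer added afterwards
p_global <- ggplot(data = toy, mapping = aes(x = gdp_percap, y = life_exp)) +
  geom_point()

# layer_data() returns the data a layer will draw; the positions match
same_positions <- all.equal(layer_data(p_layer)[, c("x", "y")],
                            layer_data(p_global)[, c("x", "y")])
same_positions
```

Putting the mapping in the geom keeps each layer self-describing, which is helpful while learning; the inherited form saves repetition once a plot has several layers.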
5.1.1 Exercise
Filter data for United Kingdom and call it gapminder_uk.
Make a ggplot scatter plot with:
gdp_percap as the x axis
fertility as the y axis
ggplot(data = Your data) +
<geom_function>(mapping = aes(<mappings>))
# (a)
gapminder_uk <- gapminder |>
  filter(country == "United Kingdom")
# (b)
ggplot(data = gapminder_uk) +
geom_point(mapping = aes(x = gdp_percap,
y = fertility))
5.2 Aesthetic Mappings
Previously we stated that we can change other elements of the plot like colour and size. In this section we will explore how to do this in practice.
Everything inside aes() has a scale; if none is provided it gets a default. Different aesthetic attributes work better with different types of variables. For example, colour and shape work well with categorical variables (as we get one per category), while size works well for continuous variables (as it is on a numerical scale).
Note the mapping requirements differ with the different geometries, which we will see examples of later in the course.
5.2.1 Colour
ggplot2 allows us to customise the colours of plots using its fill and color arguments. We focus on colour for now and explore fill later on.
The explanatory text will spell colour with the UK spelling. ggplot accepts both the American spelling for the argument: color= and the UK spelling: colour=
You can set the aesthetic properties of your geom manually. This sets the chosen aesthetics globally within the graph.
Let’s look at assigning by name first.
Below is a graph showing the relationship between the gdp_percap variable and life_exp.
Remember that you can pipe data into ggplot (and combine it with data manipulation functions), but within ggplot you need to ADD LAYERS with +.
# Piping data into the graph
filter(gapminder, year == 1987) |> # Data piped into
ggplot() + # ggplot function initiating plot
geom_point(mapping = aes(x = gdp_percap,
y = life_exp))
We can set a colour to geom_point. For example, we can make all of the points in our plot blue by specifying the colour argument of geom_point() to the name of a colour as a character string.
If you want all data points to be the same colour, you would define colour = “blue” outside the aes() function. Placing this inside the aes() function means something different which we will cover later.
# Colour specified using a colour name
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
colour = "blue") # Colour specified outside aes function
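To make the difference between setting a colour outside aes() and placing it inside aes() more concrete, here is a minimal sketch on a made-up data frame called toy. It uses layer_data() to inspect the colour each point will actually be drawn in.

```r
library(ggplot2)

# Small illustrative data frame (made-up values)
toy <- data.frame(gdp_percap = c(1000, 5000, 20000),
                  life_exp   = c(55, 68, 79))

# Outside aes(): the colour is SET to a fixed value and no legend is drawn
p_set <- ggplot(toy) +
  geom_point(mapping = aes(x = gdp_percap, y = life_exp),
             colour = "blue")

# Inside aes(): "blue" is treated as data (a category with one level)
# and MAPPED through the default colour scale, with a legend
p_mapped <- ggplot(toy) +
  geom_point(mapping = aes(x = gdp_percap, y = life_exp,
                           colour = "blue"))

unique(layer_data(p_set)$colour)     # "blue"
unique(layer_data(p_mapped)$colour)  # a default palette colour, not "blue"
```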
We can see a complete list of the available choices here: Colours in R.
Note that you can also see a complete list of the 657 built-in colour names by typing colours() (or colors()).
# Returns the built-in colour names
colours()
We can also specify colour using HTML names and hex codes. ggplot will accept most common colour names, including HTML colour names. Here I’m using colour = “OliveDrab”.
# Colour specified using HTML colour names
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
colour = "OliveDrab") # Colour specified outside aes function
For finer control we can also use hex codes, which can define all colours. A hex code looks like this: #9E2A2B; it is given as a string with a # symbol at the front. This particular code here is a burnt red colour.
You can find hex codes here HTML colour codes.
# Colour specified using HTML HEX code
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
colour = "#9E2A2B") # Colour specified outside the aes function
We can also set colours by their RGB value. This is the Red, Green and Blue value, which can create any colour by combining different amounts of each. Each value is usually quoted on a scale of 0 to 255; as an example, the ONS Blue has the RGB value (0, 61, 89).
A colour can be specified using R’s “rgb()” function, which takes three arguments: red, green and blue. By default these each have a range of [0, 1], so to supply values on the 0 to 255 scale we also set maxColorValue = 255.
# Colour specified using RGB values
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
colour = rgb(0, 61, 89,
maxColorValue = 255)) # Colour specified outside aes function
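The relationship between RGB values and hex codes can be checked directly in base R. This short sketch converts the ONS Blue values quoted above into a hex code and back again; the variable name ons_blue is just a label chosen for this example.

```r
# rgb() expects values in [0, 1] unless maxColorValue is supplied
ons_blue <- rgb(0, 61, 89, maxColorValue = 255)
ons_blue   # the equivalent hex code, returned as a string

# The same colour written on the default 0 to 1 scale
rgb(0, 61 / 255, 89 / 255)

# col2rgb() converts a hex code (or colour name) back to its
# red, green and blue components on the 0 to 255 scale
col2rgb(ons_blue)
```

Since rgb() simply returns a hex string, its result can be stored once and reused anywhere a colour is accepted.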
To use colour effectively with your data, you first need to know whether you are dealing with a categorical or a continuous variable.
5.2.2 Mapping the Colour
You can use the different aesthetics to convey information, by mapping the aesthetics in your plot to the variables in your dataset. To map an aesthetic to a variable, associate the name of the aesthetic to the name of the variable inside the aes() function, like we did with x and y.
ggplot2 will automatically assign a unique level of the aesthetic to each unique value of the variable. It will even add a legend that explains which levels correspond to which values.
5.2.2.1 Example
For example, you can map the colours of your points to the continent.
# Mapping the colour of our data points
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent)) # Colour specified within aes
We can also amend these colours manually, using the “scale_color_manual()” function. This is added on as a layer to our plot.
# Here we are specifying our colour values manually
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
color = continent)) +
scale_colour_manual(values = c("Africa" = "blue", # Added as a layer
"Americas" = "red",
"Asia" = "green",
"Europe" = "yellow",
"Oceania" = "grey"))
5.2.3 Size
We can set the sizes of the points within our scatter plot.
In the example below, we set every point to the same size of 3, which is the diameter in mm.
# Changing the size of our points manually
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
color = "red", # Colour specified outside aes function
size = 3) # size specified outside aes function
We can also map variables to size in the same way.
# Mapping the size and colour of our data points
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
color = continent,
size = pop)) # size and colour specified within aes function
Note that R will display large numbers using scientific notation. In the legend for pop we can see 2.5e+08, which denotes 2.5 × 10^8 = 250,000,000 = 250 million.
You can turn off scientific notation by specifying “scipen” within the “options” function.
# Turn off scientific notation
options(scipen = 999)
However, if this is not to your liking and you’d rather the standard form output, you can reverse this by using the following:
- options(scipen = 0)
Where we are effectively reverting scipen to its default value of 0.
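Changing scipen affects the whole R session. As an alternative that is not used in the original text, the scales package loaded at the start of this chapter can format the labels of a single legend. The sketch below assumes a made-up data frame called toy and uses label_comma() from scales.

```r
library(ggplot2)
library(scales)

# Small illustrative data frame (made-up values)
toy <- data.frame(gdp_percap = c(1000, 5000, 20000),
                  life_exp   = c(55, 68, 79),
                  pop        = c(5e6, 5e7, 2.5e8))

# label_comma() returns a function that formats numbers
# with thousands separators instead of scientific notation
formatted <- label_comma()(2.5e8)
formatted

# Passing it to the size scale changes the labels of that legend only
p <- ggplot(toy) +
  geom_point(mapping = aes(x = gdp_percap, y = life_exp, size = pop)) +
  scale_size_continuous(labels = label_comma())
```

This keeps the session default untouched, so other output in the same script still prints in the usual way.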
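As with colours, we can also amend these sizes manually. When size is mapped to a categorical variable, we can specify the size for each category using the “scale_size_manual” function, which works similarly to “scale_colour_manual” and is also added as a layer. For a continuous variable such as pop, “scale_size” with its range argument controls the smallest and largest point sizes instead.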
5.2.4 Shape
R has built-in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the colour and fill aesthetics.
- The hollow shapes (0–14) have a border determined by colour;
- The solid shapes (15–20) are filled with colour;
- The filled shapes (21–25) have a border of colour but are filled with fill.
We can see a list of all the available shapes below.
These are also available on the Cheat sheet.
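If the cheat sheet is not to hand, a chart of the shape numbers can be drawn with ggplot2 itself. This is only a sketch: the grid layout and the colours are arbitrary choices made for this example. It uses scale_shape_identity() so that the numbers in the shape column are used directly as shape codes.

```r
library(ggplot2)

# One row per built-in shape number, laid out on a grid of 7 columns
shapes <- data.frame(shape = 0:25,
                     x = 0:25 %% 7,
                     y = -(0:25 %/% 7))

shape_chart <- ggplot(shapes, mapping = aes(x = x, y = y)) +
  geom_point(mapping = aes(shape = shape),
             size = 5, colour = "navy", fill = "orange") +
  geom_text(mapping = aes(label = shape), nudge_y = -0.3, size = 3) +
  scale_shape_identity() +  # use the values in 'shape' as shape codes
  theme_void()              # remove axes, which carry no meaning here
shape_chart
```

Only shapes 21 to 25 pick up the orange fill, which shows the interaction between colour and fill described above.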
# Changing the shape of the points
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
colour = "navy", # Colour specified outside aes function
size = 3, # Size specified outside aes function
shape = 17) # Shape specified outside aes function
We can also map variables to shape in the same way as we did with colour and size.
# Mapping the shape of our data points
filter(gapminder, year %in% c(1987, 2007)) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
shape = continent,
colour = as.factor(year)))
You may have noticed a sneaky use of “as.factor()” here. This is because the year column in the dataset is numeric, containing integer values of the years. To use it to categorise our colours, we must convert it to a categorical variable using the as.factor() function. This is very useful when you have a numeric column (particularly one with a small number of distinct integers) that you want to map in this way.
For more details on factors there is a good tutorial here: Understanding Factors.
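What as.factor() does to a numeric column can be seen in a couple of lines of base R. The vector below is a made-up stand-in for the year column.

```r
# A made-up stand-in for the numeric year column
year <- c(1987, 2007, 1987, 2007)

is.numeric(year)   # numeric, so ggplot2 would use a continuous colour gradient

# Converting to a factor turns the distinct values into categories
year_factor <- as.factor(year)
levels(year_factor)  # the categories ggplot2 would assign colours to
```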
Note that ggplot2 will only use six shapes at a time. By default, additional groups will go unplotted when you use the shape aesthetic.
We can also set these shapes manually, specifying which shapes we want using the “scale_shape_manual()” function as another layer on the plot.
# Specifying shapes using scale shape manual
filter(gapminder, year %in% c(1987, 2007)) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
shape = continent,
color = as.factor(year))) +
scale_shape_manual(values = c(15, 16, 17, 1, 11))
5.2.5 Transparency
We can set the transparency of our points, using the parameter “alpha”. Alpha refers to the opacity of a point, values of which range from 0 to 1, with lower values corresponding to more transparent colours.
# Changing the transparency of the points
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
color = "navy", # Colour specified outside aes function
size = 4, # Size specified outside aes function
alpha = 0.4) # Alpha specified outside aes function
We can also map variables to alpha in the same way as before.
# Mapping the alpha of our data points
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
color = continent,
alpha = infant_mortality)) +
scale_colour_manual(values = c("Africa" = "blue",
"Americas" = "red",
"Asia" = "green",
"Europe" = "yellow",
"Oceania" = "grey"))
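We can also use the “scale_alpha_manual” function to specify exactly which alpha value we want for each level of a categorical variable, as we did with colour above. For a continuous variable such as infant_mortality, “scale_alpha” with its range argument sets the lowest and highest opacity instead.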
5.2.6 Exercises
- What happens if you map an aesthetic to something other than a variable name, like aes(color = life_exp > 65)?
# Condition in the mapping?
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = fertility,
y = life_exp,
colour = life_exp > 65))
- Suppose that instead of indicating continent using colour, you wanted all the points in the plot below to be blue. How would you do it?
# Filtering the gapminder data
# Plotting the data
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent))
- What happens if you map an aesthetic to something other than a variable name, like aes(color = life_exp > 65)?
# Mapping color to an expression
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = fertility,
y = life_exp,
colour = life_exp > 65))
Aesthetics can be mapped to expressions like “colour = life_exp > 65”. The ggplot() function behaves as if a temporary variable was added to the data with values equal to the result of the expression. In this case, the result of “life_exp > 65” is a logical variable which takes values of TRUE or FALSE, so the points are coloured in two groups.
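The "temporary variable" description can be checked with a small sketch. It uses a made-up data frame called toy and a hypothetical column name over_65, and compares the colours produced by the expression with those produced by mapping a stored column.

```r
library(ggplot2)

# Small illustrative data frame (made-up values)
toy <- data.frame(fertility = c(6.1, 4.2, 2.3, 1.8),
                  life_exp  = c(52, 61, 72, 80))

# Expression evaluated inside aes()
p_expr <- ggplot(toy) +
  geom_point(mapping = aes(x = fertility, y = life_exp,
                           colour = life_exp > 65))

# Equivalent: store the result in a (hypothetical) column, then map it
toy$over_65 <- toy$life_exp > 65
p_col <- ggplot(toy) +
  geom_point(mapping = aes(x = fertility, y = life_exp,
                           colour = over_65))

# Both plots assign the same colour to each point
same_colours <- identical(layer_data(p_expr)$colour,
                          layer_data(p_col)$colour)
same_colours
```

The only visible difference between the two plots is the legend title, which is taken from whatever was written inside aes().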
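- Suppose that instead of indicating continent using colour, you wanted all the points in the plot below to be blue. How would you do it?
When you want colour to be a variable from your dataset, put “colour = <variable name>” inside aes(). When you want every point to have the same fixed colour, put the colour argument outside aes(), as in the code below.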
# Setting the color to blue
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
colour = "blue")
5.3 Colour Palettes
We can choose specific colour palettes, such as those provided by the “RColorBrewer” package for the aesthetics in our plots. These colours have been designed to work well in a wide variety of situations. The package provides palettes for different types of scale (sequential, diverging, qualitative). You will need to install and load this package to use it as it is not part of the tidyverse.
- sequential - great for low-to-high situations where one extreme is exciting and the other is boring
- diverging - great for things that range from “extreme and negative” to “extreme and positive”
- qualitative - great for non-ordered categorical things, such as your typical factor, like country or continent
# Displaying the colour palettes we have
display.brewer.all()
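Besides displaying the palettes, RColorBrewer can tell us which type each palette belongs to and return the underlying hex codes. The sketch below looks up three palettes chosen as examples of each type; the variable name five_colours is just a label for this example.

```r
library(RColorBrewer)

# brewer.pal.info lists each palette with its category:
# "seq" (sequential), "div" (diverging) or "qual" (qualitative)
brewer.pal.info[c("Blues", "RdYlBu", "Set2"), ]

# brewer.pal() returns the hex codes for n colours from a named palette
five_colours <- brewer.pal(n = 5, name = "RdYlBu")
five_colours
```

The returned hex codes can be passed to scale_colour_manual() if you only want to use part of a palette.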
5.3.1 Example
We can add the palette as a layer, as shown in the example below:
# Adding a colour palette, this is added as layer.
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
scale_color_brewer(type = "div", # Diverging palette added as a layer
palette = "RdYlBu")
5.4 Adding titles and labels
Good labels are critical for making your plots accessible to a wider audience. Always ensure the axis and legend labels are fully descriptive. When adding a title and more meaningful labels, it’s always a good idea to replace short variable names with more detailed descriptions, and to include the units. This can be done using the “labs()” function.
Within this function you can also supply further arguments, such as a subtitle and a caption:
- subtitle - adds additional detail in a smaller font beneath the title
- caption - adds text at the bottom right of the plot, often used to describe the source of the data
5.4.1 Example 1 - Adding labels
# Using the labs function to rename x and y axes, add title, subtitle and caption
filter(gapminder, year == 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars", # Labels
y = "Life expectancy at birth in years",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org")
5.4.2 Example 2 - Change the labels on a legend
We can also change the labels of our legend within the labs() function.
# Changing the legend label in the labs function
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
color = "Continent", # Legend Label
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org")
5.4.3 Exercise
- Using the visualisation you created in the previous exercise:
- Set an appropriate Title and X and Y axis Label.
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")
# Plotting the data
gapminder_uk |> ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = fertility))
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")
# Plotting the data
gapminder_uk |> ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = fertility)) +
  labs() # To add labels and titles
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")
# Plotting the data
gapminder_uk |> ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = fertility)) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita")
5.5 Changing the limits of our axes
There are two reasons you might want to specify limits rather than relying on ggplot to set them for you:
- You want to shrink the limits to focus on an interesting area of the plot.
- You want to expand the limits to make multiple plots match up.
The functions we use are xlim() and ylim(), which modify the limits of the axes. These are added as another layer to the plot, as follows.
5.5.1 Examples
# Changing the limit of the y axis using the ylim function
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
ylim(0, 85) # Amend axis limit, lower to 0, upper to 85
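Note: it is possible to specify only the lower or upper bound of a limit. For instance, try “ylim(0, NA)” and observe the results.
Alternatively you could use the function coord_cartesian(), which takes the arguments xlim and ylim and lets you zoom in on a specific region of the plot. Unlike xlim() and ylim(), which remove any observations that fall outside the limits, coord_cartesian() keeps all of the data and only changes the visible area.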
# Changing the limit of the y axis using the coord_cartesian function
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
size = 3) +
labs( x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
coord_cartesian(ylim = c(24, 83)) # Zooming to 24 - 83
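The practical difference between ylim() and coord_cartesian() shows up when some data falls outside the chosen limits. The sketch below uses a made-up data frame with one extreme value and inspects what each version of the plot will draw.

```r
library(ggplot2)

# Made-up data with one extreme value (y = 40)
toy <- data.frame(x = 1:4, y = c(2, 4, 6, 40))

p <- ggplot(toy) +
  geom_point(mapping = aes(x = x, y = y))

p_ylim <- p + ylim(0, 10)                       # values outside the limits become NA
p_zoom <- p + coord_cartesian(ylim = c(0, 10))  # data untouched, view is zoomed

layer_data(p_ylim)$y  # the extreme value has been replaced by NA
layer_data(p_zoom)$y  # all four values are kept
```

For a simple scatter plot the two look alike, but the choice matters once a layer calculates something from the data (a trend line, for instance), because ylim() discards the out-of-range values before the calculation.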
You can also force ggplot to include the origin in the plot using expand_limits(), which is also added as a layer.
# Changing the limit of the y axis using the expand limits function
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp),
size = 3) +
labs(
x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
color = "Year",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
expand_limits(x = 0, y = 0) # Expand limits
It’s important to note that ggplot automatically decides the scale of the axes.
We’ll cover this in more depth when we revisit scatter plots, but compare the two visualisations below. The left uses the automatically calculated axes. At a quick glance the correlations look different, depending on how and where the axis starts. We will explain how to create multiple plots side by side later.
5.6 Annotation
In a prior section we looked at adding labels to our axes. We can also add labels inside the plot itself (labelling individual points and lines, for example). Most plots will not benefit from adding text to every single observation, but labelling outliers and other points of interest can be really useful.
We do this by adding one of the following:
- geom_text() - adds label text at the specified x and y positions.
- geom_label() - draws a rectangle behind the text, making it easier to read.
- annotate() - useful for adding small annotations (such as text labels).
5.6.1 Examples
geom_text() has the most aesthetics of any geom, because there are so many ways to control the appearance of text.
# geom text adds the label
# Data Manipulation
example_data <- gapminder |>
  filter(year == 2007 & life_exp > 82)
# Plotting the data
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
geom_text(data = example_data, # New element
mapping = aes(x = gdp_percap,
y = life_exp,
label = country),
colour = "black")
Using geom_label() adds a rectangular box behind the text, making the label easier to see.
# geom label adds the label
# Data Manipulation
example_data <- gapminder |>
  filter(year == 2007 & life_exp > 82)
# Plotting the data
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
geom_label(data = example_data, # New element - Added as a layer
mapping = aes(x = gdp_percap,
y = life_exp,
label = country),
colour = "black")
We can also annotate the graph, which is also added as a layer.
# We can specify x and y coordinates for the annotation
# plotting the data
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
annotate(geom = "text",
x = 30000,
y = 65,
label = "This is my annotation at x = 30000 and y = 65") # this annotates the graph
We can also change the colour of the text.
# We can change the colour of our annotation
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3 ) +
annotate(geom = "text",
x = 30000,
y = 65,
label = "These are the Countries with the highest life expectancy",
colour = "red") # we can change the colour
5.7 Adding lines
Horizontal and vertical lines can be added to our plots, allowing us to highlight/group particular regions or highlight a not so obvious pattern. We do this using:
geom_hline(yintercept = a) - Horizontal line at the y intercept (value of y) provided.
geom_vline(xintercept = b) - Vertical line at the x intercept (value of x) provided.
5.7.1 Example - Marking the mean
This trick is often used to add the average of particular values to a plot, so we can see deviations from the mean, median etc. To compute these we use the “mean()” function as we did back in Intro to R. Feel free to review the Summary Statistics and Aggregation chapter if you’re stuck.
# Adding a horizontal line
# Data Manipulation
example_data <- gapminder |>
  filter(year == 2007)
# Plotting the data
ggplot(data = example_data) +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
geom_hline(yintercept = mean(example_data$life_exp), # horizontal line at the mean
colour = "red")
We can also label the line using annotate to let people know what it means.
# Annotating our line
# Plotting the data
ggplot(data = example_data) +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
geom_hline(yintercept = mean(example_data$life_exp),
colour = "black",
linewidth = 1) +
annotate(geom = "text",
x = 0,
y = 68,
label = "Mean") # adds a label to the line
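The same approach works for vertical lines. As a sketch, the example below marks the median of the x variable with geom_vline() and reuses the computed value to position the label, rather than typing the coordinate by hand. The data frame toy and its values are made up for illustration.

```r
library(ggplot2)

# Small illustrative data frame (made-up values)
toy <- data.frame(gdp_percap = c(1000, 3000, 5000, 20000, 40000),
                  life_exp   = c(55, 62, 68, 79, 81))

# Compute the statistic once so the line and its label stay in step
median_gdp <- median(toy$gdp_percap)

p <- ggplot(data = toy) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp),
             size = 3) +
  geom_vline(xintercept = median_gdp,   # vertical line at the median
             colour = "red") +
  annotate(geom = "text",
           x = median_gdp,
           y = 80,
           label = "Median GDP per capita",
           hjust = -0.05)               # nudge the text to the right of the line
p
```

Storing the statistic in a variable means the label follows the line automatically if the data changes.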
5.8 Setting the Theme
5.8.1 Using a Default Theme
We can also modify the overall theme of the plot, which changes the styling: colours, fonts, backgrounds and so on. When creating the plot you determine how the data is displayed; after it has been created you can edit every detail of the rendering using the system of themes available to us.
By default we get theme_grey(), the signature ggplot2 theme with a light grey background and white grid lines. The theme is designed to put the data forward while supporting comparisons. However, ggplot2 includes seven other built-in themes, and you can add more by installing packages, e.g. ggthemes.
Theme alterations are added as an extra layer with the + sign at the end, known as the theme layer. For example, we can change the default greyish background by adding a new theme. You can see all the options by typing “theme_” and observing the autocomplete.
You can also create your own personalised themes and assign them as variables, if you are trying to match a particular corporate style and each plot must adhere to the same guidelines.
# Adding a new theme as a layer
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
theme_bw() # Added new theme
5.8.2 Changing Existing Themes
The existing themes are a great place to start but don't give you a lot of control. To modify individual elements, use the theme() function to override the default setting for an element with an "element_" function, creating our own theme to apply to our plots.
We can amend a large number of things such as fonts, font sizes, axis ticks and legend position, which go a long way towards making our plots more accessible to users; we will see a lot of this in Chapters 3 and 4. There are many options here, so it is recommended you explore the documentation and bookmark it for future use.
Every single component of a ggplot graph can be customised. For more details on what you can amend, have a look at Modify components of a theme. There are four basic built-in element functions, each of which controls a particular kind of element:
- text - element_text(), draws labels and headings. You can control the font family, face, colour, size and justification.
- lines - element_line(), draws lines. You can control the colour, linewidth and linetype.
- rectangles - element_rect(), draws rectangles, mostly used for backgrounds. You can change the fill and the border colour, linewidth and linetype.
- blank - element_blank(), draws nothing. Use this if you don’t want a specific element to be included in the plot (this provides a useful eraser if the plot theme you are using is close to what you want but has one or two pesky elements you don’t need).
There are around 40 unique elements that control the appearance of the plot. For the full list run ?theme in the console; vignette("ggplot2-specs") describes the colour, line type and font specifications these elements accept.
For example, we can add a custom theme to our graph below.
# Modifying the plot elements
# Theme is added as a layer
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
# Added theme to modify plot
theme(plot.background = element_rect(fill = "slategray3", # Colour of background
colour = "black", # Colour and size of background border
linewidth = 2),
plot.title = element_text(colour = "red", # Colour of title
face = "bold"), # Font of title
plot.margin = margin(t = 20, # Margin of plot
r = 20,
b = 20,
l = 20,
unit = "pt"))
The example below modifies some of the elements related to the axes.
# Modifying the axis elements
# Theme is added as a layer
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
# Added theme to modify plot
theme(axis.text = element_text(colour = "red"), # Colour of axis text
axis.title = element_text(face = "bold", colour = "red"), # Colour of axis title
axis.ticks = element_line(colour = "green", linewidth = 4), # Colour and size of axis ticks
axis.line = element_line(colour = "orange",linewidth = 2)) # Colour of axis line
A legend can display multiple aesthetics (e.g. colour and shape), from multiple layers, and the symbol displayed in a legend varies based on the geom used in the layer.
# Modifying the legend elements
# Theme is added as a layer
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
# Added theme to modify plot
theme(legend.background = element_rect(fill = "deepskyblue2"), # Change fill of legend with rectangle
legend.title = element_text(colour = "white", # Change the legend Title
face = "bold"),
legend.text = element_text(colour = "blue"), # Change the legend text
legend.margin = margin(t = 10, # Legend Margin
r = 10,
b = 10,
l = 10,
unit = "pt"))
We can turn off the legend title by adding "legend.title = element_blank()" to theme(). Make sure the element you are removing isn't necessary to understand the plot.
# Modifying the legend elements
# Turning off legend title
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
theme(legend.title = element_blank()) # Removing the legend title with blank
Legends can appear in different places, so you need some global way of controlling them. They have considerably more details that can be tweaked:
- Should they be displayed vertically or horizontally?
- How many columns?
- How big should the keys be?
The position of the legend is controlled by the theme setting "legend.position", which takes the values "right", "left", "top", "bottom", or "none" (no legend).
# Modifying the legend elements
# Modifying legend position
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
# Added theme to modify plot
theme(legend.position = "bottom") # Legend will be placed at bottom
Switching between left/right and top/bottom modifies how the keys in each legend are laid out (horizontally or vertically), and how multiple legends are stacked (horizontally or vertically). If needed, you can adjust those options independently:
- legend.direction - layout of items in legends ("horizontal" or "vertical").
- legend.box - arrangement of multiple legends ("horizontal" or "vertical").
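The two settings above can be tried out with a minimal sketch like the one below. It is an illustration only: it uses the mpg dataset that ships with ggplot2 as a stand-in for the course's gapminder data, and the object name legend_plot is made up for this example.

```r
library(ggplot2)

# `mpg` (bundled with ggplot2) stands in for the course data here
legend_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = hwy,
                           colour = drv,  # first legend
                           size = cyl)) + # second legend
  theme(legend.position = "bottom",       # Legends placed under the plot
        legend.direction = "horizontal",  # Keys within each legend run left to right
        legend.box = "vertical")          # The two legends are stacked on top of each other

legend_plot
```

Changing legend.box to "horizontal" should place the two legends next to each other instead.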
Finally the panel elements can be modified as below.
# Modifying the panel elements
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
# Added theme to modify plot
theme(panel.background = element_rect(fill = "lightblue"), # Changing panel background colour
panel.grid = element_line(colour = "grey60", linewidth = 0.2), # Changing the lines
panel.border = element_rect(colour = "black", fill = NA)) # Changing the border
If you are struggling to create a theme, you could use ggThemeAssist, which provides an interactive user interface for creating one, or ggeasy, a package that makes theme customisation much easier.
Remember that accessibility and organisational guidelines are the priority when creating publication ready plots, so despite the extra complexity at play here with themes, it is so important to get these right. You will see later that once a theme object is created, we can use it for as many plots as we like, simplifying the overall workload significantly.
5.8.3 Exercise
- Using the visualisation you created in the previous exercise:
- Add an appropriate theme of your choice.
# Visualisation from the previous exercise
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")
# Plotting the data
gapminder_uk |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = fertility)) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita")
# One possible solution: the same plot with a theme added
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")
# Plotting the data
gapminder_uk |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = fertility)) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita") +
  theme_classic()
5.8.4 Amending Default Themes
If we don’t want to define each and every argument, we also can start with an existing theme and alter only some of its arguments.
- theme_gray() - “the mother of all themes” and fully defined, for example theme_bw() builds upon theme_gray() , while theme_minimal() in turn builds on theme_bw().
# Amending set themes
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
theme_bw() + # Added theme bw to modify plot
theme(text = element_text(colour = "red")) # Overriding the text colour
5.8.5 Saving The Theme
We can adapt a default theme and save it to use for other plots. Let’s call it “custom_theme”, assigning it for use in other plots going forward. Establishing a consistency among the plots we create is really important for publication purposes.
# Using a preset theme and editing some elements of it
custom_theme <- theme_bw() +
  theme(
    # Grid elements
    panel.grid.major = element_blank(), # Strip major gridlines
    panel.grid.minor = element_blank(), # Strip minor gridlines
    # Add axis line
    axis.line = element_line(colour = "black", # Colour to black
                             linewidth = 0.5), # Set thickness
    # Text elements
    # Title
    plot.title = element_text(size = 14, # Set font size
                              face = "bold", # Bold typeface
                              hjust = 0, # Left align
                              vjust = 2), # Raise slightly
    # Subtitle
    plot.subtitle = element_text(size = 12, # Font size
                                 margin = margin(t = 10)), # Margin for plot text
    # Caption
    plot.caption = element_text(size = 9, # Font size
                                hjust = 1), # Right align
    # Axis titles
    axis.title = element_text(size = 10), # Font size
    # Axis text
    axis.text = element_text(size = 10), # Font size
    # Margin for axis text
    axis.text.x = element_text(margin = margin(t = 5,
                                               b = 10)))
Note that since the legend often requires manual tweaking based on the plot we are creating, we will not define it here.
To control the alignment of labels we use hjust (horizontal adjustment) and vjust (vertical adjustment).
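As a rough illustration of how hjust behaves, the sketch below aligns a title and a caption differently. It assumes the mpg dataset bundled with ggplot2 in place of the course data, and the names base_plot and aligned_plot are made up for this example. For text elements, hjust runs from 0 (left) through 0.5 (centre) to 1 (right).

```r
library(ggplot2)

# `mpg` (bundled with ggplot2) stands in for the course data here
base_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = hwy)) +
  labs(title = "A centred title",
       caption = "A left-aligned caption")

aligned_plot <- base_plot +
  theme(plot.title = element_text(hjust = 0.5),  # 0.5 centres the title
        plot.caption = element_text(hjust = 0))  # 0 pushes the caption to the left

aligned_plot
```

vjust works the same way in the vertical direction, with 0 at the bottom and 1 at the top.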
5.8.5.1 Example
Now that we have our custom theme, we can then add it to our graph as a layer, as shown below.
# Adding our custom theme as a layer
filter(gapminder, year %in% 2007) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 3) +
labs(x = "Gross Domestic Product Per Capita in International Dollars",
y = "Life expectancy at birth in years",
colour = "Continent",
title = "Graph showing Life Expectancy by GDP Per Capita",
subtitle = "Data from Gapminder Dataset",
caption = "www.gapminder.org") +
# Theme added as a layer
custom_theme
5.8.6 Setting Our Theme
Now that we have created our custom theme, we can set it as the default using “theme_set”. This way of changing the plot design is highly recommended. It allows you to quickly change any element of your plots by changing it once.
# All graphs plotted will use the set theme
theme_set(custom_theme)
Of course, whilst this is fairly universal, there are some examples (such as with pie charts and donut charts) where our theme would need to be tweaked.
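One detail that can help when a default needs to be switched back: theme_set() invisibly returns the theme that was active before the call. The sketch below is a minimal illustration that uses theme_bw() as a stand-in for custom_theme and the mpg dataset bundled with ggplot2; the names old_theme and themed_plot are made up for this example.

```r
library(ggplot2)

# theme_set() returns the previously active theme, so we can keep hold of it
old_theme <- theme_set(theme_bw())  # theme_bw() stands in for custom_theme

# No theme layer is needed: the plot picks up the new default when it is drawn
themed_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = hwy))
themed_plot

# Restore the earlier default once we are finished
theme_set(old_theme)
```

A one-off plot that needs something different, such as a pie chart, can still take its own theme() layer on top of the default.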
5.9 Combining Multiple Plots Side by Side
There are several ways to combine plots. We will use the patchwork package in this course, but other options include the gridExtra and cowplot packages.
Patchwork, as the name suggests, patches plots together. For basic layouts we don't need to call functions from the package: once patchwork is loaded, arithmetic operators such as +, / and | perform the combinations for us. You define each plot however you want it displayed, then assign it to a variable name to be used with patchwork.
# Assigning plots names
first_plot <- filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp))
second_plot <- filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp),
             colour = "red",
             size = 2)
third_plot <- filter(gapminder, year == 1987) |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = life_exp),
             colour = "blue")
5.9.1 Examples
We can show multiple plots side by side using the addition (+) sign.
# Setting 2 plots side by side
first_plot + second_plot
We can also put one plot on top of another; here I am also adding a title, subtitle and caption.
# Setting plot on top of another plot
first_plot / second_plot + plot_annotation( # Adding title, subtitle and caption
  title = "This is my title",
  subtitle = "This is my subtitle",
  caption = "This is my caption"
)
# Setting 1 plot with 2 plots beside it
first_plot | (second_plot / third_plot)
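The operators cover most layouts, but patchwork also provides plot_layout() for finer control. This function is not used elsewhere in this chapter, so treat the following as an optional sketch. It assumes the mpg dataset bundled with ggplot2 in place of the course data, and the plot names are made up for this example.

```r
library(ggplot2)
library(patchwork)

# `mpg` (bundled with ggplot2) stands in for the course data here
left_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = hwy,
                           colour = drv))

right_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = cty,
                           colour = drv))

# widths: the left plot takes twice the horizontal space of the right plot
# guides = "collect": matching legends are merged into one shared legend
combined_plot <- left_plot + right_plot +
  plot_layout(widths = c(2, 1),
              guides = "collect")

combined_plot
```

Collecting the guides is mainly useful when the combined plots share the same colour mapping, as they do here.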
5.10 Geometric Objects
Each plot uses a different visual object to represent the data. In ggplot2 syntax, we say that they use different geoms. People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, box plots use box plot geoms, and so on. Scatter plots break the trend; they use the point geom.
To change the geom in your plot, change the geom function that you add to ggplot(). For instance, to make the plots below, you can use this code:
# Bar chart example
gapminder |>
  group_by(continent) |>
  ggplot() +
  geom_bar(mapping = aes(x = continent,
                         colour = continent,
                         fill = continent))
# Histogram example
ggplot(data = gapminder) +
geom_histogram(mapping = aes(x = life_exp,
fill = continent),
bins = 25
)
# Line graph example
gapminder |>
  filter(country %in% c("Colombia", "Chile", "Argentina", "Brazil", "Peru", "Ecuador")) |>
  ggplot() +
  geom_line(mapping = aes(x = year,
                          y = gdp_percap,
                          colour = country)) +
  ylim(0, 15000)
Every geom function in ggplot2 takes a mapping argument. However, not every aesthetic works with every geom. ggplot2 provides over 40 geoms; the best way to get a comprehensive overview is the ggplot2 cheatsheet, and for individual geoms, consult the R help documentation.
5.11 Global versus local aesthetic mappings
5.11.1 Global Mapping
You can plot multiple geoms in the same graph. In order to do this, add multiple geom functions to ggplot(), repeating the mapping each time:
5.11.1.1 Example
# Plotting multiple geoms on the same graph
gapminder |>
  filter(country %in% c("Colombia", "Chile", "Argentina", "Brazil", "Peru", "Ecuador")) |>
  ggplot() +
  geom_point(mapping = aes(x = year,
                           y = life_exp)) +
  geom_smooth(mapping = aes(x = year,
                            y = life_exp))
This, however, introduces some duplication in our code. Imagine if you wanted to change the y-axis to display infant mortality instead of life expectancy. You’d need to change the variable in two places, and you might forget to update one. You can avoid this type of repetition by passing a set of mappings to ggplot() directly instead of inside the geom_() layer.
ggplot2 will treat these mappings as global mappings that apply to each geom in the graph. In other words, this code will produce the same plot as the previous code:
# Plotting multiple geoms on the same graph
# Mapping argument specified within ggplot function
gapminder |>
  filter(country %in% c("Colombia", "Chile", "Argentina", "Brazil", "Peru", "Ecuador")) |>
  ggplot(mapping = aes(x = year,
                       y = life_exp)) +
  geom_point() +
  geom_smooth()
This is an important distinction worth making:
- Global mappings can save on coding should you require the same mappings throughout your visualisation.
- Local mappings give you finer control if you wish to assign different variables at different geometric layers.
Note that some layers or functions require mappings to function correctly.
5.11.2 Local Mappings
By contrast, if you place mappings in a geom function (as we have thus far), ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.
5.11.2.1 Example
gapminder |>
  filter(country %in% c("Colombia", "Chile", "Argentina", "Brazil", "Peru", "Ecuador")) |>
  ggplot(mapping = aes(x = year,
                       y = life_exp)) +
  geom_point(mapping = aes(colour = country)) +
  geom_smooth(colour = "red")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
If you only have one layer in the plot, the way you specify aesthetics doesn’t make any difference. However, the distinction is important when you start adding additional layers.
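To see why the placement matters once there are several layers, the sketch below builds the same two-layer plot twice, moving only the colour mapping. It is an illustration under stated assumptions: it uses the mpg dataset bundled with ggplot2 rather than the course data, fits a straight line with method = "lm" for simplicity, and the names global_colour and local_colour are made up for this example.

```r
library(ggplot2)

# Global colour mapping: both layers inherit it, so geom_smooth()
# fits a separate line for each value of drv
global_colour <- ggplot(data = mpg,
                        mapping = aes(x = displ,
                                      y = hwy,
                                      colour = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local colour mapping: only the points are coloured by drv,
# so geom_smooth() fits a single line through all of the data
local_colour <- ggplot(data = mpg,
                       mapping = aes(x = displ,
                                     y = hwy)) +
  geom_point(mapping = aes(colour = drv)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

global_colour
local_colour
```

The first plot shows three fitted lines, one per drive type, while the second shows one overall line over coloured points.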
5.12 Facets
We can create small subplots side by side, which are called facets. These are an excellent way to avoid overplotting data on one set of axes. It also makes it easier to compare data across groups of categorical variables (for example, we could have one graph per continent and so on).
There are two main functions for faceting:
- facet_wrap(): produces a sequence of subplots, one for each level of a categorical variable, wrapped into rows and columns.
- facet_grid(): lays the subplots out as a grid or matrix, with rows and columns defined by variables.
5.12.1 Facet wrap
Let’s take a look at a similar scatterplot that we created earlier in the course.
# Plotting data
filter(gapminder, year %in% 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 2)
We could break the plot above into separate plots showing a scatter plot for each continent, allowing for better comparisons across the groups.
5.12.1.1 Example
We use the facet_wrap() function, which takes a call to vars() as an argument, where we specify which variables to wrap (or group) by. The variable that you pass to facet_wrap() should be discrete (a.k.a. categorical).
By default, all of the small multiples share the same horizontal and vertical axes.
# Using facet wrap and vars()
filter(gapminder, year %in% 1987) |>
ggplot() +
geom_point(
mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 2) +
facet_wrap(vars(continent))
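The shared axes used above make groups directly comparable, but they can squash groups with a narrow range. As a hedged sketch, the scales argument of facet_wrap() lets each subplot have its own axis. The example uses the mpg dataset bundled with ggplot2 in place of the course data, and the name free_axis_plot is made up for this example.

```r
library(ggplot2)

# `mpg` (bundled with ggplot2) stands in for the course data here
free_axis_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = hwy)) +
  facet_wrap(vars(class),
             scales = "free_y")  # Each subplot gets its own vertical axis

free_axis_plot
```

Other accepted values are "fixed" (the default), "free_x" and "free". Freeing the axes makes comparisons between subplots harder, so it is worth doing only when the shared axis hides the pattern you want to show.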
We can control the number of rows and columns with nrow (number of rows) and ncol (number of columns) arguments.
# Amend the number of columns
filter(gapminder, year %in% 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 2 ) +
facet_wrap(vars(continent),
ncol = 2)
By default, the labels are displayed at the top of each subplot. However, the optional argument "strip.position" can place the labels on any of the four sides by setting it to one of "top", "bottom", "left" or "right".
# Amend position of labels using strip.position
filter(gapminder, year %in% 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 2) +
facet_wrap(vars(continent),
strip.position = "bottom")
5.12.2 Facet grid
If we want to create more complex facets/subplots we can use facet_grid(). This enables us to create facets from 2 categorical variables, allowing for much more complex visualisations. Its two main arguments are:
- rows - Which variable to set as the rows
- cols - Which variable to set as the columns
5.12.2.1 Example
Notice that cols is wrapped in vars() just like rows, but here we pass a condition on the "pop" variable. This generates a second categorical variable, with the logical values TRUE and FALSE, to group by.
# Using facet grid
filter(gapminder, year %in% 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 2) +
facet_grid(rows = vars(continent),
cols = vars(pop > 10000000)
)
There are lots of other useful parameters this function can take to tweak the output into the form we desire, for example organising by rows or columns. You can read about these in the help documentation.
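One such parameter is labeller, which is handy when a facet is built from a condition, because strips that read only TRUE and FALSE are hard to interpret. The sketch below is an illustration only: it uses the mpg dataset bundled with ggplot2 in place of the course data, and large_engine and labelled_grid are names made up for this example.

```r
library(ggplot2)

# `mpg` (bundled with ggplot2) stands in for the course data here
labelled_grid <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = hwy)) +
  facet_grid(rows = vars(drv),
             cols = vars(large_engine = cyl > 4),  # Naming the condition inside vars()
             labeller = label_both)                # Strips show the variable name and its value

labelled_grid
```

Naming the condition inside vars() gives the strip a readable variable name, and label_both places that name next to each value.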
With one variable facet_grid produces similar output to facet_wrap():
# Using facet grid
filter(gapminder, year %in% 1987) |>
ggplot() +
geom_point(mapping = aes(x = gdp_percap,
y = life_exp,
colour = continent),
size = 2) +
facet_grid(rows = vars(continent)
)
5.13 Saving Visualisations
5.13.1 Filetypes for Images
Often when we have a finished visualisation, we want to save it for publishing or further use in documentation. ggplot2 can save to a variety of file formats:
- Image (Raster) Formats, such as
- png - ‘portable network graphics’
- jpg /.jpeg - ‘joint photographic experts group’
- tif /.tiff - ‘tagged image file format’
- Vector Formats, such as
- ps /.eps - Postscript/ Encapsulated Postscript
- pdf - ‘portable document format’
- svg - ‘scalable vector graphics’
Why may we care about file types?
Raster images are the images we're often most familiar with. They are made up of tiny squares called pixels; think back to the early days of digital cameras, when pictures often had that blocky quality. If we enlarge these images they can look blurry or pixelated. However, they are often smaller files than vector images. Software for editing raster images includes Microsoft Paint (a classic!), Adobe Photoshop and freeware options like GIMP and Paint.NET.
Vector images are instead made up of paths. This lets them be scaled up or down at will without losing quality or appearing pixelated. Clip art is a classic example of vector images. A downside is that they're often larger files, and your audience may not be able to view them if sent individually (rather than embedded in a Notebook or markdown document).
Editing these is often more complicated and requires special software. Note that we've included PDF files because ggplot creates them as vectors, but not all PDF files are vectors. Software for editing vector images includes Adobe Illustrator and freeware options like Inkscape.
5.13.2 How to save visualisations
You can save your plots using the ggsave() function, which will save the most recent plot you have created. You can specify the dimensions and resolution of your plot by adjusting the appropriate arguments (width, height, and dpi) to create high quality graphics for publication.
It has the following important arguments:
- The first argument, "filename", specifies the path where the image should be saved. The file extension will be used to automatically select the correct graphics device. ggsave() can produce .eps, .pdf, .svg, .wmf, .png, .jpg, .bmp, and .tiff files.
- width and height control the output size, specified in inches by default. If left blank, they'll use the size of the on-screen graphics device.
- For raster graphics (i.e. .png, .jpg), the dpi argument controls the resolution of the plot. It defaults to 300, which is appropriate for most printers, but you may want to use 600 for particularly high-resolution output, or 96 for on-screen (e.g., web) display.
If you save through the export button you will typically get a low DPI (72), which gives lines jagged edges (known as aliasing); exporting with a higher DPI gives a higher quality appearance.
If you are having anti-aliasing issues on Windows you can use the Cairo package.
# To save plots
ggsave(filename = "./outputs/output_plot.png",
width = 7,
height = 5,
dpi = 300)
Note that a resolution of 150-300 dpi is suggested for PowerPoint slides.
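It can help to see how width, height and dpi combine: for raster formats, the pixel dimensions of the saved file are the size in inches multiplied by the dpi. The sketch below works through that arithmetic and saves a plot explicitly with the plot argument. It is an illustration under stated assumptions: it uses the mpg dataset bundled with ggplot2, writes to a temporary folder rather than the course's outputs folder, and all object names are made up for this example.

```r
library(ggplot2)

# `mpg` (bundled with ggplot2) stands in for the course data here
my_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ,
                           y = hwy))

# A temporary output folder for this sketch; ggsave() needs the folder to exist
output_dir <- file.path(tempdir(), "outputs")
dir.create(output_dir, showWarnings = FALSE)

width_in  <- 7    # Width in inches
height_in <- 5    # Height in inches
dpi       <- 300  # Dots (pixels) per inch

# Pixel dimensions of the raster file: inches multiplied by dpi
pixel_width  <- width_in * dpi   # 7 * 300 = 2100 pixels
pixel_height <- height_in * dpi  # 5 * 300 = 1500 pixels

output_file <- file.path(output_dir, "my_plot.png")

# Passing the plot explicitly avoids relying on "the most recent plot"
ggsave(filename = output_file,
       plot = my_plot,
       width = width_in,
       height = height_in,
       dpi = dpi)
```

Halving the dpi to 150 would give a 1050 by 750 pixel image of the same physical size, which is why a lower dpi looks coarser when displayed large.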
5.13.3 Exercise
- Using the visualisation you created in the previous exercise:
Save it in the outputs folder as a .png file at 300 dpi.
Then save it as a PDF file, compare the two.
# Create the previous visualisation
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")
# Plotting the data
gapminder_uk |>
  ggplot() +
  geom_point(mapping = aes(x = gdp_percap,
                           y = fertility)) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita") +
  theme_classic()
# Save the plot
gapminder_uk <- gapminder |>
  filter(country %in% "United Kingdom")
# Plotting the data
my_plot <- ggplot(data = gapminder_uk) +
  geom_point(mapping = aes(x = gdp_percap,
                           y = fertility)) +
  labs(x = "Gross Domestic Product Per Capita in International Dollars",
       y = "Fertility (number of children per woman)",
       title = "Graph showing Fertility by GDP Per Capita") +
  theme_classic()
# saves as png
ggsave(filename = "./outputs/my_plot.png",
width = 7,
height = 5,
dpi = 300)
# saves as pdf
ggsave(filename = "./outputs/my_plot.pdf",
width = 7,
height = 5,
dpi = 300)
6 Summary
Congratulations! You have climbed the immense mountain of Chapter 2 and conquered the various functionalities of the ggplot2 package as well as patchwork. This was a very in-depth chapter, but you can now appreciate how far we can go when building plots and approaching visualisation problems in general.
Make sure you take a break before moving on to Chapters 3 and 4; you deserve it!