# Load packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # pyplot module from base matplotlib (more efficient access to plotting tools)
import matplotlib # base matplotlib to access help functionality
# Load data
= pd.read_csv("../data/gapminder.csv") gapminder
Chapter Two- Plotting Overview
1 Packages and Data
Let’s start by loading packages and data.
Remember you can use the .__version__
attribute (e.g np.__version__
) to check your version against the minimum requirements in chapter 1.
We recommend that you follow standard convention for nicknames.
First set the following magic command
% matplotlib inline
%matplotlib inline
This means any plot you create will be automatically embedded below the code cell upon execution.
3 Scatter Plots
In this section you’ll explore creating a basic plot using pyplot. You’ll do this with just one kind of plot - the scatter plot. Scatter Plots are a fantastic way to develop and understanding of the foundations of plotting in Python.
By the end of this section we’ll have built our scatter plot to meet GSS standards and have a good core understanding of how the plotting tools in Matplotlib work.
3.1 Matplotlib Objects
We have several types of object within Matplotlib:
3.1.1 The Figure
Object
The outermost container for a plot. This contains a minimum of one axes
object – but can contain multiples axes
objects. The figure
object contains and manages other objects associated with plots, like titles and legends.
3.1.2 The Axes
object
Represents the main plot itself. The Axes is made up of many objects which are used to define what’s represented in the plot. There are many axes
methods we can use to create different plots and we can use our axes
object to set things like axis labels, ticks and ranges.
Matplotlib graphs our data on Figure
objects, each of which contains one or more Axes
object.
For most of this course we’ll concentrate on one figure
and one axes
object; but will explore multiple plots briefly at the end of this section.
3.1.3 Other objects
There are other objects, such as locator objects, formatter objects and artist objects. Most of the time we can do everything we need through an axes object though.
3.2 Creating a Plot
To create our figure and axes we use the code:
# Create our figure and our axes
= plt.subplots() figure, axes
As you can see this creates the figure
object , which contains our single axes
object.
We can customise the size of this plot by specifying the figsize
parameter.
This takes a tuple – (width, height) and is measured in inches. The default is (6.4, 4.8).
# Create our figure and our axes
= plt.subplots(figsize=(8, 6)) # NEW - Set figure size figure, axes
# Create a new Dataset for 1987 and reset the index.
= gapminder[gapminder["year"] == 1987]
gm_1987 = True, inplace = True) gm_1987.reset_index(drop
Feel free to run the cell below to inspect this new dataset.
gm_1987.head()
country | continent | year | life_exp | pop | gdp_per_cap | infant_mortality | fertility | |
---|---|---|---|---|---|---|---|---|
0 | Afghanistan | Asia | 1987 | 40.822 | 13867957.0 | 852.395945 | NaN | NaN |
1 | Albania | Europe | 1987 | 72.000 | 3075321.0 | 3738.932735 | 40.8 | 3.13 |
2 | Algeria | Africa | 1987 | 65.799 | 23254956.0 | 5681.358539 | 46.3 | 5.51 |
3 | Angola | Africa | 1987 | 39.906 | 7874230.0 | 2430.208311 | 134.1 | 7.20 |
4 | Argentina | Americas | 1987 | 70.774 | 31620918.0 | 9139.671389 | 27.1 | 3.06 |
We always start our plots by creating the figure
and axes
objects.
We can then create the contents of our plot using a method from the axes
object.
To create a scatter plot we use axes.plot()
Here we’re passing two arguments, effectively:
axes.plot(x,y)
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# NEW - Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"]) axes.scatter(x
A semi colon ( ; ) at the end of the last line suppresses the output line – here <matplotlib.collections.PathCollection at 0x1c343226d30>
which we often do not want with our visualisation.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"]); # NEW - added ; to end of line axes.scatter(x
3.2.1 Exercise 1
Filter the Gapminder data for the United Kingdom.
Make a scatter plot with:
gdp_per_cap as x axis
fertility as y axis
# Create a United Kingdom DataFrame
= gapminder[gapminder["country"] == "United Kingdom"]
UK_gapminder
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=UK_gapminder["gdp_per_cap"], y=UK_gapminder["fertility"]); axes.scatter(x
3.3 Basic Titles and Labels
We can set our titles and labels from the axes
object. These come under the .set_
family of methods.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes.scatter(x
# NEW - Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita")
axes.set_title("Gross Domestic Product per Capita in International Dollars")
axes.set_xlabel("Life expectancy at birth in years"); axes.set_ylabel(
3.3.1 Exercise 2
Use the visualisation you created in the previous exercise, and set an appropriate title, and X and Y axis labels.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=UK_gapminder["gdp_per_cap"], y=UK_gapminder["fertility"])
axes.scatter(x
# NEW - Add Labels and titles
"Graph showing Fertility by GDP Per Capita")
axes.set_title("Gross Domestic Product per Capita in International Dollars")
axes.set_xlabel("Fertility (number of children per woman)"); axes.set_ylabel(
4 Customising Plots to Analysis Function Guidelines
Often the default behaviour of plotting packages will not create outputs that meet our style guides or specifications.
As mentioned previously we’re following the guidance set out by Style.ONS and the GSS Good Practice Team.
If your department has different guidelines, you should follow those.
4.1 Fonts
In this section we’ll talk about fonts in Matplotlib.
We’ll address these individually in this section; we can also set them globally to save time. We’ll talk about that later in this chapter.
Guidance recommends using a single accessible sans serif font. Examples of these fonts are Arial and Tahoma.
We can alter the font individually in the labels and title by using the parameter font
and supplying a font name in speech marks.
In the cell below I have set the title to be “Arial” in line 9.
= gapminder[gapminder["year"] == 1987]
gm_1987 = True, inplace = True)
gm_1987.reset_index(drop
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes.scatter(x
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial") # NEW - Change title font
fontname"Gross Domestic Product per Capita in International Dollars")
axes.set_xlabel("Life expectancy at birth in years"); axes.set_ylabel(
If a font does not exist then we get the user warning that the font is not found.
It will then fall back to the default font – on my system, it is “DejaVu Sans”.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes.scatter(x
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Open Sans") # NEW - Change title font
fontname"Gross Domestic Product per Capita in International Dollars")
axes.set_xlabel("Life expectancy at birth in years"); axes.set_ylabel(
findfont: Font family 'Open Sans' not found.
findfont: Font family 'Open Sans' not found.
findfont: Font family 'Open Sans' not found.
findfont: Font family 'Open Sans' not found.
findfont: Font family 'Open Sans' not found.
findfont: Font family 'Open Sans' not found.
findfont: Font family 'Open Sans' not found.
findfont: Font family 'Open Sans' not found.
4.1.1 Setting Font Sizes
We can set font sizes individually from within the set_
methods.
We use the parameter size
and pass an integer. Here we’ll use 16 for our title and 12 for our x and y labels.
Note this does not alter the font size for the numbers on either axis.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes.scatter(x
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial",
fontname=16) # NEW - Set font size
size"Gross Domestic Product per Capita in International Dollars",
axes.set_xlabel(="Arial", size=12) # NEW - Set font name as before, and size
fontname"Life expectancy at birth in years",
axes.set_ylabel(="Arial", size=12); # NEW - Set font name as before, and size fontname
4.1.2 Changing other text attributes
We can find out parameters we can set to text using the help function from base matplotlib imported earlier - ?matplotlib.text.Text
?matplotlib.text.Text
You’ll notice that we can set a huge variety of options; from the sensible (bolding) through to the bizarre, like adding a background colour for our title.
Here we’re introduced to one of the really important elements of visualisation journey:
With great power comes great responsibility!
You should always think, does this add to my visualisation or detract from it?
Here I’ve picked out a few parameters you may find useful and applied them to the title in the code below:
# Guidance says that bold fonts should be limited to headings and italics should not be used at all.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes.scatter(x
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="bold",
weight="gray",
color=16,
size="left", # Note - the aligment here is wrong!
horizontalalignment="Arial") # NEW - Altered parameters
fontname
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
4.1.3 Exercise 1
Using the visualisation you created in the previous exercise:
Set the font to be Arial for the title, x and y axis labels.
Set the title to be 16 and the x and y axis labels to be 14.
# Solution - These cells contain answers for the exercises.
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=UK_gapminder["gdp_per_cap"], y=UK_gapminder["fertility"])
axes.scatter(x
# Add Labels and titles
"Graph showing Fertility by GDP Per Capita", fontname="Arial", size=16)
axes.set_title("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Fertility (number of children per woman)", fontname="Arial", size=12); axes.set_ylabel(
4.2 Colours
Colour is a fundamental part of our visualisations. Used poorly, it can confuse the user – used well it can enhance and clarify statistical content.
The resources linked below go into much greater detail about colour theory, and tips about using colours and make for very interesting reading.
Matplotlib has a huge variety of ways to specify colour. Here we’ll specify just a few, but you can find additional ways to specify colours on the tutorial on the Matplotlib website.
N.B – The explanatory text will spell colour with the UK spelling.
Matplotlib uses the American spelling for the parameter: color
4.2.1 Colour by Names
You apply colours to your scatter plot using the parameter below:
="blue" color
This turns the points on my scatter plot to blue.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x="blue") # NEW - Colour specified using a colour name.
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
4.2.2 Colour by HTML Names and HEX codes
Matplotlib will accept most common names, and HTML colour names - here I’m using color="OliveDrab"
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x="OliveDrab") # NEW - Colour specified using HTML colour names
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial",
fontname=16)
size"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
For finer control we can also use hex codes; these are given as a string with a # symbol at the front.
Here “#9E2A2B” is a burnt red colour.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x="#9E2A2B") # NEW - Colour specified using HEX code
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial",
fontname=16)
size"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
4.2.3 Colour by Greyscale
For greyscale we can also use a decimal between 0.0 (white) and 1.0 (black) to select a grey colour.
Note this needs to be a string.
# Create our figure and our axes
= plt.subplots(figsize=(6,5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x="0.5") # NEW - Grey colour as a float
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
4.2.4 Colour by RGB and RGBA values
We can also set colours by the RGB value. This is the Red, Green and Blue value.
This is on a scale of 0 to 255 for each value, and Matplotlib accepts them as a tuple of float (decimal) values.
For example ONS Blue has the RGB Value 0, 61, 89.
Focusing on the Green value to convert it to a decimal we need to do 61/255 = 0.23921568627450981
So we can say our decimal values for ONS Blue should be (0 , 0.239 , 0.349 )
Here I’ve chosen values to the precision of the 3rd digit, differences beyond that should be not visible to the human eye.
This also works for RGBA colours; Red, Green, Blue, Alpha (transparency). The Alpha is between 0 (transparent) and 1 (solid)
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0, 0.239, 0.349)) # NEW - Colour specified using RGB Values
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
Just to further demonstrate the difference between a correct and an incorrect RGB value – see the visualisations below:
- The left uses the incorrect assumption that the RGB values can be directly coded to decimals
- The right uses the properly converted methods.
More detail will be gone into later about how exactly two side by side plots have been created.
# Create our figure and our axes
= plt.subplots(1, 2, figsize=(10, 4))
figure, (axes1, axes2)
# The line below allows us to create a title for both charts
"Incorrectly (left) and correctly (right) specifying RGB colours")
figure.suptitle(
# Plot 1
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes1.scatter(x=(0, 0.61, 0.89))
color"Gross Domestic Product per Capita in International Dollars")
axes1.set_xlabel("Life expectancy at birth in years")
axes1.set_ylabel(
# Plot 2
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes2.scatter(x=(0, 0.239, 0.349))
color"Gross Domestic Product per Capita in International Dollars")
axes2.set_xlabel("Life expectancy at birth in years"); axes2.set_ylabel(
4.2.5 Setting Alpha
We can use the alpha
parameter to control the transparency of our points. This is a float value between 0 and 1; where 0 is fully transparent (invisible) and 1 is completely opaque (solid).
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0.0, 0.239, 0.349),
color=0.5) # NEW - Alpha controlling transparency.
alpha
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(= "Arial", size = 16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
4.2.6 Exercise 2
Using the visualisation you created in the previous exercise:
Set the colour of the points to ONS Green – Hex #a5cd46 or an RGB value of (165,205,70). Set the alpha to be 0.9.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=UK_gapminder["gdp_per_cap"], y=UK_gapminder["fertility"],
axes.scatter(x= "#A5CD46", # NEW - Add color and alpha values
color=0.9)
alpha
# Add Labels and titles
"Graph showing Fertility by GDP per Capita", fontname="Arial", size=16)
axes.set_title("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Fertility (number of children per woman)", fontname="Arial", size=12); axes.set_ylabel(
4.2.7 Colouring points using data - continuous
We may sometimes want to colour points on our scatter plots based on a category or other data.
This isn’t terribly straight forward in Matplotlib; and is often a reason people choose to use Seaborn for this kind of visualisation.
In our ax.scatter()
call we can use the parameters:
c
and pass a column. Note this is a different parameter than color!cmap
takes a colormap name. You can find a list of the colormap names here.
You can find guidance on making your own colormaps here
ONS and GSS Guidance for colourmaps mentions that many visualisations are still printed in greyscale to save ink. Choosing a single colour palette can help with this. Blues is a commonly used palette across government figures.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=gm_1987["life_exp"], # NEW - Colour by column
c="Blues") # NEW - Select the Blues colormap
cmap
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
We can add an attribute known as a colorbar
here to demonstrate the scale of our colours.
This won’t work unless we assign our plot to a variable – the color bar is called from the figure
object.
Note that in the code below that my scatterplot is assigned to the variable plot
.
There are a lot of parameters we’re covering here. You may want to make a copy of this cell and observe what happens when you remove certain arguments.
cbar = figure.colorbar()
The colorbar()
method is called from the figure
object. Note I’ve assigned it to the variable cbar
here – this allows us to set labels easier.
mappable
refers to the object we’re applying the colorbar too; hereplot
.ax
is the axes we’re applying this too, ours is calledaxes
ticks
takes a list of custom tick values for our colorbar. I’ve set these up at the top of the plot usingnp.arrange()
- that code could be applied directly to the argument; but it’s easier to read when assigned to a variable in my opinion.extend
allows us to have pointed ends of our bars. As life expectancy could be less than or exceed our range I’ve set this toboth
.
cbar.ax.get_yaxis.labelpad()
This code is adding a little padding to our label – without this it sits over our ticks.
cbar.set_label()
label
is our labelrotation
moves the label. I feel it is more readable in this direction.
# NEW - Set up some custom ticks for the colorbar
= np.arange(gm_1987["life_exp"].min(), (gm_1987["life_exp"].max()+1), step=6)
custom_ticks
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data - NEW - I've assigned the visualisation to the variable plot
= axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
plot =gm_1987["life_exp"], cmap="Blues")
c
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# NEW - Add the Colorbar
= figure.colorbar(mappable=plot, ax=axes, ticks=custom_ticks, extend="both")
cbar
# NEW - Put a bit more space between the colorbar y axis and the label
= 15
cbar.ax.get_yaxis().labelpad
# NEW -Set the label of the colorbar and rotate it.
="Life Expectancy", rotation=270) cbar.set_label(label
4.2.8 Colouring points using data - categorical
This above method is great if we wish to colour by a sequential numerical column. However, what if we want to colour by a categorical (or factor) variable like continent?
Packages like Seaborn make this step easy. We’ll cover the Seaborn method in chapter 6, which is dedicated to that package.
Matplotlib makes it a bit fiddlier; but not impossible.
Passing a string column or a categorical column to the c
argument doesn’t work. There are ways around this by mapping each category to a numerical value and then plotting those; adjusting the colorbar to represent static values and re-labelling; but they’re complicated, time consuming and not particularly readable.
Instead we can use the unique().tolist()
method to give us a list of the unique values in our continent column.
I’ve also sorted this column so our values are in alphabetical order.
We can then use this to loop through and plot data for each continent. So on the first iteration the loop will plot all the data points for Asia, on the second Africa etc.
The parameter label
and then our looping variable will allow us to have a labelled legend at the end; vital for knowing which colour is which continent.
We display the legend by using plt.legend()
- we’ll look at more customisation of legends later.
# NEW - Set Up
= gm_1987["continent"].unique().tolist() # Get a list with the unique values of the continent column.
continents # Sorting this list to alphabetical order
continents.sort()
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# NEW - Loop through each continent in turn
for continent_name in continents:
# NEW - Get the subset of our data corresponding to the continent specified.
= gm_1987[gm_1987["continent"] == continent_name]
continent_rows
=continent_rows["gdp_per_cap"],
axes.scatter(x=continent_rows["life_exp"],
y=continent_name) # NEW - Gives each continent a label for our legend
label
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# NEW - Show the Legend
; plt.legend()
This uses an automatically generated palette; but we can adapt this loop to set our own colour palette.
Here I’ve created a list with my colour palette in hex codes. I generated this using the “Coolors” website.
The variable my_palette
contains a list of the colours I wish to loop through. This list has the same number of elements as my continents list; but it still works if it’s longer.
I’m using the zip()
function to make an iterable from the lists. I can then use this in my loop.
# Set Up
= gm_1987["continent"].unique().tolist() # Create a list with each unique value in continent column
continents # Sort this list in alphabetical order
continents.sort() = ["#335C67" ,"#F5D000", "#E09F3E", "#9E2A2B", "#540B0E"] # NEW - Create a list of colours desired
my_palette
# NEW - zips together our continent names and palette colours into one list of tuples
= zip(continents, my_palette)
continent_palette
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Loop through each continent and each colour in turn
for continent_name, colour in continent_palette:
# Get only the data for the continent we are looking at
= gm_1987[gm_1987["continent"] == continent_name]
continent_rows
=continent_rows["gdp_per_cap"],
axes.scatter(x=continent_rows["life_exp"],
y=colour, # NEW - Use corresponding colour from continent_palette
c=continent_name) # Gives each continent a label for our legend
label
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
; # Show the Legend plt.legend()
This technique could be expanded upon by giving each continent a different marker, and many of the other things we can customise in a scatter plot. We’ll revisit this plot later in the course for specific details.
We can also use a qualitative cmap here too; this is done through the axes.set_prop_cycle()
method. Here I’ve used “Paired” as my cmap.
# Set Up
= gm_1987["continent"].unique().tolist() # create a list with each unique value in continent column
continents # Sort this list in alphabetical order
continents.sort()
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# NEW - Set our colour cycle (N.B needs to be a qualitative cmap)
"color", plt.cm.Paired.colors))
axes.set_prop_cycle(plt.cycler(
# Iterate only over the names we want to plot (instead of a combined color object)
for continent_name in continents:
= gm_1987[gm_1987["continent"] == continent_name] # get only the data for the continent we are looking at
continent_rows
=continent_rows["gdp_per_cap"],
axes.scatter(x=continent_rows["life_exp"],
y=continent_name)
label
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
; # Show the Legend plt.legend()
4.3 Gridlines
ONS Standardised gridline colour is RGB value 190, 190, 190.
In the decimals Matplotlib needs that’s about (0.745, 0.745, 0.745). Check the section above for how we calculate this. This is also the hexadecimal code #BEBEBE for convenience.
We use the .grid()
method from the axes.
b
stands for Boolean, and controls if the gridlines are turned on -True
, or off -False
. The default behaviour isFalse
.which
controls which gridlines are turned on - this can be set tomajor
,minor
orboth
.
We can also add in additional key word arguments to control things like colour - here set to ONS’ standard value.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0.0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# NEW - Set the Gridlines
=True , which="major", color=(0.745, 0.745, 0.745)); axes.grid(visible
The default behaviour is for the grid lines to appear in front of the data points.
We can change this by using the method:
axes.set_axisbelow(True)
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0.0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Gridlines
=True, which="major", color=(0.745, 0.745, 0.745))
axes.grid(visibleTrue); # NEW - Set gridlines behind data points axes.set_axisbelow(
We can also use axes.set_frame_on(False)
to remove the outer frame around our visualisation.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0.0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set the Gridlines
=True, which="major", color=(0.745, 0.745, 0.745))
axes.grid(visibleFalse); # NEW - Removes the frame axes.set_frame_on(
We can also set just horizontal gridlines by using the method:
axes.yaxis.grid(True)
We can control other parameters as well here – such as colour.
Of course; if you wish to just display vertical gridlines axes.xaxis.grid(True)
also exists.
We’re also using axes.set_axisbelow(True)
to place the points in front of the gridlines.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0.0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Gridlines
True, color=(0.745, 0.745, 0.745)) # NEW - set only the Y gridlines
axes.yaxis.grid(False)
axes.set_frame_on(True); axes.set_axisbelow(
For some visualisations when the color
parameter in gridlines is set it overwrites and ignores that we’ve just set one axis of gridlines to display. A hacky solution is to set one set of gridlines to white (our background colour) and one set to the GSS gridlines colour. This feels like a bug, and may not exist in your version of the packages; but you may see it set as below through the rest of the course.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0.0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Gridlines
= True , which = "both", axis = "x", color = "white") # NEW - Without this both grid lines are visible (bug?)
axes.grid(visible = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visible False)
axes.set_frame_on(True); axes.set_axisbelow(
4.3.1 Exercise 3
Using the visualisation you created in the previous exercise:
Set Gridlines using the RGB value 190, 190, 190 as above.
Remove the outer frame.
Ensure the data points are in front of the grid.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=UK_gapminder["gdp_per_cap"], y=UK_gapminder["fertility"],
axes.scatter(x= "#A5CD46",
color=0.9)
alpha
# Add Labels and titles
"Graph showing Fertility by GDP per Capita", fontname="Arial", size=16)
axes.set_title("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Fertility (number of children per woman)", fontname="Arial", size=12)
axes.set_ylabel(
# NEW - Axes colour, line visibility
True, color=(0.745, 0.745, 0.745))
axes.yaxis.grid(False)
axes.set_frame_on(True); axes.set_axisbelow(
4.4 Axes and Ticks
4.4.1 Axes
It’s important to note that so far Matplotlib has used default axis scales.
Compare the two visualisations below.
The left uses the automatically generated limits for axis based on the range of our data, with the right a custom range set using the set_ylim
method. Can you see the impact by just having a quick glance at the two?
The code used to create the visualisations is also below - we’ll explain how to create multiple plots later.
# Create our figure and our axes
= plt.subplots(1, 2, figsize=(10, 4))
figure, (axes1, axes2)
# The line below allows us to create a title for both charts
"Visual difference between automatically imputed y axis (left) and setting left axis to 0 (right)", y = 1.01)
figure.suptitle(
# Plot 1
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes1.scatter(x"Gross Domestic Product per Capita in International Dollars")
axes1.set_xlabel("Life expectancy at birth in years")
axes1.set_ylabel("With Automatical Scaling axis set to 0", fontname="Arial", size=12)
axes1.set_title(=True, which = "major", axis = "x", color = "white")
axes1.grid(visible=True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes1.grid(visibleFalse)
axes1.set_frame_on(True)
axes1.set_axisbelow(
# Plot 2
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes2.scatter(x=0) # This code sets the y axis bottom value to 0.
axes2.set_ylim(bottom"Gross Domestic Product per Capita in International Dollars")
axes2.set_xlabel("Life expectancy at birth in years")
axes2.set_ylabel("With Y axis set to 0", fontname="Arial", size=12)
axes2.set_title(=True, which = "major", axis = "x", color = "white")
axes2.grid(visible=True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes2.grid(visibleFalse)
axes2.set_frame_on(True); axes2.set_axisbelow(
We can set the y axis to 0 by using axes.set_ylim()
method.
We set the value bottom
to 0.
Guidance states that we should always aim to have our axis starting at 0 when possible.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0, 0.239, 0.349)) # Colour specified using RGB Values
color
# NEW - Set the Y axis lower limit to 0
=0)
axes.set_ylim(bottom
# Add labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set gridlines and colours
True, color="white")
axes.xaxis.grid(True, color=(0.745, 0.745, 0.745))
axes.yaxis.grid(False)
axes.set_frame_on(True); axes.set_axisbelow(
4.4.2 Ticks
“Ticks” is the word Matplotlib uses to refer to the markers on our axes; so far we’ve been happy to use the default values for these, with the exception of resetting our y axes to 0.
However we may sometimes need to manipulate these to provide a better visualisation.
We can manually set our ticks using the axes.set_xticks()
and axes.set_yticks()
methods.
We pass a list or an array to this method. It’s good to create these at the top of the visualisation and assign them to variables. You may want to adjust the steps later; and having them separate and easily identifiable makes this easier.
In lines 2 and 3 numpy.arange()
is used to get an array between 0 and the maximum + 1 of my column. My step depends on which variable is being dealt with.
In lines 13 and 14 the np.array
objects created are being passed to the axes.set_
methods.
# NEW Create variables for the ticks.
= np.arange(start=0, stop=(gm_1987["gdp_per_cap"].max() + 1), step=10000)
my_xticks = np.arange(start=0, stop=(gm_1987["life_exp"].max() + 1), step=5)
my_yticks
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0, 0.239, 0.349)) # Colour specified using RGB Values
color
# NEW Set the x and y ticks
axes.set_xticks(my_xticks)
axes.set_yticks(my_yticks)
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True); axes.set_axisbelow(
You can also manually set the tick labels. As an example, here Y axis ticks have been changed to be every 20 instead of every 5, while the X axis tick labels have been changed to be in thousands of International Dollars on line 17.
This technique will become more useful in different types of visualisation; but it is important to cover here briefly.
# Create variables for the ticks.
= np.arange(start=0, stop=(gm_1987["gdp_per_cap"].max() + 1), step=10000)
my_xticks = np.arange(start=0, stop=(gm_1987["life_exp"].max() + 1), step=20)
my_yticks
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0 , 0.239, 0.349)) # Colour specified using RGB Values
color
# Set the x and y ticks
axes.set_xticks(my_xticks)
axes.set_yticks(my_yticks)
# NEW - Set the Y tick labels
/ 1000)
axes.set_xticklabels(my_xticks
# Add Labels and titles
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita (International Dollars, thousands)", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True); axes.set_axisbelow(
We can also change the colour of our ticks using the .set_tick_params()
methods for both the X and Y axes. This helps them to not look so stark against the grid line colours.
# Create variables for the ticks.
= np.arange(start=0, stop=(gm_1987["gdp_per_cap"].max() + 1), step=10000)
my_xticks = np.arange(start=0, stop=(gm_1987["life_exp"].max() + 1), step=20)
my_yticks
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0 , 0.239, 0.349)) # Colour specified using RGB Values
color
# Set the x and y ticks
axes.set_xticks(my_xticks)
axes.set_yticks(my_yticks)
# Set the Y tick labels
/ 1000)
axes.set_xticklabels(my_xticks
# Add Labels and titles
"Graph showing Life Expectancy by GDP per Capita",
axes.set_title(="Arial", size=16)
fontname"Gross Domestic Product per Capita (International Dollars, thousands)", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# NEW Set Tick Colours to the same grey as our gridlines
=(0.745, 0.745, 0.745))
axes.xaxis.set_tick_params(color=(0.745, 0.745, 0.745))
axes.yaxis.set_tick_params(color
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True); axes.set_axisbelow(
4.4.3 Exercise 4
Using the visualisation you created in the previous exercise:
Set the Y axis to start from 0.
Set the colours of the tick labels to match the gridline colour.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=UK_gapminder["gdp_per_cap"], y=UK_gapminder["fertility"],
axes.scatter(x= "#A5CD46",
color=0.9)
alpha
# Add Labels and titles
"Graph showing Fertility by GDP per Capita", fontname="Arial", size=16)
axes.set_title("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Fertility (number of children per woman)", fontname="Arial", size=12)
axes.set_ylabel(
# Axes colour, line visibility
True, color=(0.745, 0.745, 0.745))
axes.yaxis.grid(False)
axes.set_frame_on(True);
axes.set_axisbelow(
# Set Y axis to start at 0.
=0)
axes.set_ylim(bottom
# NEW - Set Tick Colours to the same grey as our gridlines
=(0.745, 0.745, 0.745))
axes.xaxis.set_tick_params(color=(0.745, 0.745, 0.745)); axes.yaxis.set_tick_params(color
4.5 Titles and Subtitles
Previously we’ve used the axes.set_title()
method for setting our title.
If we want to have a large title and a subtitle we need to use the methods plt.title()
and plt.suptitle()
.
plt.suptitle()
is a Super Title. It is shown above the title. Although it may seem odd; this needs to contain the text we would ordinarily think of as the title.
plt.title()
is below the suptitle
, so can contain text we’d ordinarily think of as the subtitle.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Gross Domestic Product per Capita (International Dollars)", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set the Y axis lower limit to 0
=0)
axes.set_ylim(bottom
# NEW - Set the Title (suptitle) and subtitle (title)
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
# Set Tick Colours to the same grey as our gridlines
=(0.745, 0.745, 0.745))
axes.xaxis.set_tick_params(color=(0.745, 0.745, 0.745))
axes.yaxis.set_tick_params(color
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True); axes.set_axisbelow(
4.6 Footnotes and Captions
We can add footnotes and captions or annotations using the figure.text()
method. This takes the parameters:
x
andy
to control the location of the text. This appears to be more an art than a science; you may need to experiment with the numbers to find a location you prefer.s
for the string of text you wish to displayha
for the horizontal alignment of the text, either centre, left or right.
It also takes the text parameters we’ve discussed previously.
You can have multiple of these objects on the figure, for example for annotating lines or points.
If annotating a point on a diagram using axes.text()
can provide you with more accurate results!
The guidance does discourage annotations, unless necessary.
# Create our figure and our axes
= plt.subplots(figsize = (6,5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Tick Colours to the same grey as our gridlines
=(0.7451,0.7451,0.7451))
axes.xaxis.set_tick_params(color=(0.7451,0.7451,0.7451))
axes.yaxis.set_tick_params(color
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True)
axes.set_axisbelow(
# Set the Y axis lower limit to 0
=0)
axes.set_ylim(bottom
# Set the Title (suptitle) and subtitle (title)
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname
"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
# NEW - Set the caption
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname
# NEW - Set an annotation
= 31871.53030, y = 78.03, s="Switzerland", ha="left",
axes.text(x="Arial", size=12); fontname
4.6.1 Exercise 5
Using the visualisation you created in the previous exercise:
Modify your plot to set a title, subtitle and a caption.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=UK_gapminder["gdp_per_cap"], y=UK_gapminder["fertility"],
axes.scatter(x= "#A5CD46",
color=0.9)
alpha
# NEW - Add suptitle and title
"Graph showing Fertility by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname
"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
# Add Labels
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Fertility (number of children per woman)", fontname="Arial", size=12)
axes.set_ylabel(
# NEW - Set the caption
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname
# Set Y axis to start at 0.
=0)
axes.set_ylim(bottom
# Set Tick Colours to the same grey as our gridlines
=(0.7451, 0.7451, 0.7451))
axes.xaxis.set_tick_params(color=(0.7451, 0.7451, 0.7451)); axes.yaxis.set_tick_params(color
In Matplotlib .annotate()
also exists – this can be useful as it also allows arrows to be drawn. However drawing arrows is not generally considered good practice.
In versions of Matplotlib < 3.3 the parameter for text is s =
, from 3.3 this is text =
. For compatibility we’ve omitted the parameter, but we recommend using the parameter that’s appropriate for your version.
For more information on this method, including more customisation options there is a link to the documentation
# Create our figure and our axes
= plt.subplots(figsize = (6,5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Tick Colours to the same grey as our gridlines
=(0.7451,0.7451,0.7451))
axes.xaxis.set_tick_params(color=(0.7451,0.7451,0.7451))
axes.yaxis.set_tick_params(color
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True)
axes.set_axisbelow(
# Set the Y axis lower limit to 0
=0)
axes.set_ylim(bottom
# Set the Title (suptitle) and subtitle (title)
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname
"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
# Set the caption
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname
# NEW - Set an annotation
# Set coordinates (these could be set manually rather than programatically here)
= val = gm_1987[gm_1987["country"] == "Switzerland"].values[0][5] # GDP is our 5th col
swiss_gdp_x = val = (gm_1987[gm_1987["country"] == "Switzerland"].values[0][3] - 0.5) # Life Exp is our 3rd
swiss_le_y
"Switzerland",
axes.annotate(= (swiss_gdp_x, swiss_le_y),
xy = (0, -100),
xytext= "offset points",
textcoords =dict(headwidth=10, width=1, color="black"),
arrowprops=12); fontsize
We can add vertical or horizontal lines to our charts using plt.axvline()
or plt.axhline()
Here we’re using plt.axvline()
to create a vertical line to show the mean GDP and plt.ahline()
to show the mean life expectancy.
Again, you often won’t find both on the same plot; this is for demonstration purposes.
I’ve used axes.annotate()
to annotate each line. Alternatively you can provide a label
argument to the plt.ahline()
and display a legend.
# Create our figure and our axes
= plt.subplots(figsize = (6,5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
axes.scatter(x=(0, 0.239, 0.349)) # Colour specified using RGB Values
color
# Add Labels and titles
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Tick Colours to the same grey as our gridlines
=(0.7451,0.7451,0.7451))
axes.xaxis.set_tick_params(color=(0.7451,0.7451,0.7451))
axes.yaxis.set_tick_params(color
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True)
axes.set_axisbelow(
# Set the Y axis lower limit to 0
=0)
axes.set_ylim(bottom
# Set the Title (suptitle) and subtitle (title)
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname
"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
# Set the caption
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname
# Vertical line to show mean gdp
= (gm_1987["gdp_per_cap"].mean()),
plt.axvline(x = 0,
ymin = gm_1987["gdp_per_cap"].max(),
ymax = "red", linestyle = "dashed",
color = "mean")
label
# Create a horizontal line
= (gm_1987["life_exp"].mean()),
plt.axhline(y = 0,
xmin = gm_1987["life_exp"].max(),
xmax = "red", linestyle = "dotted")
color
# Annotate the V line
"Mean GDP",
axes.annotate(= ((gm_1987["gdp_per_cap"].mean()), 22),
xy = (10, 0),
xytext= "offset points",
textcoords =12,
fontsize= "red")
color
# Annotate the H line
"Mean Life Expectancy",
axes.annotate(= ((gm_1987["life_exp"].mean()), 55),
xy = (200, 0),
xytext= "offset points",
textcoords =12,
fontsize= "red"); color
4.7 Legends
In the colour section we saw that we could use a different colour for each continent. We also used the code plt.legend()
to display a legend detailing what each colour value represented. We have some additional parameters we can specify.
frameon
Takes eitherTrue
orFalse
and affects if the legend has a border box. Default isTrue
.loc
Takes either a location code from 0 to 10 or a location string. Passing 2 or “upper left” will put the legend box in the upper left corner. Here I’ve gone for ‘center right’ just to demonstrate the difference – but I would argue the default is a good place for this visualisation. We recommend using the strings, as they’re easier to understand for people who read your code. Default behaviour is upper right.ncol
Alters the number of columns; default is 1. I’ve set it to 2 to have a shorter, but wider, legend.prop={"family": "Arial" , "size":12}
Prop allows us to set the default font parameters for the Legend box. Here I’m choosing Arial and 12 to match with the other aspects of the visualisation.title
Allows us to add a title for the legend.
# NEW - Set Up
= gm_1987["continent"].unique().tolist() # Create a list with each unique value in continent column
continents # Sort this list in alphabetical order
continents.sort() = ["#335C67" ,"#F5D000", "#E09F3E", "#9E2A2B", "#540B0E"] # Create a list of colours I want
my_palette
# zips together our continent names and pallete colours into one list of tuples
= zip(continents, my_palette)
continent_palette
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# NEW - Loop through each continent and each colour in turn
for continent_name, colour in continent_palette:
# Get only the data for the continent we are looking at
= gm_1987[gm_1987["continent"] == continent_name]
continent_rows
=continent_rows["gdp_per_cap"],
axes.scatter(x=continent_rows["life_exp"],
y=colour, # Use corresponding colour from continent_palette
c=continent_name) # Gives each continent a label for our legend
label
# Add Labels, Title and Captions
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True)
axes.set_axisbelow(
# Set the Y axis lower limit to 0
=0)
axes.set_ylim(bottom
# NEW - Customise and show the Legend
=False, # Turn off border around legend
plt.legend(frameon="center right", # Change location to center right
loc=2, # Change to 2 columns
ncol={"family": "Arial", "size":12},
prop= "Continent"); title
As we’ve said before it’s important to note that these changes will not work for all visualisations and are sometimes not good visualisations - but just demonstrations of what specific parameters do.
In the visualisation above splitting the points into two columns may work if the location wasn’t changed – but it’s too close to the data points. Removing the frame around the legend also could lead viewers to believe the key is data points!
We can also add the parameters bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0
to move our legend outside of the axes.
For many ONS plots with a small amount of variables you’ll see the legend at the top of the chart; underneath the titles and subtitles. This is done in a similar manner; and in a later chapter there is an example of this.
# Set Up
= gm_1987["continent"].unique().tolist() # Create a list with each unique value in continent column
continents # Sort this list in alphabetical order
continents.sort() = ["#335C67" ,"#F5D000", "#E09F3E", "#9E2A2B", "#540B0E"] # Create a list of colours I want
my_palette
# zips together our continent names and pallete colours into one list of tuples
= zip(continents, my_palette)
continent_palette
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Loop through each continent and each colour in turn
for continent_name, colour in continent_palette:
# Get only the data for the continent we are looking at
= gm_1987[gm_1987["continent"] == continent_name]
continent_rows
=continent_rows["gdp_per_cap"],
axes.scatter(x=continent_rows["life_exp"],
y=colour, # Use corresponding colour from continent_palette
c=continent_name) # Gives each continent a label for our legend
label
# Add Labels, Title and Captions
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True)
axes.set_axisbelow(
# Set the Y axis lower limit to 0
=0)
axes.set_ylim(bottom
# NEW - Customise and show the Legend
=(1.05, 1), loc=2, borderaxespad=0,
plt.legend(bbox_to_anchor=False, # Turn off border around legend
frameon=2, # Change to 2 columns
ncol={"family": "Arial", "size":12},
prop= "Continent"); title
5 Globally Setting Paramaters
5.1 Globally setting fonts
We will often want to globally change the font for the whole notebook or script.
We can set these using rcParams
.
As we do with other global settings we should set these at the top of our workbook, when we load in our packages. This will ensure everything underneath uses the same global settings.
rcParams["font.family"] = "sans-serif"
This allows us to choose the font family we want to use. Following GSS Guidance, this is sans-serif.
rcParams["font.sans-serif"] = ["Arial" , "Tahoma"]
Here we are setting the specific fonts we wish to use in the order we wish to use them, in a list. If Arial is not available it will try to use Tahoma etc.
As these are fairly universal fonts, they should be available for most users. This is useful for creating reproducible code, ensuring the output looks the same regardless of who runs the code.
# NEW - import rcParams and set font settings
from matplotlib import rcParams
"font.family"] = "sans-serif"
rcParams["font.sans-serif"] = ["Arial", "Tahoma"]
rcParams[
# Create a visualisation to demonstrate the font changes
# A visualisation here
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes.scatter(x
# Add Labels and titles
# Add Labels, Title and Captions
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True)
axes.set_axisbelow(
# Set the Y axis lower limit to 0
=0); axes.set_ylim(bottom
5.2 Globally Setting Font Sizes
We can also set the font sizes from matplotlib.rc
. We can fine tune these for the individual elements of our visualisation.
The minimum recommended font size recommended is 12.
# Set the default font sizes
import matplotlib
# setting variables for my sizes
# If I want to change a size, I only need to change it here
= 12
small_size = 14
medium_size = 16
bigger_size
# Change the settings
"font", size=small_size) # controls default text sizes
matplotlib.rc("axes", titlesize=small_size) # fontsize of the axes title
matplotlib.rc("axes", labelsize=medium_size) # fontsize of the x and y labels
matplotlib.rc("xtick", labelsize=small_size) # fontsize of the tick labels
matplotlib.rc("ytick", labelsize=small_size) # fontsize of the tick labels
matplotlib.rc("legend", fontsize=small_size) # legend fontsize
matplotlib.rc("axes", titlesize=bigger_size) # title fontsize matplotlib.rc(
Again, this may seem like a lot of additional code, but remember we can set these once and they apply globally.
Let’s create a basic visualisation again to see the effects. Notice we’re now not specifying any font sizes or families in our .set_title()
, .set_xlabel
or .set_ylabel()
.
# A visualisation here to check it's worked.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])
axes.scatter(x
# Add Labels, Title and Captions
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname"Data from Gapminder Dataset" , loc="left")
plt.title(
=0.65, y=-0.01, s="Source: Gapminder", ha="left")
figure.text(x"Gross Domestic Product per Capita in International Dollars")
axes.set_xlabel("Life expectancy at birth in years")
axes.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True)
axes.set_axisbelow(
# Set the Y axis lower limit to 0
=0); axes.set_ylim(bottom
We can also set the global colour cycler if we want; here I’ve set it to the Paired.colors
variant.
Generally the default value is okay; so future chapters will not be setting this globally.
"axes.prop_cycle"] = plt.cycler("color", plt.cm.Paired.colors) plt.rcParams[
5.3 Global Settings in one cell.
As mentioned previously setting options globally can save a large amount of time. This is especially true if you’re creating multiple visualisations within a notebook or script.
In this section we’ll combine both setting the default fonts and setting the font sizes.
We won’t introduce anything new here – just streamline the code block we would use at the start of our notebook.
Note for completion I’ve loaded in Pandas here and also loaded in the data again.
# Load our packages
import pandas as pd
# Import Matplotlib packages and functions:
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import rcParams
# Set Default Fonts
"font.family"] = "sans-serif"
rcParams["font.sans-serif"] = ["Arial", "Tahoma"]
rcParams[
# Set Default font sizes
= 12
small_size = 14
medium_size = 16
bigger_size
# Change the font size for individual elements
"font", size=small_size) # controls default text sizes
matplotlib.rc("axes", titlesize=small_size) # fontsize of the axes title
matplotlib.rc("axes", labelsize=medium_size) # fontsize of the x and y labels
matplotlib.rc("xtick", labelsize=small_size) # fontsize of the tick labels
matplotlib.rc("ytick", labelsize=small_size) # fontsize of the tick labels
matplotlib.rc("legend", fontsize=small_size) # legend fontsize
matplotlib.rc("axes", titlesize=bigger_size) # title fontsize
matplotlib.rc(
# Display visualisations below cells
%matplotlib inline
# Load in Data
= pd.read_csv("../data/gapminder.csv") gapminder
6 Saving Visualisations
To save a visualisation in Matplotlib you use the code:
"../outputs/my_vis.png") figure.savefig(
Here:
figure
refers to the last figure I’ve created.
.savefig()
is our method.
The only argument here I’ve specified is the file path and extension of my new output.
My current working directory is notebooks
so I need to go back up a level (../
) to move into my outputs folder. Maintaining a separate outputs folder rather than just having generic “files” location helps to keep projects organised. You can read more about working directories and project structure in the Intro to Python course.
# Note by changing the file extension I'm now saving as a PDF.
"../outputs/my_vis.pdf") figure.savefig(
Matplotlib can save as a variety of file formats:
- Image (Raster) Formats
- .png - ‘portable network graphics’
- .jpg /.jpeg - ‘joint photographic experts group’
- .tif /.tiff - ‘tagged image file format’
- Vector Formats
- .ps /.eps - Postscript/ Encapsulated Postscript
- .pdf - ‘portable document format’
- .svg - ‘scalable vector graphics’
Why may we care about file types?
Please note - any pixelisation in the vector image is the result of the compression when editing, or using, Jupyter Notebooks
Raster images are images we’re often most familiar with. These are made up of tiny squares; called pixels. Remembering back to the early days of digital cameras pictures often had that “squares” like quality. If we enlarge these images we can often find they look blurry; or pixilated. However they are often smaller file sizes than vector images.
Software for editing Raster images includes Microsoft Paint (a classic!), Adobe Photoshop and freeware options like Gimp and Paint.net.
Vector images are instead made up of paths. This gives them the ability to be scaled up or down at will without losing any quality – or appearing pixelated. Cliparts are a classic example of Vector images. A downside is that they’re often much larger file sizes and your audience may not be able to view the files if sent individually (rather than embedded in a Notebook or markdown document). Editing these is often more complicated; and requires special software. Note – We’ve included PDF files, as Matplotlib creates them as vectors; but not all generic PDF files are automatically vectors.
Software for editing Vector images includes Adobe Illustrator and freeware options like Inkscape.
We can set other arguments in the .savefig()
method:
# Setting additonal parameters
"../outputs/my_vis.png", dpi=300, bbox_inches="tight") figure.savefig(
Here I’m using the additional parameters:
dpi
to set the dots per inch of the output. Most publications ask for a minimum of 300. This only applies to raster images (our .jpegs .pngs etc), but if we apply it to a vector image it will just be ignored.bbox_inches
relates to the whitespace around a figure. Setting this to “tight” means we’ll have a minimal border around our figure.
There are other settings; but these ones should cover most of your needs.
6.0.1 Exercise 3
Using the visualisation you created in the previous exercise:
Save it in the outputs folder as a .jpeg file at 300 dpi, set the white space to “tight”.
Save it as a PDF file in the outputs folder, again setting the white space to “tight” and compare the two.
# Create the figure again - so we're saving the right thing.
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data
=UK_gapminder["gdp_per_cap"], y=UK_gapminder["fertility"],
axes.scatter(x= "#A5CD46",
color=0.9)
alpha
# Add Labels and titles
"Graph showing Fertility by GDP per Capita",
plt.suptitle(="Arial", size=16)
fontname
"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left")
fontname
# Add Labels
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Fertility (number of children per woman)", fontname="Arial", size=12)
axes.set_ylabel(
#Set the caption
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname
# Set Y axis to start at 0.
=0)
axes.set_ylim(bottom
# Set Tick Colours to the same grey as our gridlines
=(0.7451, 0.7451, 0.7451))
axes.xaxis.set_tick_params(color=(0.7451, 0.7451, 0.7451));
axes.yaxis.set_tick_params(color
# NEW - Save the JPEG at 300 DPI
"../outputs/exercise_jpeg.jpeg", dpi=300, bbox_inches="tight")
figure.savefig(
# NEW - Save the PDF
"../outputs/exercise_pdf.pdf",bbox_inches="tight") figure.savefig(
7 Other Ways to Visualise
We’ve discussed plotting only in terms of Matplotlib in this chapter; this is because it forms the foundation of other packages (like Seaborn) and can really help to understand the structure of how data visualisation works.
Even if you exclusively use Seaborn in the future you may need to take elements from Matplotlib for fine tuning things like axes, or legends.
While most plots can be created in Matplotlib; sometimes the code needed to create them is much more complicated than other packages. This can be incredibly useful when we want very fine control, it can sometimes be too much if we have a simpler problem.
For example, our earlier visualisation for Life Expectancy by GDP Per Capita against infant mortality; coloured by continent took a lot of code, and a for loop to produce this output:
Using Seaborn we can get that down, and no loops!
import seaborn as sns # We'd usually import this at the start
"whitegrid") # The style applies to all cells run after this - this one looks most similar to Matplotlib.
sns.set_style(
= sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
plot = False, # lmplot automatically fits a line of best fit; this removes it
fit_reg = "continent", # We can colour by a column of data
hue = sns.color_palette("bright")) # Set the colour pallete to match the Matplotlib one
palette
0, None) # Set our Y axis to start from 0 or it "floats"
plt.ylim(
# Removes the "spines" or edges of our vis
=True, bottom=True)
sns.despine(left
# Return the axes object and set the gridlines with it
= plot.ax
axes =True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visible
# Add Labels, Title and Captions (this comes from Matplotlib!)
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16, y = 1.05)
fontname"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left", y = 1.01)
fontname=20000.0, y=-14, s="Source: Gapminder", ha="left",
plt.text(x="Arial", size=12)
fontname
# Note the methods are set_xlabels -with an s!
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
plot.set_xlabels("Life expectancy at birth in years", fontname="Arial", size=12); plot.set_ylabels(
As you can see Seaborn takes a different format than we’ve been using in matplotlib.
Here we’re using sns.lmplot
to create our scatter plot. sns.scatterplot
exists – but only from version 0.9.0 of Seaborn onwards.
The basic structure is:
x
- a string that is the name of the column we want for the X axis valuesy
- a string that is the name of the column we want for the Y axis valuesdata
- the DataFrame we want to plot.
Note that we’re using the titles and labels from Matplotlib.
As we mentioned earlier Seaborn is built on TOP of Matplotlib. Rather than write it’s own versions of creating labels and titles Seaborn uses them from Matplotlib. This makes sense; as the methods in Matplotlib for this are robust, and it reduces the amount of code in the Seaborn package.
= sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987)
plot
0, None) # Set our Y axis to start from 0 or it "floats"
plt.ylim(
# Removes the "spines" or edges of our vis
=True, bottom=True)
sns.despine(left
# Return the axes object and set the gridlines with it
= plot.ax
axes =True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visible
# Add Labels, Title and Captions (this comes from Matplotlib!)
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16, y = 1.05, x = 0.6)
fontname"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left", y = 1.01, x = -0.03)
fontname
=20000.0, y=-14, s="Source: Gapminder", ha="left",
plt.text(x="Arial", size=12)
fontname
# Note the methods are set_xlabels -with an s!
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
plot.set_xlabels("Life expectancy at birth in years", fontname="Arial", size=12); plot.set_ylabels(
As mentioned in the comments this automatically applies a line of best fit. We can remove this with the parameter
= False fit_reg
= sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
plot = False)
fit_reg
0, None) # Set our Y axis to start from 0 or it "floats"
plt.ylim(
# Removes the "spines" or edges of our vis
=True, bottom=True)
sns.despine(left
# Return the axes object and set the gridlines with it
= plot.ax
axes =True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visible
# Add Labels, Title and Captions (this comes from Matplotlib!)
"Graph showing Life Expectancy by GDP Per Capita",
plt.suptitle(="Arial", size=16, y = 1.05, x = 0.6)
fontname"Data from Gapminder Dataset" ,
plt.title(="Arial", size=12, loc="left", y = 1.01, x = -0.03)
fontname
=20000.0, y=-14, s="Source: Gapminder", ha="left",
plt.text(x="Arial", size=12)
fontname
# Note the methods are set_xlabels -with an s!
"Gross Domestic Product Per Capita \n (International Dollars)", fontname="Arial", size=12)
plot.set_xlabels("Life expectancy at birth in years", fontname="Arial", size=12); plot.set_ylabels(
Colouring by continent is much easier – we can specify a column for the hue
parameter.
palette
controls the colour palette we’re applying. More information can be found in the Seaborn colour palette documentation
Pandas also comes with its own inbuilt plotting system .plot()
.
We can specify what kind of plot we want by using the kind
parameter.
The title
parameter allows us to set the title; and our x and y labels are automatically created here.
# Notice as well that Seaborn puts its "style" on all charts after it's loaded in our version
# After version 8.0 it doesn't do this.
= gm_1987.plot(x = "gdp_per_cap", y = "life_exp",
plot = "scatter",
kind = "Graph showing Life Expectancy by GDP per Capita",
title = 0)
ylim
# Return the axes object and set the gridlines with it
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
plot.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
plot.grid(visibleFalse); plot.set_frame_on(
We’ve just seen three different ways to produce similar plots.
None of them are “better” than other methods; but some are more succinct.
Some things are easier to implement in certain packages than others.
Using the Seaborn .lmplot()
allowed us to colour by the continent
column much more easily than the Matplotlib version. It’s also possible to do this in the Pandas .plot()
version; but again much more of an involved process.
For the rest of this course we will use a variety of methods from Matplotlib, Seaborn and Pandas. We’ll switch between these methods, depending on what creates the “better” plot.
8 Multiple Plots
8.1 Matplotlib
Earlier we saw that we can loop through to plot multiple values. Another way of doing this is by adding more axes.scatter()
layers.
I want to compare Africa and Europe.
Here I’ve filtered when setting my x
and y
co-ordinates – you may prefer to create a new DataFrame for each continent.
# Create our figure and our axes
= plt.subplots(figsize=(6, 5))
figure, axes
# Plot the Data for Africa
= (gm_1987[(gm_1987["continent"] == "Africa")]["gdp_per_cap"]),
axes.scatter(x = (gm_1987[(gm_1987["continent"] == "Africa")]["life_exp"]),
y = "#003D59",
color = "Africa")
label
# Plot the Data for Europe
= (gm_1987[(gm_1987["continent"] == "Europe")]["gdp_per_cap"]),
axes.scatter(x = (gm_1987[(gm_1987["continent"] == "Europe")]["life_exp"]),
y = "#A8BD3A",
color = "Europe")
label
# Add Labels, Title and Captions
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16, y = 1.01)
fontname"For Africa and Europe - 1987" ,
plt.title(="Arial", size=12, loc="left", y = 1.05)
fontname
=0.65, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname
"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visibleFalse)
axes.set_frame_on(True)
axes.set_axisbelow(
# Set the Y axis lower limit to 0
=0)
axes.set_ylim(bottom
# Customise and show the Legend
=(0., 1.0, 1.0, .102), loc="lower left",
axes.legend(bbox_to_anchor=2, mode="expand", borderaxespad=0,
ncol={"family": "Arial", "size":12}); prop
8.2 Seaborn
For Seaborn as we saw earlier we can set the “hue” parameter. Note here as I’ve had to create a pared down DataFrame with just the two continents I wish to see.
# Create a pared down DataFrame
= gm_1987[(gm_1987["continent"] == "Africa") | (gm_1987["continent"] == "Europe") ]
gm_1987_africa_europe
# Create the Plot
= sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987_africa_europe,
plot = False, # lmplot automatically fits a line of best fit; this removes it
fit_reg = "continent", # We can colour by a column of data
hue = sns.color_palette("bright")) # Set the colour pallete to match the Matplotlib one
palette
0, None) # Set our Y axis to start from 0 or it "floats"
plt.ylim(
# Removes the "spines" or edges of our vis
=True, bottom=True)
sns.despine(left
# Return the axes object and set the gridlines with it
= plot.ax
axes =True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.grid(visible
# Add Labels, Title and Captions (this comes from Matplotlib!)
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16, y = 1.05)
fontname"For Africa and Europe - 1987" ,
plt.title(="Arial", size=12, loc="left", y = 1.01)
fontname
= 20000.0, y=-14, s="Source: Gapminder", ha="left",
plt.text(x="Arial", size=12)
fontname"Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12); axes.set_ylabel(
8.3 Multiple Axes - Matplotlib
For simple cases in Matplotlib multiple axes can be set when the figure is created. Here two axes objects are created - axes1
and axes2
. In plt.subplots()
how they should be arranged can be specified; here as 2 columns so side by side.
When we plot to each axes we use its name – e.g axes1.scatter()
. Here the name of each axes has been plotted using the .text()
method to illustrate this. There will later be an example with data.
# Create 1 figure, 2 axes and have them side by side
= plt.subplots(ncols = 2)
figure, (axes1, axes2)
= 0.5, y = 0.5, s ="axes1", fontname="Arial", size=20, color = "black")
axes1.text(x
= 0.5, y = 0.5, s ="axes2", fontname="Arial", size=20, color = "black"); axes2.text(x
And here 4 axes objects are created. This time to have 2 columns and 2 rows the axes need to be specified by a list, itself containing a list of the axes objects I want in each row.
Again, the text is shown for each axes so you can see how matplotlib allocated the contents of that list.
# Create 1 figure, 4 axes and have them in 2 columns and 2 rows
= plt.subplots(ncols = 2, nrows= 2)
figure, ([[axes1, axes2], [axes3, axes4]])
= 0.5, y = 0.5, s ="axes1", fontname="Arial", size=20, color = "black")
axes1.text(x
= 0.5, y = 0.5, s ="axes2", fontname="Arial", size=20, color = "black")
axes2.text(x
= 0.5, y = 0.5, s ="axes3", fontname="Arial", size=20, color = "black")
axes3.text(x
= 0.5, y = 0.5, s ="axes4", fontname="Arial", size=20, color = "black"); axes4.text(x
Finally a full example of how this can be applied to our Africa and Europe Data from 1987
# Set up the figure and two axes
= plt.subplots(ncols = 2)
figure, (axes1, axes2)
######################################################
# Plot Axes 1
= (gm_1987[(gm_1987["continent"] == "Africa")]["gdp_per_cap"]),
axes1.scatter(x = (gm_1987[(gm_1987["continent"] == "Africa")]["life_exp"]),
y = "#003D59")
color
"Africa",
axes1.set_title(="Arial", size=16, loc = "left")
fontname"Gross Domestic Product per Capita \n in International Dollars", fontname="Arial", size=12)
axes1.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes1.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes1.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes1.grid(visibleFalse)
axes1.set_frame_on(True)
axes1.set_axisbelow(
# Set the Y axis lower limit to 0
=0)
axes1.set_ylim(bottom
######################################################
# Plot Axes 2
= (gm_1987[(gm_1987["continent"] == "Europe")]["gdp_per_cap"]),
axes2.scatter(x = (gm_1987[(gm_1987["continent"] == "Europe")]["life_exp"]),
y = "#A8BD3A",
color = "Europe")
label
"Europe",
axes2.set_title(="Arial", size=16, loc = "left")
fontname"Gross Domestic Product per Capita \n in International Dollars", fontname="Arial", size=12)
axes2.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes2.set_ylabel(
# Set Gridlines and colours
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes2.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes2.grid(visibleFalse)
axes2.set_frame_on(True)
axes2.set_axisbelow(
# Set the Y axis lower limit to 0
=0)
axes2.set_ylim(bottom
######################################################
# Add Labels, Title and Captions
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16, y = 1.05, x = 0.315)
fontname
= 0, y= 0.98, s="For Africa and Europe - 1987", ha="left",
figure.text(x="Arial", size=12)
fontname
=0.80, y=-0.01, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname
; figure.tight_layout()
8.4 Multiple Axes Programmatically – Matplotlib.
This can also do this programmatically with a loop in Matplotlib.
I’ve manually set the number of columns I wanted (2) and used the math.ceil() function to find the right number of rows. I could set this manually here as there’s only 5 individual graphs but I’m showing it as an option. .round()
always rounds down, so we need maths.ceil()
to round up.
This time we’re including another argument in our zip()
function – axes.flatten()
which allows us to loop through each axes in turn.
This is again referenced in our for loop – where we loop through each thing in our continent_palette
in turn.
When we plot now we’re using the placeholder each_ax
for each of our axes in turn. Note that I’m also setting the x and y lims to be the same for each plot. Without this we lose the sense of scale between the different continents.
We can’t set an uneven number of axes – in this example we have to plot 6 sets of axis; even though we only have 5 plots we wish to make. We can remove the blank plots with figure.delaxes(axes[2][1])
- this is in the order of columns and then rows; and starts counting at 0.
# Import the math module so we can get the .ceil for rounding
import math
# Set Up
= gm_1987["continent"].unique().tolist() # Create a list with each unique value in continent column
continents # Sort this list in alphabetical order
continents.sort() = ["#335C67" ,"#F5D000", "#E09F3E", "#9E2A2B", "#540B0E"] # Create a list of colours I want
my_palette
# NEW - Create our figure and our axes
= plt.subplots(nrows=(math.ceil(len(continents)/2)), ncols=2, figsize=(15, 15))
figure, axes =0.5)
figure.subplots_adjust(hspace
# zips together our axes, continent names and pallete colours into one list of tuples
= zip(axes.flatten(), continents, my_palette)
continent_palette
# Loop through each continent and each colour in turn
for each_ax, continent_name, colour in continent_palette:
# Get only the data for the continent we are looking at
= gm_1987[gm_1987["continent"] == continent_name]
continent_rows
=continent_rows["gdp_per_cap"],
each_ax.scatter(x=continent_rows["life_exp"],
y=colour )# Use corresponding colour from continent_palette
c
= continent_name, loc = "left")
each_ax.set_title( label "Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
each_ax.set_xlabel("Life expectancy at birth in years", fontname="Arial", size=12)
each_ax.set_ylabel(=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
each_ax.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
each_ax.grid(visibleFalse)
each_ax.set_frame_on(True)
each_ax.set_axisbelow(
# Set the x and y lims to be the same for each vis - or we loose scale.
= 0, right = ( gm_1987["gdp_per_cap"].max() + (gm_1987["gdp_per_cap"].max() / 10) ) )
each_ax.set_xlim(left =0, top = ( gm_1987["life_exp"].max() + (gm_1987["life_exp"].max() / 10) ))
each_ax.set_ylim(bottom
# Add Labels, Title and Captions
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16, y = 0.93, x = 0.25)
fontname=0.08, y= 0.9, s="Data from Gapminder Dataset", ha="left",
figure.text(x="Arial", size=14)
fontname=0.8, y= 0.1, s="Source: Gapminder", ha="left",
figure.text(x="Arial", size=12)
fontname
# Remove the empty plot at the bottom right
2][1]); # Cols and Rows - starts at 0 figure.delaxes(axes[
8.5 Multiple Axes – Seaborn.
We can do a similar effect in Seaborn. If you’re using a more modern version the sns.scatterplot()
method has the ability to plot to axes – so can be created similarly to the matplotlib version with finer control.
Here we set the col
parameter to continent, which automatically splits our data into different visualisations.
To show the X and Y axis on each plot we have to explicitly set sharex
and sharey
to False
.
To set the titles for each visualisation we have to first set the titles to be blank. If you comment out that line (11) it shows both our modified column names from the line below and the standard ones that Seaborn generates. It should also be noted the ”{col_name”}”
in set_titles is generic; and should be used exactly in that way to work. These methods are chained after the creation of our dataset.
We also need to assign the DataFrame here to a variable. We can then access the .axes
objects from them (line 22) and loop over them to set the labels, limits, grids and tick colours.
plot.fig.subplots_adjust
allows us to adjust the white space between each figure. Especially important as our y axis label here is quite long!
# Create the Plot
= (sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
plot = False, # lmplot automatically fits a line of best fit; this removes it
fit_reg = "continent", # Create a new chart for each continent
col = 2, # 2 charts per row
col_wrap = dict(sharex=False, sharey=False),
facet_kws ={"color": "blue", "alpha":1}) # Set the colours of the points and the alpha.
scatter_kws"") # Without this we also end up with a central title!
.set_titles("{col_name}", loc = "left" ) ) # Without this the titles are "continent = Asia" etc
.set_titles(
# Set global title, subtitle and caption.
"Graph showing Life Expectancy by GDP per Capita",
plt.suptitle(="Arial", size=16, y = 1.05, x = 0.31)
fontname= -2500, y = 360, s = "For each continent - 1987", fontname="Arial", size=14)
plt.text(x
# When I use "title" here it removes the "Oceania" label
= 69000.0, y=-18, s="Source: Gapminder", ha="left",
plt.text(x="Arial", size=12)
fontname
# Get each axes object in the plot
= plot.axes
axes
# Loop over them to set x label, y label, x and y limits, gridlines and tick colours.
for ax in axes:
"Life expectancy at birth in years")
ax.set_xlabel("Gross Domestic Product per Capita \n (International Dollars) ")
ax.set_ylabel(set(xlim=(0, (gm_1987["gdp_per_cap"].max() + (gm_1987["gdp_per_cap"].max()/10) )), # Sets xlim
ax.=(0, (gm_1987["life_exp"].max() + (gm_1987["life_exp"].max()/10))))
ylim=True , which = "both", color = (0.745, 0.745, 0.745))
ax.grid(visible=(0.7451,0.7451,0.7451))
ax.xaxis.set_tick_params(color=(0.7451,0.7451,0.7451))
ax.yaxis.set_tick_params(color
# Removes the "spines" or edges of our vis
=True, bottom=True)
sns.despine(left
# Adjust the white space between each figure
= 0.4, hspace = 0.5)
plot.fig.subplots_adjust(wspace
# Set gridlines
for index, each_ax in enumerate(plot.axes.flatten()):
=True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
each_ax.grid(visible=True , which = "both", axis = "y", color = (0.745, 0.745, 0.745)); each_ax.grid(visible
9 End of Chapter
You have completed chapter 2 of the Data Visualisation course. Please move on to chapter 3.
2 Comments and Structure
A polite reminder that visualisation code can get lengthy quickly.
Many of these visualisations will only have one or two new concepts, the other code will be things covered previously.
To make the code clearer we will be using lots of comments:
As we get into more complicated visualisations new concepts will have the word NEW - at the start of the comment e.g:
We will also refer to line numbers to describe content.
You can turn on line numbers in the View Menu -> Toggle Line Numbers