Chapter 7 - Additional case study


1 Data Visualisation in Python

1.1 Chapter 6 – Case Studies


Follow along with the code by running cells as you encounter them

Chapter Overview

  1. Packages and Data
  2. Case Study A
  3. Case Study B
  4. Case Study C
  5. Case Study D

For this section, it’s important to note that we’ll be trying to reproduce the visualisations shown as closely as possible. Visualisations across government are made using a variety of tools, not just R and Python, and some images are further enhanced with image-editing software.

Because of this, our versions won’t always match the originals perfectly, although we will try our best!

There may be some things here that you haven’t yet learned in the course! Many of these are small extras that make the visualisations fully match the guidelines. We’ll always comment our code, but this is a good opportunity to search for and try out your own solutions.


# 1. Packages and Data

Let’s start, as always, by loading the required packages and data.

Packages that may be needed:

  • Numpy – Version 1.12.1
  • Pandas – Version 0.20.1
  • Matplotlib – Version 2.0.2 (here as the pyplot module)
  • Seaborn – Version 0.7.1

Always follow the standard conventions for package aliases (for example, pd for pandas and np for NumPy).

# Load packages
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import rcParams 
import seaborn as sns

#import toyplot #optional

from matplotlib.patches import Patch  # Used for some tweaking in Case Study B
from matplotlib.lines import Line2D   # Used for some tweaking in Case Study B

from matplotlib.font_manager import FontProperties # Used for the table in Case Study D

The data has been prepared in advance, so for each of these case studies only the visualisation needs to be created. Where possible, links to the raw datasets are included so you can practise cleaning the data if you wish.

  • Case Study A uses fertility_rates.csv
  • Case Study B uses fraud_data.csv
  • Case Study C uses fires.csv
  • Case Study D uses fly_tipping.csv

# Case Study A
fertility = pd.read_csv("../data/fertility_rates.csv")

# Case Study B
fraud = pd.read_csv("../data/fraud_data.csv")

# Case Study C
fires = pd.read_csv("../data/fires.csv")

# Case Study D
flytipping = pd.read_csv("../data/fly_tipping.csv")

The magic command:

%matplotlib inline

means any plot created will be automatically embedded below the code cell once the code has been executed.

%matplotlib inline

As in previous chapters, we can set some default values for plot elements:

sns.set_style("whitegrid")

# Set Default Fonts

rcParams["font.family"] = "sans-serif"
rcParams["font.sans-serif"] = ["Arial", "Tahoma"]

# Set Default font sizes

small_size = 12
medium_size = 14
bigger_size = 16

# Change the font size for individual elements

matplotlib.rc("font", size=small_size)          # controls default text sizes
matplotlib.rc("axes", labelsize=medium_size)    # fontsize of the x and y labels
matplotlib.rc("xtick", labelsize=small_size)    # fontsize of the x tick labels
matplotlib.rc("ytick", labelsize=small_size)    # fontsize of the y tick labels
matplotlib.rc("legend", fontsize=small_size)    # legend fontsize
matplotlib.rc("axes", titlesize=medium_size)    # fontsize of the axes title


rcParams["figure.dpi"] = 300  # Set the DPI for outputs to 300 for our tables
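As an aside, the same defaults can also be set in a single call to rcParams.update. This takes a dictionary keyed by "group.name" strings such as "axes.titlesize". The sketch below shows one possible equivalent of the settings above and is not a required step. The title size uses medium_size, which is the value the cell above ends up applying.

```python
from matplotlib import rcParams

small_size = 12
medium_size = 14

# One dictionary covering the same font and DPI defaults as the cell above
rcParams.update({
    "font.family": "sans-serif",
    "font.sans-serif": ["Arial", "Tahoma"],
    "font.size": small_size,          # default text size
    "axes.titlesize": medium_size,    # axes title
    "axes.labelsize": medium_size,    # x and y labels
    "xtick.labelsize": small_size,    # x tick labels
    "ytick.labelsize": small_size,    # y tick labels
    "legend.fontsize": small_size,    # legend text
    "figure.dpi": 300,                # output resolution
})
```

Keeping every setting in one dictionary makes it easy to see at a glance what has been changed from Matplotlib's defaults.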

# 2. Case Study A

Replicate the graph below, which is figure 2 from the publication Births in England and Wales: 2019.

[Figure 2 from Births in England and Wales: 2019]

Use the fertility dataset and replicate the graph following ONS/GSS guidelines.

For the colours of the lines, use the following hex codes:

  • “Under 20” = “#206095”
  • “20 to 24” = “#118c7b”
  • “25 to 29” = “#003c57”
  • “30 to 34” = “#a8bd3a”
  • “35 to 39” = “#27a0cc”
  • “40 and over” = “#b26c96”

The ticks and the “Live Births…” text are colour “#6D6D6D”.
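One way to apply these colours is to store them in a dictionary keyed by age group and look each one up as its line is drawn. The sketch below is a minimal illustration, not the course solution. It assumes the data is in wide format: one x column (such as the year) plus one column per age group, named exactly as above. The helper name plot_age_groups is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hex codes given in the exercise, keyed by age group
AGE_COLOURS = {
    "Under 20": "#206095",
    "20 to 24": "#118c7b",
    "25 to 29": "#003c57",
    "30 to 34": "#a8bd3a",
    "35 to 39": "#27a0cc",
    "40 and over": "#b26c96",
}
TEXT_GREY = "#6D6D6D"  # colour for the ticks and annotation text


def plot_age_groups(df: pd.DataFrame, x_col: str, colours=AGE_COLOURS):
    """Draw one line per age-group column, coloured from `colours`.

    Assumes (hypothetically) a wide DataFrame whose age-group column
    names match the keys of `colours`; other columns are ignored.
    """
    fig, ax = plt.subplots(figsize=(10, 6))
    for group, colour in colours.items():
        if group in df.columns:
            ax.plot(df[x_col], df[group], color=colour, label=group)
    # Colour both the tick marks and the tick labels grey
    ax.tick_params(colors=TEXT_GREY)
    ax.legend(frameon=False)
    return fig, ax
```

With the real data, the call would look something like plot_age_groups(fertility, "Year"). Here "Year" is an assumed column name, so check fertility.columns first.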

# Exercise
# Solution - These cells contain answers for the exercises.
#Run once to reveal the code.
#Run again to reveal the output. 

%load ../solutions/case_studies/case_study_a_fertility.py

# 3. Case Study B

Replicate the graph below, which is figure 13 from the publication Crime in England and Wales: year ending March 2020.

[Figure 13 from Crime in England and Wales: year ending March 2020]

Use the fraud dataset and plot the graph following ONS/GSS guidelines.

For the colours of the bars, use the following hex codes:

  • “Year ending March 2019” = “#206095”

  • “Year ending March 2020” = “#118c7b”
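A grouped bar chart can be built by drawing each year's bars with a small offset around a shared position for each category. The sketch below makes several assumptions: horizontal bars, and a wide DataFrame with one label column plus one column per year, named as above. Both the orientation and the helper name grouped_barh are assumptions. It also uses the Patch class imported at the top of the chapter to build the legend by hand, which is one way to control its appearance.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

# Hex codes given in the exercise, keyed by (assumed) column name
YEAR_COLOURS = {
    "Year ending March 2019": "#206095",
    "Year ending March 2020": "#118c7b",
}


def grouped_barh(df: pd.DataFrame, label_col: str, colours=YEAR_COLOURS, bar_height=0.4):
    """Draw side-by-side horizontal bars, one bar per year for each category."""
    fig, ax = plt.subplots(figsize=(10, 6))
    y = np.arange(len(df))
    for i, (year, colour) in enumerate(colours.items()):
        # With two years, offsets are -0.5 and +0.5 bar heights, so each pair
        # sits either side of its category position
        ax.barh(y + (i - 0.5) * bar_height, df[year], height=bar_height, color=colour)
    ax.set_yticks(y)
    ax.set_yticklabels(df[label_col])
    ax.invert_yaxis()  # first category at the top
    # Build the legend entries explicitly from Patch objects
    handles = [Patch(facecolor=colour, label=year) for year, colour in colours.items()]
    ax.legend(handles=handles, frameon=False)
    return fig, ax
```

If the original chart uses vertical bars, ax.bar with a width argument follows the same offset pattern.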

# Exercise
# Solution - These cells contain answers for the exercises.
#Run once to reveal the code.
#Run again to reveal the output. 

%load ../solutions/case_studies/case_study_b_fraud.py

# 4. Case Study C

Replicate the graph below, which is chart 1 from the publication Detailed analysis of non-fire incidents attended by fire & rescue services, England, April 2018 to March 2019 (PDF).

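[Chart 1 from Detailed analysis of non-fire incidents attended by fire & rescue services, England, April 2018 to March 2019]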

Use the fires dataset and replicate the graph following GSS guidelines.

For the colours of each incident type, use the following hex codes:

  • “Fires” = “#8F23B3”
  • “Fire False Alarms” = “#BC7BD1”
  • “Non-Fire Incidents” = “#E8D1EF”

The title is also “#8F23B3” and other text is black.

We’ve rounded the percentages to two decimal places here. In the original image, the percentages appear to have been added manually, as two are rounded up and one is rounded down.
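Percentages like these can be computed from the counts and rounded with pandas. The minimal sketch below uses a hypothetical helper, percentage_shares. It also shows why percentages that are each rounded independently need not add up to exactly 100, which may explain why the published values look hand-adjusted.

```python
import pandas as pd


def percentage_shares(counts: pd.Series, decimals: int = 2) -> pd.Series:
    """Convert counts to percentage shares of the total, rounded to `decimals` places."""
    return (counts / counts.sum() * 100).round(decimals)
```

For example, three equal counts give 33.33 each, which sum to 99.99 rather than 100. Which column of the fires dataset holds the counts is not shown here, so that is left to you.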

N.B. If you download the raw data, the numbers for 2018/19 are different from those in the visualisation. We have manually altered them in the dataset to match the visualisation.

# Exercise
# Solution - These cells contain answers for the exercises.
#Run once to reveal the code.
#Run again to reveal the output. 

%load ../solutions/case_studies/case_study_c_fire.py

# 5. Case Study D

Replicate the table below, which is table 3.1 from the publication Fly tipping incidents and actions taken national level data 2007/08 to 2018/19.

[Table 3.1 from Fly tipping incidents and actions taken national level data 2007/08 to 2018/19]

Use the flytipping dataset and replicate the table following GSS guidelines.

Note: we have altered the data to create the “other identified” column, added commas as thousands separators and rounded the data. The raw data needed to replicate this can be found in ENV24 - Fly tipping incidents and actions taken in England.
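Commas can be added with Python's format specification mini-language, where "," inserts a thousands separator. The minimal sketch below rounds every numeric column and converts it to text for display. The helper name format_thousands is hypothetical. Once formatted, the columns are strings, so apply this only to the table you display, not to data you still need to calculate with. Also note that "f" formatting rounds exact halves to the nearest even digit (for example, 2.5 becomes "2").

```python
import pandas as pd


def format_thousands(df: pd.DataFrame, decimals: int = 0) -> pd.DataFrame:
    """Return a copy with numeric columns rounded and formatted with comma separators."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        # e.g. 1234567.6 -> "1,234,568" when decimals=0
        out[col] = out[col].map(lambda v: f"{v:,.{decimals}f}")
    return out
```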

# Exercise

There are three solution cells here.

The first uses toyplot, which is optional (its import is commented out in the packages cell above).

The second and third contain the same Matplotlib code; however, the x and y location values for certain elements have been adjusted in the third, which is marked for Matplotlib version 3.3.3. If the boxes are not aligned for you, you may wish to experiment further with these numbers.

If the first Matplotlib solution doesn’t look quite right, we recommend running the second one and then experimenting until the alignment is right.
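If you build the table with Matplotlib's ax.table, one common way to control its position is the bbox argument. This is given as [x0, y0, width, height] in axes coordinates, so nudging these numbers moves and resizes the whole table. The sketch below illustrates an assumed approach and is not a copy of the solution files. The helper name draw_table is hypothetical. It uses FontProperties (imported at the top of the chapter) to make the header row bold.

```python
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties


def draw_table(cell_text, col_labels, bbox=(0.0, 0.0, 1.0, 1.0), font_size=10):
    """Render a table that fills `bbox` = (x0, y0, width, height) in axes coordinates."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.axis("off")  # hide the axes so only the table is shown
    table = ax.table(cellText=cell_text, colLabels=col_labels, loc="center", bbox=list(bbox))
    table.auto_set_font_size(False)  # stop Matplotlib shrinking the text automatically
    table.set_fontsize(font_size)
    # Bold header; include the size so it is not reset to the default
    bold = FontProperties(weight="bold", size=font_size)
    for (row, col), cell in table.get_celld().items():
        if row == 0:  # with colLabels given, row 0 holds the column labels
            cell.set_text_props(fontproperties=bold)
    return fig, ax, table
```

For example, draw_table(rows, headers, bbox=(0.05, 0.1, 0.9, 0.8)) leaves a margin on each side. Adjusting these four values plays the same role as the location tweaks described above.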

# Solution - These cells contain answers for the exercises.
#Run once to reveal the code.
#Run again to reveal the output. 

%load ../solutions/case_studies/case_study_d_flytipping_toyplot.py

# Solution - These cells contain answers for the exercises.
#Run once to reveal the code.
#Run again to reveal the output. 

%load ../solutions/case_studies/case_study_d_flytipping.py

# For Matplotlib Version 3.3.3
# Solution - These cells contain answers for the exercises.
#Run once to reveal the code.
#Run again to reveal the output. 

%load ../solutions/case_studies/case_study_d_flytipping_MPL_333.py