# Load packages
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns
Data Visualisation Case Study
1 Introduction
You will use material from chapters 1-4 of the Data Visualisation in Python course to complete structured exercises on a fresh dataset.
The case study will guide you to practice using the following visualisations:
- Scatterplots
- Bar Plots
- Box Plots
Of course there are many more types of visualisation and you are encouraged to experiment should you think there are better ways to look at the data.
2 Course Format
This chapter will present each question as a section, with each part a subsection.
For question 1, the format will be:
- Tab 1 will contain the question itself
- Tab 2 will contain a hint for the question
- Tab 3 will contain the solution with code
For question 2, the format will be the same, except that there will be no solution: only a question and a hint.
Important note - The solutions in this case study use a combination of Matplotlib and Seaborn to create visualisations. We’ve tried to reflect reality where you’d need to balance finding creative solutions with accessibility and time constraints. You may want to take your solutions in a different direction and that’s ok, as long as you take away the central message to use what works best for you, your team and your organisation.
2.1 Example
This is an example question (or is it)?
This is a hint!
Insert code here.
Throughout, we ask you to reflect on the plots used and on whether you would make the same choices when approaching the dataset in this case study.
Data Visualisation is considered both an art and a science, so there are always numerous viewpoints on how best to display information.
3 Packages
Let’s start, as always, by loading the required packages and data.
Packages that may be needed:
- Numpy Version: 1.24.4
- Pandas Version: 1.5.3
- Matplotlib Version: 3.5.3
- Seaborn Version: 0.13.0
Load them with the import keyword, always following the standard convention for nicknames (pd, np, plt and sns).
4 Fonts
Set a sans-serif font for rendering throughout the exercises and ensure it is used automatically. Suitable choices include:
- Arial
- Tahoma
- Helvetica
We will use Arial for the solutions.
We can set these using rcParams.
rcParams["font.family"] = "sans-serif"
rcParams["font.sans-serif"] = ["Arial"]
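Arial is not installed on every machine, and Matplotlib falls back to its default font with a warning when the requested one is missing. The following is a minimal sketch, assuming you want the first installed font from the list above; the `preferred` list and the DejaVu Sans fallback are illustrative choices rather than part of the course solutions.

```python
from matplotlib import font_manager, rcParams

# Candidate fonts from the list above, in order of preference (illustrative).
preferred = ["Arial", "Tahoma", "Helvetica"]

# Names of every font Matplotlib has found on this machine.
installed = {font.name for font in font_manager.fontManager.ttflist}

# Keep only the candidates that are installed; DejaVu Sans ships with
# Matplotlib, so it works as a last resort.
available = [name for name in preferred if name in installed]

rcParams["font.family"] = "sans-serif"
rcParams["font.sans-serif"] = available + ["DejaVu Sans"]
```

Matplotlib tries the entries of `font.sans-serif` in order, so the first installed name is the one used.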
5 Ames Housing Data
Throughout this chapter we will be visualising the Ames Housing Dataset, an incredibly popular choice for visualisation, statistical modelling, machine learning and exploratory data analysis.
It is larger than the other datasets used in training, which makes it feel closer to real-world data.
6 Task 1 - Relationship with Sale Price
This chapter centers on the Ames housing dataset, specifically examining the sale price as the main variable of interest. You will explore the impact of other variables, analyse the distribution, and assess variations across categorical variables.
While Task 1 involves constructing a detailed scatterplot with facets for less clutter, the focus is primarily on visualizations. There will be limited emphasis on exploratory analysis already covered in Modules 3 and 5.
6.1 Task 1a
Let’s see the data first:
Read in the Ames housing dataset and clean its column names.
Look at the result.
There are packages that will do this for you, such as pyjanitor. You also learnt more programmatic methods to do this in your introductory modules.
ames_data = pd.read_csv("../data/ames.csv")
ames_data.columns = ames_data.columns.str.replace(pat=" ", repl="_")
ames_data.columns = ames_data.columns.str.lower()
ames_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2930 entries, 0 to 2929
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 order 2930 non-null int64
1 ms_subclass 2930 non-null int64
2 ms_zoning 2930 non-null object
3 lot_frontage 2440 non-null float64
4 lot_area 2930 non-null int64
5 street 2930 non-null object
6 alley 198 non-null object
7 lot_shape 2930 non-null object
8 land_contour 2930 non-null object
9 utilities 2930 non-null object
10 lot_config 2930 non-null object
11 land_slope 2930 non-null object
12 neighborhood 2930 non-null object
13 condition_1 2930 non-null object
14 condition_2 2930 non-null object
15 bldg_type 2930 non-null object
16 house_style 2930 non-null object
17 overall_qual 2930 non-null int64
18 overall_cond 2930 non-null int64
19 year_built 2930 non-null int64
20 year_remod/add 2930 non-null int64
21 roof_style 2930 non-null object
22 roof_matl 2930 non-null object
23 exterior_1st 2930 non-null object
24 exterior_2nd 2930 non-null object
25 mas_vnr_type 1155 non-null object
26 mas_vnr_area 2907 non-null float64
27 exter_qual 2930 non-null object
28 exter_cond 2930 non-null object
29 foundation 2930 non-null object
30 bsmt_qual 2850 non-null object
31 bsmt_cond 2850 non-null object
32 bsmt_exposure 2847 non-null object
33 bsmtfin_type_1 2850 non-null object
34 bsmtfin_sf_1 2929 non-null float64
35 bsmtfin_type_2 2849 non-null object
36 bsmtfin_sf_2 2929 non-null float64
37 bsmt_unf_sf 2929 non-null float64
38 total_bsmt_sf 2929 non-null float64
39 heating 2930 non-null object
40 heating_qc 2930 non-null object
41 central_air 2930 non-null object
42 electrical 2929 non-null object
43 1st_flr_sf 2930 non-null int64
44 2nd_flr_sf 2930 non-null int64
45 low_qual_fin_sf 2930 non-null int64
46 gr_liv_area 2930 non-null int64
47 bsmt_full_bath 2928 non-null float64
48 bsmt_half_bath 2928 non-null float64
49 full_bath 2930 non-null int64
50 half_bath 2930 non-null int64
51 bedroom_abvgr 2930 non-null int64
52 kitchen_abvgr 2930 non-null int64
53 kitchen_qual 2930 non-null object
54 totrms_abvgrd 2930 non-null int64
55 functional 2930 non-null object
56 fireplaces 2930 non-null int64
57 fireplace_qu 1508 non-null object
58 garage_type 2773 non-null object
59 garage_yr_blt 2771 non-null float64
60 garage_finish 2771 non-null object
61 garage_cars 2929 non-null float64
62 garage_area 2929 non-null float64
63 garage_qual 2771 non-null object
64 garage_cond 2771 non-null object
65 paved_drive 2930 non-null object
66 wood_deck_sf 2930 non-null int64
67 open_porch_sf 2930 non-null int64
68 enclosed_porch 2930 non-null int64
69 3ssn_porch 2930 non-null int64
70 screen_porch 2930 non-null int64
71 pool_area 2930 non-null int64
72 pool_qc 13 non-null object
73 fence 572 non-null object
74 misc_feature 106 non-null object
75 misc_val 2930 non-null int64
76 mo_sold 2930 non-null int64
77 yr_sold 2930 non-null int64
78 sale_type 2930 non-null object
79 sale_condition 2930 non-null object
80 saleprice 2930 non-null int64
dtypes: float64(11), int64(27), object(43)
memory usage: 1.8+ MB
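The column-cleaning step above can also be written as a single chain of string methods. This is a small sketch on a toy frame with made-up headers (not the Ames file), adding a `strip` call for stray spaces:

```python
import pandas as pd

# Toy frame with untidy headers (hypothetical, not the Ames file).
toy = pd.DataFrame({" Sale Price": [1], "Gr Liv Area ": [2], "Central Air": ["Y"]})

toy.columns = (
    toy.columns
    .str.strip()                          # drop leading/trailing spaces
    .str.lower()                          # lower case
    .str.replace(" ", "_", regex=False)   # literal space -> underscore
)

print(list(toy.columns))
```

Passing `regex=False` makes the replacement a literal one, so it behaves the same across pandas versions.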
6.2 Task 1b
Consider factors influencing residential housing prices. While some patterns may be elusive, we can explore logical directions. For instance, examine the above-ground living area as a key element in addition to the overall lot area.
Create a scatter plot with sale price on the x axis and above-ground living area on the y axis.
What are your initial impressions from the plot?
You can choose either matplotlib or a seaborn scatterplot for your solution.
# Create our figure and our axes
figure, axes = plt.subplots()

axes.scatter(x=ames_data["saleprice"], y=ames_data["gr_liv_area"])
6.3 Task 1c
Practice playing with colors and group the plot based on the presence of central air conditioning. Explore trends across this categorical variable, adding another dimension to the plot.
Create two plots and display them side by side:
- For the first, display the points in blue.
- For the second, use the central air conditioning variable as the colour source.
You’ll probably need to think about using a loop to apply unique values to each plot.
air_conditioning = ames_data["central_air"].unique().tolist() # Get a list with the unique values of the central_air column.
air_conditioning.sort() # Sorting this list to alphabetical order

# Create our figure and our axes
figure, (axes1, axes2) = plt.subplots(1, 2, figsize=(10, 4))

axes1.scatter(x=ames_data["saleprice"], y=ames_data["gr_liv_area"], color = "blue")

# NEW - Loop through each unique value in turn
for air_con_present in air_conditioning:
    # NEW - Get the subset of our data corresponding to the presence of air conditioning or not.
    air_con_rows = ames_data[ames_data["central_air"] == air_con_present]

    axes2.scatter(x=air_con_rows["saleprice"],
                  y=air_con_rows["gr_liv_area"],
                  label=air_con_present) # NEW - Gives each value a label for our legend

# Show the Legend
plt.legend();
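The loop above filters the frame once per unique value. An equivalent sketch iterates over `groupby`, which yields each level together with its rows, in sorted order of the level. The toy frame is hypothetical stand-in data so that the snippet runs without the Ames file.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for ames_data with the three columns used here.
toy = pd.DataFrame({
    "saleprice": [105000, 172000, 244000, 189900, 98000, 141000],
    "gr_liv_area": [896, 1329, 2110, 1629, 1040, 1338],
    "central_air": ["Y", "Y", "Y", "Y", "N", "N"],
})

figure, axes = plt.subplots(figsize=(5, 4))

# groupby yields (level, subset) pairs in sorted order of the level.
for air_con_present, air_con_rows in toy.groupby("central_air"):
    axes.scatter(x=air_con_rows["saleprice"],
                 y=air_con_rows["gr_liv_area"],
                 label=air_con_present)

axes.legend(title="Central air")
```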
6.4 Task 1d
You can tweak the colours used, in line with accessibility guidelines.
Start by copying over the code where colour was mapped to the central air variable.
Use the Palette table in the Analysis Function Guidance on Colours to add the following colours to your plot via the HEX CODES.
- Dark blue
- Orange
Think about creating a palette variable containing the colours you want and then applying it later on.
air_conditioning = ames_data["central_air"].unique().tolist()
air_conditioning.sort()

my_air_con_palette = ["#12436D","#F46A25"]

air_con_palette_zip = zip(air_conditioning, my_air_con_palette)

figure, axes = plt.subplots(figsize=(10, 4))

for air_con_present, colour in air_con_palette_zip:
    air_con_rows = ames_data[ames_data["central_air"] == air_con_present]

    axes.scatter(x=air_con_rows["saleprice"],
                 y=air_con_rows["gr_liv_area"],
                 c=colour,
                 label=air_con_present)

# Show the Legend
plt.legend();
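One thing to watch in the solution above: `zip` returns a one-shot iterator, so `air_con_palette_zip` is empty after the first loop and a second plot built from it would draw nothing. A sketch of one alternative, assuming the sorted levels are "N" and "Y", is to store the pairing in a dictionary, which can be reused and looked up by level.

```python
# Levels and colours as in the solution above; the levels are assumed to be "N" and "Y".
air_conditioning = ["N", "Y"]
my_air_con_palette = ["#12436D", "#F46A25"]

# zip gives a one-shot iterator: the second pass finds nothing left.
air_con_palette_zip = zip(air_conditioning, my_air_con_palette)
first_pass = list(air_con_palette_zip)
second_pass = list(air_con_palette_zip)
print(len(first_pass), len(second_pass))

# A dictionary keeps the pairing and can be iterated as often as needed.
air_con_colours = dict(zip(air_conditioning, my_air_con_palette))
for air_con_present, colour in air_con_colours.items():
    print(air_con_present, colour)
```

Seaborn's `palette` argument also accepts a dictionary of this shape when `hue` is set, which keeps each level tied to its colour regardless of ordering.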
6.5 Task 1e
Add some lines of best fit.
Add a smooth line of best fit (linear) through the points in your scatter plot.
Have a look at how this can be done in matplotlib and seaborn. Choose the simplest solution.
Here we’ve switched to Seaborn because the Matplotlib solution to this is... well, google “matplotlib scatter line of best fit” and see for yourself.
This is a noisy plot; however, it shows that there may be a difference in how much the overall area of the property impacts the price, based on whether there is central air conditioning or not.
It seems that we have a much steeper increase when there isn’t, compared to when there is.
#Create custom palette
my_air_con_palette = ["#12436D","#F46A25"]

# Create a seaborn scatter with a custom palette.
plot = sns.lmplot(x = "saleprice", y = "gr_liv_area", data = ames_data,
                  fit_reg = True,
                  hue = "central_air",
                  palette = sns.color_palette(my_air_con_palette))
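For comparison, a straight line of best fit in plain Matplotlib usually takes a least-squares fit with NumPy plus a second plotting call. This is a minimal sketch on synthetic points; the data and variable names are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic points lying exactly on y = 0.01 * x + 500 (illustrative only).
x = np.array([100000.0, 150000.0, 200000.0, 250000.0, 300000.0])
y = 0.01 * x + 500

# A degree-1 polynomial fit returns the slope and the intercept.
slope, intercept = np.polyfit(x, y, deg=1)

figure, axes = plt.subplots()
axes.scatter(x, y, color="#12436D")

# Draw the fitted line across the observed x range.
x_line = np.linspace(x.min(), x.max(), 100)
axes.plot(x_line, slope * x_line + intercept, color="#F46A25")
```

`sns.lmplot` wraps the same idea and adds a confidence band by default, so it needs less code for the same result.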
6.6 Task 1f
What if you just want to see the overall trend, as opposed to the trend by group?
Modify the code from part e to produce a single line of best fit through the points, as opposed to one grouped by central air conditioning.
Comment on the plausibility of a correlation here.
There’s a solution to this in seaborn using an additional regplot to hold the previous plot.
#Create custom palette
my_air_con_palette = ["#12436D","#F46A25"]

# Create a seaborn scatter with a custom palette
plot = sns.lmplot(x = "saleprice", y = "gr_liv_area", data = ames_data,
                  fit_reg = False,
                  hue = "central_air",
                  palette = sns.color_palette(my_air_con_palette))

sns.regplot(x = "saleprice", y = "gr_liv_area", data = ames_data, scatter=False, ax=plot.axes[0,0])
6.7 Task 1g
Notice the plot has many overlapping points. Split it to analyse the two groups with less clutter.
Use facets to subset the plot by the central air variable.
There are several ways to create multiple plots using either matplotlib or seaborn. Use the code in Chapter 1 as a guide.
There is a much wider distribution of sale prices among properties that have central air conditioning. The properties without air conditioning show a weaker positive relationship than those with it.
#Create custom palette
my_air_con_palette = ["#12436D","#F46A25"]

# Create a seaborn scatter with a custom palette
plot = sns.lmplot(x = "saleprice", y = "gr_liv_area", data = ames_data,
                  fit_reg = True,
                  hue = "central_air",
                  col = 'central_air',
                  col_wrap = 2,
                  palette = sns.color_palette(my_air_con_palette))
6.8 Task 1h
Further dimensions can be added to your plot by grouping on whether the property has a paved driveway.
Expand on part g by producing a plot for each combination of paved_drive and central_air.
You need to produce a 3 * 2 grid of subplots
#Create custom palette
my_air_con_palette = ["#12436D","#F46A25"]

# Create a seaborn scatter with a custom palette
plot = sns.lmplot(x = "saleprice", y = "gr_liv_area", data = ames_data,
                  fit_reg = True,
                  hue = "central_air",
                  col = 'paved_drive',
                  row = 'central_air',
                  palette = sns.color_palette(my_air_con_palette))
6.9 Task 1i
While part h is a useful exercise to show you some of the tools available in matplotlib and seaborn, it is very cluttered and there are some issues with the axis scales and labels.
This makes drawing conclusions more complicated.
Let’s go back and tweak some of the formatting.
Add a suitable title and axes labels to your Scatter Plot.
You’ll need to bring in some matplotlib if you’ve used seaborn to access the underlying methods for setting titles and labels.
As long as your labels are appropriate, then any combination of them is fine.
#Create custom palette
my_air_con_palette = ["#12436D","#F46A25"]

# Create a seaborn scatter with a custom palette
plot = (sns.lmplot(x = "saleprice", y = "gr_liv_area", data = ames_data,
                   fit_reg = True,
                   hue = "central_air",
                   col = 'central_air',
                   col_wrap = 2,
                   #facet_kws = dict(sharex=False, sharey=False),
                   palette = sns.color_palette(my_air_con_palette))
        .set_titles("")
        .set_titles("{col_name}", loc = "left" )
)

plot.fig.suptitle("Price vs. above ground living area split by existence of air conditioning",
                  size=16, y = 1.05, x = 0.31)

axes = plot.axes

for ax in axes:
    ax.set_xlabel("Sale price ($)")
    ax.set_ylabel("Living Area (sq. ft.)")
6.10 Task 1j
Let’s investigate the x axis.
What issues are there with the x axis at present?
Modify the x axis to include the upper limits of the sale prices.
Think about the upper and lower limits, do they make sense and are they informative?
The x axis has a couple of key problems for best practice:
- It doesn’t quite start from 0.
- The upper limit on properties is not marked on the x axis, making it difficult to interpret what the largest data points refer to.
Let’s adjust the right and left limits of the x axis.
#Create custom palette
my_air_con_palette = ["#12436D","#F46A25"]

#Find max price for upper x limit
max_price = max(ames_data['saleprice'])

# Create a seaborn scatter with a custom palette
plot = (sns.lmplot(x = "saleprice", y = "gr_liv_area", data = ames_data,
                   fit_reg = True,
                   hue = "central_air",
                   col = 'central_air',
                   col_wrap = 2,
                   #facet_kws = dict(sharex=False, sharey=False),
                   palette = sns.color_palette(my_air_con_palette))
        .set_titles("")
        .set_titles("{col_name}", loc = "left" )
)

plot.fig.suptitle("Price vs. above ground living area split by existence of air conditioning",
                  size=16, y = 1.05, x = 0.31)

axes = plot.axes

for ax in axes:
    ax.set_xlim(right=round(max_price,-5),left=0) # round with a negative number of digits takes the right limit to the nearest 100,000
    ax.set_xlabel("Sale price ($)")
    ax.set_ylabel("Living Area (sq. ft.)")
    ax.grid(visible=True) # Pop in some gridlines for good measure

# This introduces some issues with the distance between the subplots so let's fix that
plt.subplots_adjust(wspace=0.1)
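A caveat on the `round(max_price, -5)` call: a negative second argument rounds to the nearest 100,000, which can fall below the maximum and cut off the largest points. Below is a sketch of a helper that always rounds up; `round_up` is a hypothetical name and the prices are made-up values.

```python
import math


def round_up(value, step):
    """Round value up to the next multiple of step (hypothetical helper)."""
    return math.ceil(value / step) * step


# round() goes to the nearest multiple, so it may round down.
print(round(745_000, -5))          # 700000
print(round_up(745_000, 100_000))  # 800000

# When the maximum is already past the midpoint, both approaches agree.
print(round(755_000, -5))          # 800000
print(round_up(755_000, 100_000))  # 800000
```

Using the helper for the right (or top) limit guarantees that the axis extends past the largest observation.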
6.11 Task 1k
Let’s investigate the y axis this time.
What issues are there with the y axis at present?
Modify the y axis to include proper upper and lower limits.
What about the top and bottom of the axis? Similar to the previous question but using a different method.
The y axis has similar problems to the x axis.
Let’s adjust the top and bottom limits of the y axis.
#Create custom palette
my_air_con_palette = ["#12436D","#F46A25"]

#Find max price and max floor space for the upper axis limits
max_price = max(ames_data['saleprice'])
max_floor_space = max(ames_data['gr_liv_area'])

# Create a seaborn scatter with a custom palette
plot = (sns.lmplot(x = "saleprice", y = "gr_liv_area", data = ames_data,
                   fit_reg = True,
                   hue = "central_air",
                   col = 'central_air',
                   col_wrap = 2,
                   #facet_kws = dict(sharex=False, sharey=False),
                   palette = sns.color_palette(my_air_con_palette))
        .set_titles("")
        .set_titles("{col_name}", loc = "left" )
)

plot.fig.suptitle("Price vs. above ground living area split by existence of air conditioning",
                  size=16, y = 1.05, x = 0.31)

axes = plot.axes

for ax in axes:
    ax.set_xlim(right=round(max_price,-5),left=0) # round with a negative number of digits takes the right limit to the nearest 100,000
    ax.set_ylim(top=round(max_floor_space,-3),bottom=0) # same idea for the top limit, to the nearest 1,000
    ax.set_xlabel("Sale price ($)")
    ax.set_ylabel("Living Area (sq. ft.)")
    ax.grid(visible=True) # pop in some gridlines for good measure

sns.despine(left=True, bottom=True)

# This introduces some issues with the distance between the subplots so let's fix that
plt.subplots_adjust(wspace=0.1)
6.12 Task 1l
Your final tweaks to make are to turn off the x axis grid lines.
- Remove the x axis gridlines
Add further parameters to the ax.grid() method.
There. We’ve sufficiently tidied up our plot so that it is easier to interpret and visually more appealing, following some of the ONS guidance as much as we can.
This is just the start. If you want your visualisations to be production ready then you’ll need to work collaboratively with colleagues and follow the guidelines of your organisation to determine minimum standards.
#Create custom palette
my_air_con_palette = ["#12436D","#F46A25"]

#Find max price and max floor space for the upper axis limits
max_price = max(ames_data['saleprice'])
max_floor_space = max(ames_data['gr_liv_area'])

# Create a seaborn scatter with a custom palette
plot = (sns.lmplot(x = "saleprice", y = "gr_liv_area", data = ames_data,
                   fit_reg = True,
                   hue = "central_air",
                   col = 'central_air',
                   col_wrap = 2,
                   #facet_kws = dict(sharex=False, sharey=False),
                   palette = sns.color_palette(my_air_con_palette))
        .set_titles("")
        .set_titles("{col_name}", loc = "left" )
)

plot.fig.suptitle("Price vs. above ground living area split by existence of air conditioning",
                  size=16, y = 1.05, x = 0.31)

axes = plot.axes

for ax in axes:
    ax.set_xlim(right=round(max_price,-5),left=0) # round with a negative number of digits takes the right limit to the nearest 100,000
    ax.set_ylim(top=round(max_floor_space,-3),bottom=0) # same idea for the top limit, to the nearest 1,000
    ax.set_xlabel("Sale price ($)")
    ax.set_ylabel("Living Area (sq. ft.)")
    ax.grid(visible=True , which = "major", axis="y", color = (0.745, 0.745, 0.745)) # Keep only the y axis gridlines, in light grey
    ax.set_axisbelow(True)

sns.despine(left=True, bottom=True)

# This introduces some issues with the distance between the subplots so let's fix that
plt.subplots_adjust(wspace=0.1)
7 Task 2 - Exploring continuous and categorical variables
In this section, we’ll delve into exploring various dataset variables—an essential first step in Exploratory Data Analysis (EDA). EDA involves the initial investigation of our data, guiding subsequent cleaning and analysis.
Further insights into EDA were covered in Module 3 - Statistics. Given the significance of data visualisation in this process, practical exercises are beneficial here.
Specifically for this question, our focus is on:
- Generating additional geometric objects.
- Formatting them while addressing unique characteristics.
- Adhering to best practice guidelines.
Crucially, this section does not have code solutions, only hints. You are on your own with this and it is your job to investigate the nature of the dataset using the skills you’ve developed.
7.1 Task 2a
Let’s start with a categorical variable that would likely contribute to house prices, such as garage type. This has 7 levels:
- 2Types - More than one type of garage.
- Attchd - Attached to the home.
- Basment - Basement garage.
- BuiltIn - Built-in garage.
- CarPort - Car port garage.
- Detchd - Detached garage.
- NA - No garage.
Count the number of properties for each garage type.
Break this down further by outputting the number of properties per garage type by the foundational material used to build the property (foundation variable), which has six levels:
- BrkTil - Brick and Tile
- CBlock - Cinder Block
- PConc - Poured Concrete
- Slab - Slab
- Stone - Stone
- Wood - Wood
We can solve both of these problems with groupby
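As a reminder of the mechanics, the sketch below counts rows with `groupby(...).size()` using first one and then two grouping columns. The miniature frame is hypothetical, so the numbers say nothing about the Ames data.

```python
import pandas as pd

# Hypothetical miniature of the two columns used in this task.
toy = pd.DataFrame({
    "garage_type": ["Attchd", "Attchd", "Detchd", "Attchd", "Detchd", "BuiltIn"],
    "foundation": ["PConc", "CBlock", "CBlock", "PConc", "BrkTil", "PConc"],
})

# One grouping column: number of rows per garage type.
per_type = toy.groupby("garage_type").size()

# Two grouping columns: number of rows per (garage type, foundation) pair.
per_type_and_foundation = toy.groupby(["garage_type", "foundation"]).size()

print(per_type)
print(per_type_and_foundation)
```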
7.2 Task 2b
It is always important to have summary tables from part a, especially for accessibility purposes (if you can’t provide useful alternative text, then the data used to create the plot is the bare minimum).
It would, of course, be easier to visualise the results in a bar chart, where we can draw initial conclusions at a glance.
Create a bar chart to count the number of properties per garage type, ensuring to:
- Colour with the best practice blue #12436D
- Rotate the plot 90 degrees to the right.
Both matplotlib and seaborn have barplots with the ability to reorient the bars to be horizontal
7.3 Task 2c
It would be easier to observe the results if we ordered the bars in descending order.
Sort the y axis to be in descending order.
Comment on what the plot shows us.
Use pandas to group the data, sort values and then matplotlib/seaborn to plot the results
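A sketch of the group, sort and plot pattern on hypothetical toy data is shown below. Note the direction of the sort: `barh` draws the first row at the bottom, so an ascending sort puts the largest bar at the top.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy column, not the Ames data.
toy = pd.DataFrame({
    "garage_type": ["Attchd", "Attchd", "Detchd", "Attchd", "Detchd", "BuiltIn"],
})

# Count rows per level, then sort ascending so the largest bar ends up on top.
counts = toy.groupby("garage_type").size().sort_values(ascending=True)

figure, axes = plt.subplots()
axes.barh(counts.index, counts.values, color="#12436D")
axes.set_xlabel("Number of properties")
```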
7.4 Task 2d
Investigate the dataset to see if there are any missing values in this category. We don’t want to remove them as they are an intended level of the garage_type variable. How could we identify and deal with them?
Go back and investigate to see whether the original dataset has any missing values and how these are coded. Reproduce the grouped data to include these and see how this impacts the chart.
Think about how pandas deals with missing values when importing datasets. You might want to interrogate the data outside python to see what unique values this field has and compare with what has been read in.
The groupby method has the functionality to drop/include missing values.
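The behaviour hinted at above can be seen on a tiny in-memory CSV. In this sketch the data is hypothetical, and relabelling with "No garage" is just one possible way to treat the level.

```python
import io

import pandas as pd

# A tiny CSV in which "NA" is a genuine category (hypothetical data).
csv_text = "garage_type,saleprice\nAttchd,200000\nNA,90000\nDetchd,130000\nNA,85000\n"

# Default behaviour: the string "NA" is parsed as a missing value.
default_read = pd.read_csv(io.StringIO(csv_text))
print(default_read["garage_type"].isna().sum())

# groupby drops missing keys unless dropna=False is passed.
print(default_read.groupby("garage_type").size())
print(default_read.groupby("garage_type", dropna=False).size())

# Alternatively, relabel the missing values so they plot like any other level.
relabelled = default_read["garage_type"].fillna("No garage")
print(relabelled.value_counts())
```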
7.5 Task 2e
Let’s ensure our plot adheres to best practice guidelines.
Modify the y axis ticks to ensure we have more gridlines.
Apply as many of the principles you used in the previous section as you can to the plot.
Amend the plot to remove the y axis gridlines and horizontally adjust the axis title by 1.
Add a title, subtitle and axes labels to the plot.
Much of the syntax is the same as when you created scatterplots. The only real differences are with the barplot specific methods and the results of changing certain elements of the axes.
7.6 Task 2f
You now have a great plot at the surface level, but what if you want to break this down by the material used in its foundation?
This lets you see if the attached and detached garage properties were also produced with high quality materials.
Count the number of garage types per foundation material.
Create a stacked bar chart.
Tweak the title in line with what the plot now shows.
What does this plot show us?
Use groupby in pandas to include an additional variable.
Stacked bar charts can be created in either matplotlib or seaborn. You’ve already learnt how to edit the different elements of a plot.
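One possible route, sketched on hypothetical toy data, is to reshape the grouped counts into a table with one column per foundation level and let pandas stack the columns. The `unstack` step is an assumption about how you might reshape; other approaches work too.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the two columns used in this task.
toy = pd.DataFrame({
    "garage_type": ["Attchd", "Attchd", "Detchd", "Attchd", "Detchd", "BuiltIn"],
    "foundation": ["PConc", "CBlock", "CBlock", "PConc", "BrkTil", "PConc"],
})

# Rows become garage types, columns become foundation levels, cells are counts.
table = (toy.groupby(["garage_type", "foundation"])
            .size()
            .unstack(fill_value=0))

# pandas draws one bar segment per column and stacks them along each bar.
axes = table.plot(kind="barh", stacked=True)
axes.set_xlabel("Number of properties")
axes.legend(title="Foundation")
```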
7.7 Task 2g
Let’s look at a continuous variable this time, the best being the sale price itself!
A great place to start with any dataset (particularly if there is a variable of interest to predict, observe relationships with etc) is to investigate the distribution of its continuous variables.
These allow us to confirm whether the data is skewed by outliers (values that lie far from the bulk of the data) and thus whether clipping variables would be a useful cleaning step.
Produce a histogram to observe the distribution of the sale price variable.
What are your initial thoughts upon observing this plot?
As with other types of plot, investigate how to create a histogram in either matplotlib or seaborn.
7.8 Task 2h
Let’s play around with different bin sizes.
Create three plots, one with the default bins, one with 20 bins and another with 40. Arrange them as best you see fit.
Does changing the number of bins impact our ability to interpret the distribution of this specific variable?
Think back to previous solutions to creating multiple plots in either matplotlib or seaborn
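A minimal sketch of the side-by-side layout in Matplotlib follows, using synthetic right-skewed values in place of sale prices. The styling arguments are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic right-skewed values standing in for sale prices (illustrative).
rng = np.random.default_rng(seed=0)
prices = rng.lognormal(mean=12, sigma=0.4, size=500)

bin_options = [10, 20, 40]   # 10 is Matplotlib's default number of bins
figure, axes = plt.subplots(1, len(bin_options), figsize=(12, 3), sharey=True)

# One histogram per subplot, each with a different number of bins.
for ax, bins in zip(axes, bin_options):
    ax.hist(prices, bins=bins, color="#12436D", edgecolor="white", alpha=0.8)
    ax.set_title(f"{bins} bins")
```

Sharing the y axis keeps the bar heights comparable between panels, although the counts per bar naturally shrink as the bins get narrower.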
7.9 Task 2i
Let’s apply some best practice tweaks to our histogram.
Adjust the width of the bins to 1, set the outline of the bars to white, the bar colour to “#12436D” and transparency to 0.8.
Modify the continuous y scale to produce more gridlines.
Modify the continuous x scale to include an upper limit.
Set appropriate labels for the plot.
Style it how you like and turn off the x axis gridlines.
Use previous parts of the case study and the training materials to apply these tweaks
7.10 Task 2j
Histograms tend to be used for single continuous variables. What if we wanted to compare the distribution across the groups of a categorical variable?
Let’s return to our thoughts earlier on whether properties with specific garage types have a price spread or a more common pattern.
What two methods could we use to observe the sale price distribution across the different garage types?
Think about how to create a boxplot, with the sale price distribution split between garage types. This gives summary statistics at a glance, averages and quartiles, as well as explicit markers where outliers are likely to be.
7.11 Task 2k
Let’s see how the distribution of sale price differs by garage type.
Create a boxplot to compare the distribution of the sale price across the different garage types. Again you can choose matplotlib or seaborn
Apply any tweaks you think are appropriate
Comment on the results.
Think about features specific to a boxplot. Should you remove/modify outliers? Is the boxplot size proportional to the sample size?
At the very least just get the boxplot working so you understand what it’s showing you.
7.12 Task 2l
You probably can’t produce a publication ready visualisation in this case study, however you can improve the accessibility of the plot so that colleagues can review it against organisational guidelines.
Apply the following tweaks to your boxplot:
- Remove the outlier shape and colour changes; filled black circles are recommended. Reduce their size to 2.5.
- Use all of the colours from the Analysis Function guidance to fill the boxplot.
- Modify the y axis if necessary.
- Add appropriate labels to the plot.
- Turn off the legend if present and remove x axis gridlines.
All of the above has either been covered previously in the case study, training materials or can be solved by looking at the matplotlib or seaborn documentation.
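As a pointer to the relevant Matplotlib arguments, the sketch below styles outliers through `flierprops` and fills the boxes via `patch_artist`. The two groups are synthetic and the two colours are only the ones used earlier in this case study, not the full guidance palette.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic groups standing in for sale price by garage type (illustrative).
rng = np.random.default_rng(seed=1)
groups = {
    "Attchd": rng.normal(200000, 40000, size=100),
    "Detchd": rng.normal(130000, 30000, size=100),
}

figure, axes = plt.subplots()
result = axes.boxplot(
    list(groups.values()),
    patch_artist=True,   # needed so that the boxes can be filled
    flierprops=dict(marker="o", markerfacecolor="black",
                    markeredgecolor="black", markersize=2.5),
)
axes.set_xticklabels(list(groups.keys()))

# Fill each box with its own colour.
for box, colour in zip(result["boxes"], ["#12436D", "#F46A25"]):
    box.set_facecolor(colour)

# Horizontal gridlines only, drawn behind the boxes.
axes.grid(True, axis="y")
axes.set_axisbelow(True)
```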
8 Summary
Congratulations on finishing the case study exercises. By working through them, you should feel more comfortable using matplotlib and seaborn to create visualisations and customise them to align with Analysis Function guidelines.
You have a few options from here:
- Proceed to investigate the optional reference material.
- Read deeper into the topic using online references at https://matplotlib.org/ and https://seaborn.pydata.org/index.html.
- Experiment with alternative visualisation packages such as plotly (https://plotly.com/python/)