# Load the packages
import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns
Chapter 1 - Introduction
1 Introduction
This course provides an overview of data visualisation and plotting techniques in Python.
Data visualisation can be described as both an art and a science. You will approach the concept with this framing in mind.
2 Data
Note: Some users have had errors when importing seaborn. The error came from the np.nosetester module and was solved by upgrading the SciPy package. To do this, open the Anaconda Prompt and enter:
pip install --upgrade scipy
Staff should follow their own internal guidance on installing packages.
2.1 Data
In this course we’ll be using a variety of data. This is stored in the “data” folder.
Gapminder contains data for different countries across a range of years, covering the following variables:
life_exp – Life expectancy at birth in years.
pop – Population, measured every five years.
gdp_per_cap – Gross domestic product per capita in “international dollars”, a hypothetical unit of currency with the same purchasing power as the US dollar in 2005, in this case.
infant_mortality – Number of deaths per 1,000 in children under 1 year of age.
fertility – Number of children per woman.
We will use pd.read_csv() to read in our data.
gapminder = pd.read_csv("../data/gapminder.csv")
gapminder.head()
  | country | continent | year | life_exp | pop | gdp_per_cap | infant_mortality | fertility |
---|---|---|---|---|---|---|---|---|
0 | Afghanistan | Asia | 1952 | 28.801 | 8425333.0 | 779.445314 | NaN | NaN |
1 | Afghanistan | Asia | 1957 | 30.332 | 9240934.0 | 820.853030 | NaN | NaN |
2 | Afghanistan | Asia | 1962 | 31.997 | 10267083.0 | 853.100710 | NaN | NaN |
3 | Afghanistan | Asia | 1967 | 34.020 | 11537966.0 | 836.197138 | NaN | NaN |
4 | Afghanistan | Asia | 1972 | 36.088 | 13079460.0 | 739.981106 | NaN | NaN |
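If you do not have the data folder to hand, the read step can be tried on an in-memory excerpt. The following is a minimal sketch only: the two rows are taken from the output above, but the exact layout of the real file (comma separators, blank fields for missing values) is an assumption, and the names csv_text and excerpt are purely illustrative.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for ../data/gapminder.csv.
# The real file's separator and missing-value encoding are assumed here.
csv_text = (
    "country,continent,year,life_exp,pop,gdp_per_cap,infant_mortality,fertility\n"
    "Afghanistan,Asia,1952,28.801,8425333.0,779.445314,,\n"
    "Afghanistan,Asia,1957,30.332,9240934.0,820.853030,,\n"
)

# pd.read_csv() accepts a file path or any file-like object.
excerpt = pd.read_csv(io.StringIO(csv_text))

# Blank fields are read in as NaN, as in the last two columns.
print(excerpt.shape)
print(excerpt.isna().sum())
```

Swapping the io.StringIO object for the path to the real file gives the call used in this chapter.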
Please explore the gapminder data before starting the course. Understanding the data, using tools such as .dtypes, .head() and .tail(), is vital.
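A first exploration might look something like the sketch below. To keep it self-contained it rebuilds only the five rows printed by gapminder.head() earlier rather than loading the full file, so the name sample and the intermediate variables are illustrative assumptions; in practice you would call the same attributes and methods on gapminder itself.

```python
import numpy as np
import pandas as pd

# Stand-in for the full dataset: the five rows shown by gapminder.head().
sample = pd.DataFrame({
    "country": ["Afghanistan"] * 5,
    "continent": ["Asia"] * 5,
    "year": [1952, 1957, 1962, 1967, 1972],
    "life_exp": [28.801, 30.332, 31.997, 34.020, 36.088],
    "pop": [8425333.0, 9240934.0, 10267083.0, 11537966.0, 13079460.0],
    "gdp_per_cap": [779.445314, 820.853030, 853.100710, 836.197138, 739.981106],
    "infant_mortality": [np.nan] * 5,
    "fertility": [np.nan] * 5,
})

# Data type of each column (text, integer or float).
column_types = sample.dtypes
print(column_types)

# First and last rows; the argument sets how many rows are returned.
first_rows = sample.head(3)
last_rows = sample.tail(2)
print(first_rows)
print(last_rows)

# Count of missing values per column, useful before plotting.
missing_counts = sample.isna().sum()
print(missing_counts)
```

Checking missing values at this stage is optional, but it helps explain gaps that later appear in charts.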
3 Processing Data
In this course we’ll be using Pandas to process our data.
The code will be commented and follow PEP-8 guidelines.
A firm understanding of how to import and interact with data in Python is a prerequisite and will not be covered in this course. You will build on those skills to develop your understanding of data visualisation concepts and how to apply them to data analysis in Python.
3.1 Visualisation Guidelines
In this course we will follow the most up-to-date guidelines set by the government Analysis Function. Training and guidance on these guidelines are listed below:
GSS Introduction to data visualisation
Analysis Function Guidance - Charts
Analysis Function Guidance - Tables
Analysis Function Guidance - Colours
Analysis Function Guidance - Infographics
Analysis Function Guidance - Examples
The ONS style guide can also be useful if your organisation doesn’t have its own standard for data analysis and statistics. If in doubt, check with your employer to see if one exists.
4 End of Chapter
You have completed Chapter 1 of the Data Visualisation Course. Please move on to Chapter 2.