Chapter 1 - Introduction

1 Introduction

This course provides an overview of data visualisation and plotting techniques in Python.

Data visualisation can be described as both an art and a science. You will approach the subject with both aspects in mind.

2 Data

# Load the packages
import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns

Note - Some users have had errors when importing seaborn. These were caused by the np.nosetester module and were solved by upgrading the SciPy package. To do this, open the Anaconda Prompt and enter:

pip install --upgrade scipy

Staff should follow their own internal guidance on installing packages.
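Before upgrading anything, it can help to see which versions are currently installed. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical and not part of the course materials.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Packages imported at the start of this chapter, plus SciPy.
packages = ["numpy", "pandas", "matplotlib", "seaborn", "scipy"]
versions = {name: installed_version(name) for name in packages}

for name, version in versions.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

This only reads package metadata and does not import the packages themselves, so it should run even when importing seaborn fails.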

2.1 Data

In this course we’ll be using a variety of datasets. These are stored in the “data” folder.

The Gapminder dataset contains data for different countries across a range of years, covering the following variables:

life_exp – Life expectancy at birth in years.
pop – Population, measured every five years.
gdp_per_cap – Gross domestic product per capita in “international dollars” – a hypothetical unit of currency with the same purchasing power parity as the US dollar in 2005, in this case.
infant_mortality – Number of deaths per 1,000 in children under 1 year of age.
fertility – Number of children per woman.

We will use pd.read_csv() to read in our data.

gapminder = pd.read_csv("../data/gapminder.csv")

gapminder.head()

	country	continent	year	life_exp	pop	gdp_per_cap	infant_mortality	fertility
0	Afghanistan	Asia	1952	28.801	8425333.0	779.445314	NaN	NaN
1	Afghanistan	Asia	1957	30.332	9240934.0	820.853030	NaN	NaN
2	Afghanistan	Asia	1962	31.997	10267083.0	853.100710	NaN	NaN
3	Afghanistan	Asia	1967	34.020	11537966.0	836.197138	NaN	NaN
4	Afghanistan	Asia	1972	36.088	13079460.0	739.981106	NaN	NaN

Please explore the gapminder data before starting the course. An understanding of the data, gained using tools such as .dtypes, .head() and .tail(), is vital.
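As a minimal sketch of this kind of exploration, the example below reads the five rows shown in the output above from an in-memory CSV string, so it runs without the “data” folder. The names csv_text and sample are assumptions made for illustration; with the real file you would call the same methods on gapminder.

```python
import io

import pandas as pd

# The five rows shown in the gapminder.head() output, written as CSV text.
# Empty fields stand in for the NaN values in the last two columns.
csv_text = """country,continent,year,life_exp,pop,gdp_per_cap,infant_mortality,fertility
Afghanistan,Asia,1952,28.801,8425333.0,779.445314,,
Afghanistan,Asia,1957,30.332,9240934.0,820.853030,,
Afghanistan,Asia,1962,31.997,10267083.0,853.100710,,
Afghanistan,Asia,1967,34.020,11537966.0,836.197138,,
Afghanistan,Asia,1972,36.088,13079460.0,739.981106,,
"""

# pd.read_csv() accepts a file path or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

column_types = sample.dtypes          # data type of each column
first_rows = sample.head(2)           # first two rows
last_rows = sample.tail(2)            # last two rows
missing_counts = sample.isna().sum()  # number of missing values per column

print(column_types)
print(first_rows)
print(last_rows)
print(missing_counts)
```

Note that pop is also the name of a DataFrame method, so that column is best accessed with brackets, as in sample["pop"], and not with dot notation.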

3 Processing Data

In this course we’ll be using Pandas to process our data.

The code will be commented and will follow PEP 8 guidelines.

A firm understanding of how to import and interact with data in Python is a prerequisite and will not be covered in this course. You will build on those skills to develop your understanding of data visualisation concepts and how to apply them to data analysis in Python.

3.1 Visualisation Guidelines

In this course we will follow the most up-to-date guidelines set by the Government Analysis Function. Links to training on these guidelines can be found below:

GSS Introduction to data visualisation

Analysis Function Guidance - Charts

Analysis Function Guidance - Tables

Analysis Function Guidance - Colours

Analysis Function Guidance - Infographics

Analysis Function Guidance - Examples

The following is a link to the ONS style guide, which can be useful if your organisation doesn’t have a standard for data analysis and statistics. If in doubt, check with your employer to see if one exists.

Style.ons.gov.uk - Data Visualisation

4 End of Chapter

You have completed Chapter 1 of the Data Visualisation Course. Please move on to Chapter 2.