Chapter 1 - Introduction

1 Introduction

This course provides an overview of data visualisation and plotting techniques in Python.

Data visualisation can be described as both an art and a science. You will approach the concept with this framing in mind.

2 Packages and Data

# Load the packages
import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns

Note - Some users have had errors when importing seaborn. The error referenced NumPy's nosetester module and was resolved by upgrading the SciPy package. To do this, open the Anaconda Prompt and enter:

pip install --upgrade scipy

Staff should follow their own internal guidance on installing packages.
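Before upgrading anything, it can help to see which versions are currently installed. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_versions is hypothetical, and the package list is simply the packages mentioned in this chapter.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages imported in this chapter, plus SciPy (mentioned in the note above).
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scipy"]


def installed_versions(names):
    """Map each package name to its installed version, or None if it is absent.

    installed_versions is a hypothetical helper used for illustration only.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Recording these versions before and after an upgrade makes it easier to report a problem if the import error persists.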

2.1 Data

In this course we’ll use several datasets, all stored in the “data” folder.

The Gapminder dataset contains data for different countries across a range of years, covering the following variables:

  • life_exp – Life expectancy at birth in years.

  • pop – Population, measured every five years.

  • gdp_per_cap – Gross domestic product per capita in “international dollars”, a hypothetical unit of currency with the same purchasing power that the US dollar had in the United States; here the reference year is 2005.

  • infant_mortality – Number of deaths of children under 1 year of age per 1,000 live births.

  • fertility – Number of children per woman.

We will use pd.read_csv() to read in our data.

gapminder = pd.read_csv("../data/gapminder.csv")

gapminder.head()
   country      continent  year  life_exp  pop         gdp_per_cap  infant_mortality  fertility
0  Afghanistan  Asia       1952  28.801    8425333.0   779.445314   NaN               NaN
1  Afghanistan  Asia       1957  30.332    9240934.0   820.853030   NaN               NaN
2  Afghanistan  Asia       1962  31.997    10267083.0  853.100710   NaN               NaN
3  Afghanistan  Asia       1967  34.020    11537966.0  836.197138   NaN               NaN
4  Afghanistan  Asia       1972  36.088    13079460.0  739.981106   NaN               NaN
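Because "../data/gapminder.csv" is a relative path, pd.read_csv() resolves it against the current working directory, so the call can fail if your notebook or script is run from a different folder. The sketch below shows one way to get a clearer error message in that case. The function load_csv_checked is a hypothetical helper, not part of pandas or of the course materials.

```python
from pathlib import Path

import pandas as pd


def load_csv_checked(path):
    """Read a CSV file with pandas, raising a more informative error if it is missing.

    load_csv_checked is a hypothetical helper used for illustration only.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        # Report both the resolved path and the working directory,
        # since a wrong working directory is a common cause.
        raise FileNotFoundError(
            f"No file at {csv_path.resolve()} "
            f"(current working directory: {Path.cwd()})"
        )
    return pd.read_csv(csv_path)
```

Calling load_csv_checked("../data/gapminder.csv") would then either return the DataFrame or report the full path that was tried.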

Please explore the gapminder data before starting the course. An understanding of the data, gained using tools such as .dtypes, .head() and .tail(), is vital.
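As a starting point, the sketch below applies these tools to a small DataFrame built from the five rows of .head() output shown above, so that it runs without the CSV file. The name sample is an assumption made for this illustration; in practice you would call the same methods on gapminder.

```python
import numpy as np
import pandas as pd

# The first five rows of gapminder, copied from the .head() output above.
sample = pd.DataFrame(
    {
        "country": ["Afghanistan"] * 5,
        "continent": ["Asia"] * 5,
        "year": [1952, 1957, 1962, 1967, 1972],
        "life_exp": [28.801, 30.332, 31.997, 34.020, 36.088],
        "pop": [8425333.0, 9240934.0, 10267083.0, 11537966.0, 13079460.0],
        "gdp_per_cap": [779.445314, 820.853030, 853.100710, 836.197138, 739.981106],
        "infant_mortality": [np.nan] * 5,
        "fertility": [np.nan] * 5,
    }
)

print(sample.dtypes)   # data type of each column
print(sample.head(2))  # first two rows
print(sample.tail(2))  # last two rows
print(sample.shape)    # (number of rows, number of columns)

# Count the missing values (NaN) in each column.
missing = sample.isna().sum()
print(missing)
```

Note that pop is also the name of a DataFrame method, so gapminder.pop refers to that method rather than the column. Use gapminder["pop"] to select the population column.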

3 Processing Data

In this course we’ll be using Pandas to process our data.

The code will be commented and will follow the PEP 8 style guidelines.

A firm understanding of how to import and interact with data in Python is a prerequisite and will not be covered in this course. You will build on those skills to develop your understanding of data visualisation concepts and how to apply them to data analysis in Python.

3.1 Visualisation Guidelines

In this course we will follow the most up-to-date guidelines set by the Government Analysis Function. Links to training on these guidelines can be found below:

GSS Introduction to data visualisation

Analysis Function Guidance - Charts

Analysis Function Guidance - Tables

Analysis Function Guidance - Colours

Analysis Function Guidance - Infographics

Analysis Function Guidance - Examples

The following is a link to the ONS style guide, which can be useful if your organisation doesn’t have a standard for data analysis and statistics. If in doubt, check with your employer to see if one exists.

Style.ons.gov.uk - Data Visualisation

4 End of Chapter

You have completed Chapter 1 of the Data Visualisation Course. Please move on to Chapter 2.