Chapter 5b - Presenting tables in matplotlib

Chapter 5b – Presenting Tables in Python (Using Matplotlib)


Follow along with the code by running cells as you encounter them

Chapter Overview

  1. Packages and Data
  2. Creating a Basic Table
  3. Customising Fonts
  4. Setting Cell Height, Padding and Width
  5. Column Headers
  6. Setting Titles, Subtitles and Captions
  7. GSS Styling
  8. Grouping a column
  9. Exporting Tables

Creating effective presentation data tables can be a very important element of data visualisation.

Unfortunately, the makers of Python packages don’t seem to agree! Presentation tables are often an afterthought in packages such as Pandas and Matplotlib: the functionality is designed to be “good enough”, with the expectation that users will build their own extensions on top of it. As yet, no one has taken up that mantle.

We recommend using toyplot (covered in chapter 5a) - but this chapter covers creating tables using Matplotlib if you are unable to use that package.

At the time of writing this course (2020) we were unable to find a fully satisfactory solution that outputs a table following all of the elements of the ONS and GSS Style Guide.

This section of the course will attempt to get as close as possible to the desired outcome using Matplotlib’s .table() method.

If correctly formatting tables is an important element of your data visualisation I recommend looking into presenting your data using R, which has more functionality for presentation tables. The companion course “Data Visualisation in R” also contains a tables section which covers this in detail.


# 1. Packages and Data

Let’s start, as always by loading our packages and our data.

We’re using:

  • Numpy – Version 1.12.1
  • Pandas – Version 0.20.1
  • Matplotlib – Version 2.0.2 (here as the pyplot module)
  • Seaborn - Version 0.7.1

Remember you can use the .__version__ attribute (e.g. pd.__version__) to check your version.

More information about the packages is given in Chapter 1.

We’re following the standard conventions for aliases, and we’ll also load the gapminder data.

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
%matplotlib inline

The magic command %matplotlib inline displays images in the notebook at a reduced resolution. This helps with speed, especially for larger images, but it is very noticeable when working with tables: as they are text-based objects, the lower resolution makes them look pixelated.

We can use rcParams to raise the DPI (dots per inch) of the outputs in the notebook. Note that this may mean they take slightly longer to render; if rendering becomes too slow, lower the DPI.

This will also work for other visualisations; we’ve not applied this before due to efficiency and speed of rendering the notebooks with large amounts of visualisations.

It’s also important to note that this needs to be in a new cell. %matplotlib inline appears to take effect last, overwriting the rcParams setting when both are in the same cell, so the rcParams line needs its own cell.

mpl.rcParams["figure.dpi"]= 300 # Set the DPI for outputs to 300
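If raising the DPI for every output in the notebook makes rendering too slow, one possible alternative is to set the DPI for a single figure only. The sketch below assumes that plt.subplots() passes the dpi keyword on to the figure it creates; the value 300 simply mirrors the setting above.

```python
import matplotlib.pyplot as plt

# Set the DPI for this one figure only, leaving rcParams untouched
figure, axes = plt.subplots(figsize=(8, 5), dpi=300)
axes.axis("off")  # Removes the axis from the figure

print(figure.get_dpi())  # Check the DPI the figure was created with
plt.close(figure)  # Tidy up the example figure
```

This keeps the other visualisations in the notebook at the default resolution.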

We’ll also load in our data.

gapminder = pd.read_csv("../data/gapminder.csv") # Read in the Data



# 2. Creating a Basic Table

We’re going to use the gapminder data and we’re going to prepare it before we start.

As with other visualisations data preparation is a really important step.

With the .table() method we need to have our data mostly formatted before we visualise it.

Here I’m manipulating my data so that I have the population data for 3 years (1997, 2002 and 2007) for the first five countries in each continent, apart from Oceania, which comprises only 2 countries in our dataset.

select_countries_3y_pop = (gapminder[gapminder["year"].isin([1997, 2002, 2007])] # Filter for the years
                           .filter(["country", "continent", "year", "pop"]) # .filter *selects* columns
                           .astype({"pop": "int64"}) # Cast pop to integers (pivot_table's default mean aggregation still returns floats)
                           .pivot_table(index = ["continent","country"], 
                                        columns = "year", values = "pop") # Make the data "wider" so each year is a col
                           .reset_index() # Reset our index; so we don't have a multi index
                           .groupby("continent").head()) # Group by the continent and return the first 5 rows for each continent.

select_countries_3y_pop.columns.name = None # Remove the "year" label from the column index

# View the new DataFrame
select_countries_3y_pop
     continent country                 1997          2002          2007
0    Africa    Algeria                 2.907202e+07  3.128714e+07  3.333322e+07
1    Africa    Angola                  9.875024e+06  1.086611e+07  1.242048e+07
2    Africa    Benin                   6.066080e+06  7.026113e+06  8.078314e+06
3    Africa    Botswana                1.536536e+06  1.630347e+06  1.639131e+06
4    Africa    Burkina Faso            1.035284e+07  1.225121e+07  1.432620e+07
52   Americas  Argentina               3.620346e+07  3.833112e+07  4.030193e+07
53   Americas  Bolivia                 7.693188e+06  8.445134e+06  9.119152e+06
54   Americas  Brazil                  1.685467e+08  1.799142e+08  1.900106e+08
55   Americas  Canada                  3.030584e+07  3.190227e+07  3.339014e+07
56   Americas  Chile                   1.459993e+07  1.549705e+07  1.628474e+07
77   Asia      Afghanistan             2.222742e+07  2.526840e+07  3.188992e+07
78   Asia      Bahrain                 5.985610e+05  6.563970e+05  7.085730e+05
79   Asia      Bangladesh              1.233153e+08  1.356568e+08  1.504483e+08
80   Asia      Cambodia                1.178296e+07  1.292671e+07  1.413186e+07
81   Asia      China                   1.230075e+09  1.280400e+09  1.318683e+09
110  Europe    Albania                 3.428038e+06  3.508512e+06  3.600523e+06
111  Europe    Austria                 8.069876e+06  8.148312e+06  8.199783e+06
112  Europe    Belgium                 1.019979e+07  1.031197e+07  1.039223e+07
113  Europe    Bosnia and Herzegovina  3.607000e+06  4.165416e+06  4.552198e+06
114  Europe    Bulgaria                8.066057e+06  7.661799e+06  7.322858e+06
140  Oceania   Australia               1.856524e+07  1.954679e+07  2.043418e+07
141  Oceania   New Zealand             3.676187e+06  3.908037e+06  4.115771e+06
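The output above still shows floats in scientific notation, because pivot_table() aggregates with the mean by default and so returns floats. If whole numbers are wanted in the final table, one option is to cast the year columns after pivoting. The sketch below uses a small stand-in DataFrame (the name demo is hypothetical; the four values are taken from the outputs in this chapter), and the thousands separators are an optional extra rather than part of the original code.

```python
import pandas as pd

# Small stand-in for the gapminder data
demo = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Africa"],
    "country": ["Algeria", "Algeria", "Angola", "Angola"],
    "year": [1997, 2002, 1997, 2002],
    "pop": [29072015.0, 31287142.0, 9875024.0, 10866106.0],
})

wide = (demo.pivot_table(index=["continent", "country"],
                         columns="year", values="pop")
            .reset_index())
wide.columns.name = None  # Remove the "year" label from the column index

year_cols = [1997, 2002]
for col in year_cols:
    wide[col] = wide[col].astype("int64")  # Cast after pivoting

# Optional: format as text with thousands separators for display
formatted = wide.copy()
for col in year_cols:
    formatted[col] = formatted[col].map(lambda value: f"{int(value):,}")

print(formatted)
```

Passing formatted.values to cellText would then show the numbers exactly as formatted here.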

We can create a table using the code below.

We can create the table directly from a pandas DataFrame by passing the DataFrame’s .values attribute to the cellText parameter and our column titles to the colLabels parameter.

Column titles here have been manually set so continent and country will have capital letters.

To keep column headers exactly as they are in the DataFrame colLabels can be set:

colLabels = DataFrame.columns

Or, if they were all purely text values and just the first letter needed to be capitalised, the str.title() method could be used: colLabels = DataFrame.columns.str.title()

This won’t work here, as the integer labels (1997 etc.) will be returned as NaN.
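One possible workaround for mixed labels is to convert each label to text before capitalising it. The sketch below uses an empty stand-in DataFrame (the name demo is hypothetical) with the same mix of text and integer column labels as our data.

```python
import pandas as pd

# Stand-in frame with the same mix of text and integer column labels
demo = pd.DataFrame(columns=["continent", "country", 1997, 2002, 2007])

# Convert every label to text first, then capitalise the first letter
col_labels = [str(label).title() for label in demo.columns]
print(col_labels)
```

The resulting list can be passed straight to colLabels.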

Two additional parameters have been set: loc controls the location of the table, and edges draws a grid around the cells. Matplotlib automatically generates an axis; to remove it use axes.axis("off"). figure.tight_layout() will reduce the amount of white space around the table.

figure, axes = plt.subplots(figsize = (8,5)) # Set up our figure and axis

axes.axis("off") # Removes the axis from the figure

table = axes.table(cellText=select_countries_3y_pop.values, # Values we want in the cells
                   colLabels=["Continent", "Country", "1997", "2002", "2007"], # Our column headers 
                   loc="center", # Where we want our table located
                   edges="closed" ) # Draws a grid around the cells

figure.tight_layout() # Controls the amount of white space around the table



# 3. Customising Fonts

We can customise our table through the use of table.properties() and looping over the children objects that make up our table cells.

Unfortunately the only way to change elements is to loop through each cell in our table and set values individually. As said before this isn’t necessarily the most elegant of solutions – however there’s no “simple” inbuilt method.

Here the font size, colour, and name are being set to match GSS standards.

The cell contents are also being aligned to the right. There is a parameter within axes.table() to do this as well (cellLoc="right") however this only applies to cells, rather than both cells and colLabels.

figure, axes = plt.subplots(figsize = (8,5)) # Set up our figure and axis

axes.axis("off") # Removes the axis from the figure

table = axes.table(cellText=select_countries_3y_pop.values, # Values we want in the cells
                   colLabels=["Continent", "Country", "1997", "2002", "2007"] , # Our column headers
                   loc="center", # Where we want our table located
                   edges="closed") # Draws a grid around the cells


# Customise the cell contents

# Create our items to loop through
table_props = table.properties()
table_cells = table_props["children"] 

for cell in table_cells: 
    cell.get_text().set_fontsize(14) # Size in points
    cell.get_text().set_color("black") # Colour
    cell.get_text().set_fontname("Arial") # Font name
    cell._loc = "right" # ensure contents (incl headers) are aligned right

figure.tight_layout() # Controls the amount of white space around the table
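The loop above sets the private attribute cell._loc. As a hedged alternative, axes.table() also has a colLoc parameter for the column headers alongside cellLoc for the body cells, so both can be aligned when the table is created. The two-row table below is a cut-down stand-in for our data, and checking the alignment through the cell's text object assumes a reasonably recent Matplotlib version.

```python
import matplotlib.pyplot as plt

figure, axes = plt.subplots(figsize=(4, 2))
axes.axis("off")  # Removes the axis from the figure

# cellLoc aligns the body cells, colLoc aligns the column headers
table = axes.table(cellText=[["Africa", "Algeria"], ["Africa", "Angola"]],
                   colLabels=["Continent", "Country"],
                   cellLoc="right",
                   colLoc="right",
                   loc="center")

# Row 0 holds the headers; row 1 is the first row of data
header_alignment = table.get_celld()[(0, 0)].get_text().get_horizontalalignment()
body_alignment = table.get_celld()[(1, 0)].get_text().get_horizontalalignment()
print(header_alignment, body_alignment)
plt.close(figure)  # Tidy up the example figure
```

This avoids relying on an attribute that is not part of Matplotlib's public interface.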



# 4. Setting Cell Height, Padding and Width

The cell height and the cell width of the table may need to be adjusted.

The height can be adjusted using the loop created in the last section by calling cell.set_height(). Here 0.07 worked well, but some experimentation might be needed.

Despite being right aligned we have a large gap between the numbers and the border of the cell. By default Matplotlib uses 10% padding, but this is too much, and cell.PAD can be set to adjust it.

figure, axes = plt.subplots(figsize = (8,5)) # Set up our figure and axis

axes.axis("off") # Removes the axis from the figure

table = axes.table(cellText=select_countries_3y_pop.values, # Values we want in the cells
                   colLabels=["Continent", "Country", "1997", "2002", "2007"] , # Our column headers
                   loc="center", # Where we want our table located
                   edges="closed") # Draws a grid around the cells

# Customise the cell contents

# Create our items to loop through
table_props = table.properties()
table_cells = table_props["children"]

for cell in table_cells: 
    cell.get_text().set_fontsize(14) # Size in points
    cell.get_text().set_color("black") # Colour
    cell.get_text().set_fontname("Arial") # Font name
    cell._loc = "right" # ensure contents (incl headers) are aligned right
    cell.set_height(0.07) # Set height of cells
    cell.PAD = 0.01 # Adjust padding between data and border of cell.
    
figure.tight_layout() # Controls the amount of white space around the table

To control the width of cells a separate loop with the .auto_set_column_width() method can be used. In some versions of Matplotlib this can be run on its own; others will require it to be looped over each column in the table.

Note that this loop comes before the one that sets cell.PAD, and that the cell padding needed to be adjusted (here to 0.04) to get the desired output.

figure, axes = plt.subplots(figsize = (8,5)) # Set up our figure and axis


axes.axis("off") # Removes the axis from the figure

table = axes.table(cellText=select_countries_3y_pop.values, # Values we want in the cells
                   colLabels=["Continent", "Country", "1997", "2002", "2007"] , # Our column headers
                   loc="center", # Where we want our table located
                   edges="closed") # Draws a grid around the cells
          
# Loop over each column and auto set the width

for each_column in range(len(select_countries_3y_pop.columns)):
    table.auto_set_column_width(each_column)

# Customise the cell contents
# Create our items to loop through
table_props = table.properties()
table_cells = table_props["children"]

for cell in table_cells: 
    cell.get_text().set_fontsize(14) # Size in points
    cell.get_text().set_color("black") # Colour
    cell.get_text().set_fontname("Arial") # Font name
    cell._loc = "right" # ensure contents (incl headers) are aligned right
    cell.set_height(0.07) # Set height of cells
    cell.PAD = 0.04 # Adjust padding between data and border of cell.

figure.tight_layout() # Controls the amount of white space around the table
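For versions where the column-width step can be run on its own, a minimal sketch is shown below. It assumes that auto_set_column_width() accepts a sequence of column indices as well as a single index, which is our understanding of recent Matplotlib versions; the one-row table is a cut-down stand-in for our data.

```python
import matplotlib.pyplot as plt

figure, axes = plt.subplots(figsize=(4, 2))
axes.axis("off")  # Removes the axis from the figure

col_labels = ["Continent", "Country", "1997"]
table = axes.table(cellText=[["Africa", "Algeria", "29072015"]],
                   colLabels=col_labels,
                   loc="center")

# Pass every column index at once instead of looping over them
table.auto_set_column_width(col=list(range(len(col_labels))))

figure.canvas.draw()  # The widths are worked out when the table is drawn
column_indices = sorted({col for (row, col) in table.get_celld()})
plt.close(figure)  # Tidy up the example figure
```

If this raises an error or has no visible effect in your version, fall back to the loop used in the cell above.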



# 5. Column Headers

To make the headers bold, a second loop can be used that only affects the top row of the table. The same loop can also change the header font size; in the code below it is kept at 14 to match the body text.

figure, axes = plt.subplots(figsize = (8,5)) # Set up our figure and axis

axes.axis("off") # Removes the axis from the figure

table = axes.table(cellText=select_countries_3y_pop.values, # Values we want in the cells
                   colLabels=["Continent", "Country", "1997", "2002", "2007"] , # Our column headers
                   loc="center", # Where we want our table located
                   edges="closed") # Draws a grid around the cells

# Loop over each column and auto set the width
for each_column in range(len(select_countries_3y_pop.columns)):
    table.auto_set_column_width(each_column)
    
# Customise the cell contents
# Create our items to loop through
table_props = table.properties()
table_cells = table_props["children"]

for cell in table_cells:
    cell.get_text().set_fontsize(14) # Size in points
    cell.get_text().set_color("black") # Colour
    cell.get_text().set_fontname("Arial") # Font name
    cell._loc = "right" # ensure contents (incl headers) are aligned right
    cell.set_height(0.07) # Set height of cells
    cell.PAD = 0.04 # Adjust padding between data and border of cell.

# Create a custom header row with bold text
for (row, col), cell in table.get_celld().items():
    if (row == 0) or (col == -1):
        cell.set_text_props(fontproperties=FontProperties(weight="bold", size = 14))
        # Note this uses from matplotlib.font_manager import FontProperties - that we imported at the top
    
figure.tight_layout() # Controls the amount of white space around the table



# 6. Setting Titles, Subtitles and Captions

Here I’ve set the title, subtitle and caption using the plt.figtext() function. Note that the suptitle() method also works, but I was unable to get the title() method to behave correctly: the title would not move to the right of the plot, so it has been replaced with plt.figtext().

As usual with these functions, you may have to adjust the x and y co-ordinates to get the desired result.

figure, axes = plt.subplots(figsize = (8,5)) # Set up our figure and axis

axes.axis("off") # Removes the axis from the figure

table = axes.table(cellText=select_countries_3y_pop.values, # Values we want in the cells
                   colLabels=["Continent", "Country", "1997", "2002", "2007"] , # Our column headers
                   loc="center", # Where we want our table located
                   edges="closed") # Draws a grid around the cells

# Create Titles, Subtitles and Caption
title = "Table 1: Population of select countries over three years"
plt.figtext(x=0.08, y=1.3,  s=title, ha="left", fontweight="bold", fontsize=16, fontname="sans-serif")

subtitle = "Population for 1997, 2002 and 2007"
plt.figtext(x=0.08, y=1.25, s=subtitle, ha="left", fontweight="light", fontsize=14, fontname="sans-serif")

caption = "Source: Gapminder.org"
plt.figtext(x=0.95, y=-0.25, s=caption, ha="right", fontweight="light", fontsize=12, fontname="sans-serif")


# Loop over each column and auto set the width
for each_column in range(len(select_countries_3y_pop.columns)):
    table.auto_set_column_width(each_column)
    
# Customise the cell contents
# Create our items to loop through
table_props = table.properties()
table_cells = table_props["children"]
# Loop through the cells
for cell in table_cells: 
    cell.get_text().set_fontsize(14) # Size in points
    cell.get_text().set_color("black") # Colour
    cell.get_text().set_fontname("Arial") # Font name
    cell._loc = "right" # ensure contents (incl headers) are aligned right
    cell.set_height(0.07) # Set height of cells
    cell.PAD = 0.04 # Adjust padding between data and border of cell.

# Create a custom header row with bold text
for (row, col), cell in table.get_celld().items():
    if (row == 0) or (col == -1):
        cell.set_text_props(fontproperties=FontProperties(weight="bold", size = 14))
        # Note this uses from matplotlib.font_manager import FontProperties - that we imported at the top
    
figure.tight_layout() # Controls the amount of white space around the table
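When adjusting the positions used above, it may help to remember that plt.figtext() places text in figure co-ordinates by default: x and y are fractions of the figure, with (0, 0) at the bottom left and (1, 1) at the top right, so values above 1 or below 0 sit outside the figure area. The sketch below uses the equivalent figure.text() method; the positions chosen are illustrative assumptions, not the values from the cell above.

```python
import matplotlib.pyplot as plt

figure, axes = plt.subplots(figsize=(8, 5))
axes.axis("off")  # Removes the axis from the figure

# x and y are fractions of the figure width and height
title_text = figure.text(x=0.08, y=0.95,
                         s="Table 1: Population of select countries over three years",
                         ha="left", fontweight="bold", fontsize=16)
caption_text = figure.text(x=0.95, y=0.02, s="Source: Gapminder.org",
                           ha="right", fontweight="light", fontsize=12)

print(title_text.get_position(), caption_text.get_position())
plt.close(figure)  # Tidy up the example figure
```

Keeping the values between 0 and 1 keeps the text inside the figure, which can make the layout easier to reason about.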



# 7. GSS Styling

As said above, this doesn’t exactly follow the GSS guidelines, but it is reasonably close.

The cell below expands on the loop that changes the header row. Here we now control the colour of the header row and set the line width to 0. Setting the line width to zero effectively “hides” the edges of the cells. In axes.table() the parameter edges can be set to open, however if the edges are open they can’t be coloured – so they can be “hidden” by setting the width to 0.

We also introduce a horizontal line to act as a divider underneath the headers. As cells are treated as whole rectangles we need to use the additional code plt.axhline() to do so. This also requires the line axes.axis([0, select_countries_3y_pop.shape[1], select_countries_3y_pop.shape[0], -1]), which is near the top of the cell, to work effectively. This fixes the axis limits in place; without it the line moves unpredictably when you alter the y axis.

N.B - This doesn’t work for all versions of Matplotlib; and there doesn’t appear to be a suitable way to make this work. Suggestions are to experiment with the .annotate() method. If you come up with a better solution please let us know!

figure, axes = plt.subplots(figsize = (8,5)) # Set up our figure and axis

axes.axis("off") # Removes the axis from the figure

# Fix the axis - the horizontal line goes wandering without this.
axes.axis([0, select_countries_3y_pop.shape[1], select_countries_3y_pop.shape[0],-1])   


table = axes.table(cellText=select_countries_3y_pop.values, # Values we want in the cells
                   colLabels=["Continent", "Country", "1997", "2002", "2007"] , # Our column headers
                   loc="upper left", # Where we want our table located
                   edges="closed",) # Draws a grid around the cells - If we don't have this we can't colour them!


# Create Titles, Subtitles and Caption
title = "Table 1: Population of select countries over three years"
plt.figtext(x=0.04, y=1.1, s=title, ha="left", fontweight="bold", fontsize=16, fontname="sans-serif")

subtitle = "Population for 1997, 2002 and 2007"
plt.figtext(x=0.04, y=1.0, s=subtitle, ha="left", fontweight="light", fontsize=14, fontname="sans-serif")

caption = "Source: Gapminder.org"
plt.figtext(x=0.95, y=-0.55, s=caption, ha="right", fontweight="light", fontsize=12, fontname="sans-serif")

# Loop over each column and auto set the width
for each_column in range(len(select_countries_3y_pop.columns)):
    table.auto_set_column_width(each_column)
    
# Customise the cell contents
# Create our items to loop through
table_props = table.properties()
table_cells = table_props["children"]
# Loop through the cells
for cell in table_cells: 
    cell.get_text().set_fontsize(14) # Size in points
    cell.get_text().set_color("black") # Colour
    cell.get_text().set_fontname("Arial") # Font name
    cell._loc = "right" # ensure contents (incl headers) are aligned right
    cell.set_height(0.07) # Set height of cells
    cell.PAD = 0.04 # Adjust padding between data and border of cell.

# Create a custom header row with bold text and change the colour of the cells
for (row, col), cell in table.get_celld().items():
    if (row == 0) or (col == -1):
        cell.set_text_props(fontproperties=FontProperties(weight="bold", size = 14))
        # Note this uses from matplotlib.font_manager import FontProperties - that we imported at the top
        cell.set_facecolor("white")
        cell.set_linewidth(0)
    elif (row % 2 == 0):  # Even rows - row number divided by two has a remainder of zero
        cell.set_facecolor("white")
        cell.set_linewidth(0)
    else: # Effectively odd rows
        cell.set_facecolor("#F2F2F2")
        cell.set_linewidth(0)

# Line
plt.axhline(y = 1.08, xmin = 0.02, xmax = 0.96, color = "black")

figure.tight_layout(); # Controls the amount of white space around the table
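If the horizontal line does not behave in your version of Matplotlib, one option that may be worth experimenting with is the visible_edges property of a table cell, which controls which sides of the cell are drawn ("B" for bottom, "T" for top, "L" and "R" for left and right). The sketch below is an untested alternative to plt.axhline() rather than the approach used in this chapter. Note that a cell with only some edges visible may not be filled with its face colour, which matters little for a white header row.

```python
import matplotlib.pyplot as plt

figure, axes = plt.subplots(figsize=(4, 2))
axes.axis("off")  # Removes the axis from the figure

# Cut-down stand-in for our table
table = axes.table(cellText=[["Africa", "Algeria"], ["Africa", "Angola"]],
                   colLabels=["Continent", "Country"],
                   loc="center",
                   edges="closed")

for (row, col), cell in table.get_celld().items():
    if row == 0:
        cell.visible_edges = "B"  # Header: draw only the bottom edge
        cell.set_linewidth(1)
    else:
        cell.set_linewidth(0)  # Body: hide the edges but keep the fill

header_edges = [cell.visible_edges
                for (row, col), cell in table.get_celld().items() if row == 0]
plt.close(figure)  # Tidy up the example figure
```

Because the line belongs to the header cells themselves, it should move with the table rather than needing the axis to be fixed.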



# 8. Grouping a column

Often tables will have a “grouped” column. Rather than each of the first 5 rows having “Africa” in the continent column, only the first would. This can make the data slightly easier to read, and some people find it more attractive. It can also create the illusion of merged cells, similar to Excel. This solution is one of many ways it could be done; this method avoids multiple indexes, which often don’t work well with Matplotlib.

Using the .duplicated() method on the continent column returns a Boolean series; False where it’s the first instance of the value; and True where it’s repeated.

.loc[] can then be used, passing that Boolean series as the lookup; updating the “continent” column and setting the value of the column (where True) to blank.

The data then gives the impression of “grouping” the continents together; it’s a fake impression as all that has happened is the duplicate values have become empty strings – but it works for the desired look in the final table.

grouped_continents = select_countries_3y_pop.copy()
grouped_continents.reset_index(inplace = True, drop=True) # Reset the index
duplicate_continents = grouped_continents["continent"].duplicated() # Returns "False" for first item, "True" for subsequent
grouped_continents.loc[duplicate_continents,"continent"] = ""  # Replaces "True" values with blanks
grouped_continents.head(7) # Check it out! Gives the impression of a multilevel index
   continent  country       1997        2002        2007
0  Africa     Algeria       29072015.0  31287142.0  33333216.0
1             Angola        9875024.0   10866106.0  12420476.0
2             Benin         6066080.0   7026113.0   8078314.0
3             Botswana      1536536.0   1630347.0   1639131.0
4             Burkina Faso  10352843.0  12251209.0  14326203.0
5  Americas   Argentina     36203463.0  38331121.0  40301927.0
6             Bolivia       7693188.0   8445134.0   9119152.0
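Note that .duplicated() flags every repeat of a value, wherever it appears in the column. That suits our data because each continent's rows sit together. If a continent could reappear further down the table, a variation is to compare each row with the one directly above it using .shift(). The sketch below uses a small stand-in DataFrame (the name demo and its row order are hypothetical).

```python
import pandas as pd

# Stand-in frame where "Africa" reappears after another continent
demo = pd.DataFrame({
    "continent": ["Africa", "Africa", "Americas", "Africa"],
    "country": ["Algeria", "Angola", "Argentina", "Benin"],
})

# True only where a row repeats the continent of the row directly above it
repeats_previous = demo["continent"].eq(demo["continent"].shift())
demo.loc[repeats_previous, "continent"] = ""  # Blank out consecutive repeats only

print(demo["continent"].tolist())
```

Here the final “Africa” is kept, because the row above it belongs to a different continent.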

Let’s check out the effect when we apply it to our previous table:

figure, axes = plt.subplots(figsize = (8,5)) # Set up our figure and axis

axes.axis("off") # Removes the axis from the figure

# Fix the axis - the horizontal line goes wandering without this.
axes.axis([0, select_countries_3y_pop.shape[1], select_countries_3y_pop.shape[0],-1])   


table = axes.table(cellText=grouped_continents.values, # Values we want in the cells (the grouped data)
                   colLabels=["Continent", "Country", "1997", "2002", "2007"] , # Our column headers
                   loc="upper left", # Where we want our table located
                   edges="closed",) # Draws a grid around the cells - If we don't have this we can't colour them!


# Create Titles, Subtitles and Caption
title = "Table 1: Population of select countries over three years"
plt.figtext(x=0.04, y=1.1, s=title, ha="left", fontweight="bold", fontsize=16, fontname="sans-serif")

subtitle = "Population for 1997, 2002 and 2007"
plt.figtext(x=0.04, y=1.0, s=subtitle, ha="left", fontweight="light", fontsize=14, fontname="sans-serif")

caption = "Source: Gapminder.org"
plt.figtext(x=0.95, y=-0.55, s=caption, ha="right", fontweight="light", fontsize=12, fontname="sans-serif")

# Loop over each column and auto set the width
for each_column in range(len(select_countries_3y_pop.columns)):
    table.auto_set_column_width(each_column)
    
# Customise the cell contents
# Create our items to loop through
table_props = table.properties()
table_cells = table_props["children"]
# Loop through the cells
for cell in table_cells: 
    cell.get_text().set_fontsize(14) # Size in points
    cell.get_text().set_color("black") # Colour
    cell.get_text().set_fontname("Arial") # Font name
    cell._loc = "right" # ensure contents (incl headers) are aligned right
    cell.set_height(0.07) # Set height of cells
    cell.PAD = 0.04 # Adjust padding between data and border of cell.

# Create a custom header row with bold text and change the colour of the cells
for (row, col), cell in table.get_celld().items():
    if (row == 0) or (col == -1):
        cell.set_text_props(fontproperties=FontProperties(weight="bold", size = 14))
        # Note this uses from matplotlib.font_manager import FontProperties - that we imported at the top
        cell.set_facecolor("white")
        cell.set_linewidth(0)
    elif (row % 2 == 0):  # Even rows - row number divided by two has a remainder of zero
        cell.set_facecolor("white")
        cell.set_linewidth(0)
    else: # Effectively odd rows
        cell.set_facecolor("#F2F2F2")
        cell.set_linewidth(0)

# Line
plt.axhline(y = 1.08, xmin = 0.02, xmax = 0.96, color = "black")

figure.tight_layout(); # Controls the amount of white space around the table

# 9. Exporting Tables

As these tables are Matplotlib objects they can be saved in the same way as other figures. In the cell below is an example; this is the same code as used in Chapter 2 – Plotting Overview. Please feel free to review the code in that section.

figure.savefig("../outputs/my_table.png", dpi=300, bbox_inches="tight")
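Tables are mostly text, so a vector format such as PDF or SVG may stay sharper than a PNG when resized. The sketch below saves the same small table in three formats; savefig() infers the format from the file extension. The temporary folder is a stand-in for the course's outputs folder, and the one-row table is a cut-down example.

```python
import os
import tempfile

import matplotlib.pyplot as plt

figure, axes = plt.subplots(figsize=(4, 2))
axes.axis("off")  # Removes the axis from the figure
axes.table(cellText=[["Africa", "Algeria"]],
           colLabels=["Continent", "Country"],
           loc="center")

output_dir = tempfile.mkdtemp()  # Stand-in for the outputs folder
saved_paths = []
for extension in ["png", "pdf", "svg"]:
    path = os.path.join(output_dir, "my_table." + extension)
    # The file format is inferred from the extension
    figure.savefig(path, dpi=300, bbox_inches="tight")
    saved_paths.append(path)

plt.close(figure)  # Tidy up the example figure
```

The dpi setting mainly affects the PNG; the PDF and SVG outputs store the text and lines as vectors.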



End of Chapter

You have completed the main content of the Data Visualization Course. Please continue to Chapter 6 - Case Studies. Please ensure you complete the survey on the Learning Hub.
