Pivot tables in SQL

Pivot tables are a means to summarise rows of data and to turn them into colums. You may see them referred to as cross-tabulations and some really, really old people can regale you with stories of the mythical proc tabulate.

A small example using python

Suppose we have data that looks like

id direction magnitude
3 left 10
3 right 20
1 right 20
2 left 70

Working in python we can easily change this to

id left right
1   20
2 70  
3 10 20

by using the following code

from collections import OrderedDict #make it look nicer
import pandas as pd

# just making a dictionary containing our stuff to push into pandas
dty = OrderedDict({
  "id": [3, 3, 1, 2],
  "direction": ["left", "right", "right", "left"],
  "mag": [10, 20, 20, 70]

# make it into a dataframe
df = pd.DataFrame(dty)

# and finally, make our pivot table
pdf = df.pivot(index="id", columns="direction", values="mag")

which will output

id  direction  value
 3       left     10
 3      right     20
 1      right     20
 2       left     70

direction  left  right
1           NaN   20.0
2          70.0    NaN
3          10.0   20.0

So it should be easy in SQL right?

Well, no. unless you have MSSQL (it has a built in pivot function).

Let’s knock together a MySQL schema for this example

create database greenindex;
use greenindex;

create table trees (
  pkey int(11) not null auto_increment,
  id int,
  direction varchar(5),
  mag int,
  primary key (pkey)

insert into trees (id, direction, mag) values (3, "left", 10);
insert into trees (id, direction, mag) values (3, "right", 20);
insert into trees (id, direction, mag) values (1, "right", 20);
insert into trees (id, direction, mag) values (2, "left", 70);

then we are basically at the point where we can try to create our cross tabulation.

create table pivot as
select id,
sum(if(direction="left", mag, null)) as leftmag,
sum(if(direction="right", mag, null)) as rightmag
from trees group by id;

select * from pivot;

which gives us (shock horror):

mysql> select * from pivot;
| id   | leftmag | rightmag |
|    1 |    NULL |       20 |
|    2 |      70 |     NULL |
|    3 |      10 |       20 |
3 rows in set (0.00 sec)

So we basically have to write an if statement for all of our different columns and name them all?


Luckily in this example there are only 3 columns in the output so we can get by with it not being utterly terrible.

Now, the real issue comes if we want to run this in an on-demand fashion. And actually, I don’t have the solution right now. Any takers to show me the way?