Pivot tables are a means to summarise rows of data and to turn them into colums.
You may see them referred to as cross-tabulations and some really, really old
people can regale you with stories of the mythical proc tabulate
.
A small example using python
Suppose we have data that looks like
id | direction | magnitude |
---|---|---|
3 | left | 10 |
3 | right | 20 |
1 | right | 20 |
2 | left | 70 |
Working in python we can easily change this to
id | left | right |
---|---|---|
1 | 20 | |
2 | 70 | |
3 | 10 | 20 |
by using the following code
from collections import OrderedDict #make it look nicer
import pandas as pd
# just making a dictionary containing our stuff to push into pandas
dty = OrderedDict({
"id": [3, 3, 1, 2],
"direction": ["left", "right", "right", "left"],
"mag": [10, 20, 20, 70]
})
# make it into a dataframe
df = pd.DataFrame(dty)
print(df)
# and finally, make our pivot table
pdf = df.pivot(index="id", columns="direction", values="mag")
print(pdf)
which will output
id direction value
3 left 10
3 right 20
1 right 20
2 left 70
direction left right
id
1 NaN 20.0
2 70.0 NaN
3 10.0 20.0
So it should be easy in SQL right?
Well, no. unless you have MSSQL (it has a built in pivot
function).
Let’s knock together a MySQL schema for this example
create database greenindex;
use greenindex;
create table trees (
pkey int(11) not null auto_increment,
id int,
direction varchar(5),
mag int,
primary key (pkey)
);
insert into trees (id, direction, mag) values (3, "left", 10);
insert into trees (id, direction, mag) values (3, "right", 20);
insert into trees (id, direction, mag) values (1, "right", 20);
insert into trees (id, direction, mag) values (2, "left", 70);
then we are basically at the point where we can try to create our cross tabulation.
create table pivot as
select id,
sum(if(direction="left", mag, null)) as leftmag,
sum(if(direction="right", mag, null)) as rightmag
from trees group by id;
select * from pivot;
which gives us (shock horror):
mysql> select * from pivot;
+------+---------+----------+
| id | leftmag | rightmag |
+------+---------+----------+
| 1 | NULL | 20 |
| 2 | 70 | NULL |
| 3 | 10 | 20 |
+------+---------+----------+
3 rows in set (0.00 sec)
So we basically have to write an if statement for all of our different columns and name them all?
Yep.
Luckily in this example there are only 3 columns in the output so we can get by with it not being utterly terrible.
Now, the real issue comes if we want to run this in an on-demand fashion. And actually, I don’t have the solution right now. Any takers to show me the way?