import pandas as pd
Chapter 12 - Functions
1 Chapter Overview and Learning Objectives
Packages
Functions
- Basic structure
- Named parameters
- Default Values
- Scope
- Docstrings
- Indentation
Lambda Functions
- Anonymous Functions
- Syntax
- Functions within Functions
Functions for DataFrames
- Example
- Better Practice with Functions
2 Packages and Datasets
2.1 Packages
In this chapter we will use pandas, and give it the nickname “pd”. It was written using Pandas version 1.5.3, Python Version 3.8.3 and Anaconda 2020.07. Note that Python is generally versatile with backwards and forwards compatibility.
2.1.1 Exercise
Import the package pandas and give it the well known nickname “pd”.
2.2 Datasets
In this course we’ll be using the following datasets:
variable name | file name |
---|---|
animals | animals.csv |
titanic | titanic.xlsx |
2.2.1 Exercise
Load in the datasets mentioned above using the “pd.read_” family of functions.
animals = pd.read_csv("../data/animals.csv")
titanic = pd.read_excel("../data/titanic.xlsx")
3 Functions
Throughout the past 11 chapters, we have used an enormous number of built-in functions that come as a part of base Python, or its many libraries such as pandas (pd), numpy (np) and more. However, you may have wondered by now what it would be like to write functions from scratch (as all of these once were!) to perform bespoke tasks for your specific workflow.
It is important to note that we should always check the available functions first before proceeding to write our own, as we may otherwise end up wasting valuable project time. In many cases though, there is no alternative and we need specific functions to perform extremely specific tasks. This is very common in industry and is one of the most in-demand skills for Data Science in the modern day.
To break down the topic, functions within Python generally fall into three categories (we will focus on User Defined and Lambda Functions in this chapter):
Type | Description |
---|---|
Built in Functions | These are built into Python and always available for use. Examples – “pow()”, “print()”, “help()” |
User Defined Functions | Created by users to carry out specific tasks. Declared using the “def” keyword. |
Anonymous Functions – “lambda functions” | User defined - generally one line functions used within a larger piece of code. |
We will begin by discussing User Defined Functions (UDFs). The code inside of these is not run (executed) until we call or run the function. If your code has correct syntax but produces errors this will not be clear until the code is called and executed.
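A minimal sketch of this behaviour, assuming a made-up function called `broken_total` (not part of the course material): defining the function succeeds even though its body refers to a name that does not exist, and the error only appears once the function is called.

```python
# Defining the function runs without complaint, even though the body
# refers to a name (undefined_value) that does not exist anywhere.
def broken_total(value_1):
    return value_1 + undefined_value

# The problem only shows up when the function is actually called.
try:
    broken_total(1)
except NameError as error:
    print("Error raised at call time:", error)
```

Note that this applies to errors that occur while the code runs; a genuine syntax error would be reported as soon as the cell containing the definition is run.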
These functions allow us to organise chunks of reusable code. This can help optimise our code and create blocks of code dedicated to performing a specific procedure or task (such as a cleaning routine that is applied each time!). Functions can help us to write code that is consistent, readable, maintainable and reproducible.
3.1 Structure of a Function
The syntax is as follows:
def my_function_name(parameter_1, parameter_2, …):
    function_actions
    …
    return function_output
Let’s break down the syntax a little bit!
Our functions start with the keyword “def” (define) with syntax highlighting, which is followed by our function name. This:
- Must start with a letter or an underscore,
- Should be lowercase (by python convention),
- Should be short and descriptive,
- Can’t have the same name as a python keyword,
- Shouldn’t have the name of an existing function (it will overwrite it locally).
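The first and fourth rules in the list above can be checked with tools from base Python. This is only a sketch: `is_usable_name` is a hypothetical helper written for illustration, and it does not check the style conventions (lowercase, short, descriptive).

```python
import keyword

# Hypothetical helper: True when the name is made of valid characters
# and is not one of Python's reserved keywords.
def is_usable_name(name):
    return name.isidentifier() and not keyword.iskeyword(name)

# Made-up candidate names, chosen only for illustration
for candidate in ["add_two_values", "_helper", "2_values", "return"]:
    print(candidate, "->", is_usable_name(candidate))
```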
Following the function name is brackets. These can contain any necessary parameters that the function will take. It is finished with a colon, the same as our loops and control flow from Chapter 10.
The code below is indented (by 4 spaces, 1 tab or automatically) and is what is executed when the function is called.
The keyword “return” will return whatever value is after it. When the return statement is executed the function will stop at that line and return the given value. No code after the return statement will be executed.
Note that the keyword return is not necessary for the code to work, but any function without a return statement will return a value of “None”; it will produce no output for future code to work with. This is important because we often assign the output of functions to variables for use within more functions and so on (this is usually how a Data Pipeline is formed!).
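Both behaviours of “return” can be seen in a short sketch. The function names `describe_sign` and `print_only` are made up for this illustration.

```python
# A function whose first return statement can end it early
def describe_sign(number):
    if number < 0:
        return "negative"
    # This line is only reached when the return above did not run
    return "zero or positive"

# A function with no return statement at all
def print_only(number):
    print(number)

print(describe_sign(-3))

# print_only shows the value on screen, but hands nothing back,
# so the variable we assign its output to holds None
result = print_only(5)
print(result)
```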
3.1.1 Example
Here is a function that adds together two values and returns their sum. This is a good showcase of the structure of a function and how it uses the parameters you give it. Note that we must run the cell to declare the function in our environment; it can then be used repeatedly.
# Function to add two values
def add_two_values(value_1, value_2):
    total = value_1 + value_2
    return total
# I can now run my function
add_two_values(1, 2)
3
Writing functions really allows you to appreciate and understand the processes we use in Data Science, as you are essentially “generalising” the process into a repeatable routine that will save you a lot of time coding.
3.1.2 Exercise
- Remember the exercise in Chapter 10 where you converted from Celsius to Fahrenheit? Let’s create a function that does this for us, which we can reuse! Using the equation below, create a function that performs this task.
\(C= \frac{5}{9}(F - 32)\)
Expected outputs (these are rounded, you don’t need to worry about that for now):
fahrenheit_to_degrees_celsius(32) returns: 0
fahrenheit_to_degrees_celsius(11) returns: -11.7
fahrenheit_to_degrees_celsius(81.3) returns: 27.4
- Update your function so it rounds to 1 decimal place.
# (a)
def fahrenheit_to_degrees_celsius(degrees_f):
    degrees_c = (5 / 9) * (degrees_f - 32)
    return degrees_c
fahrenheit_to_degrees_celsius(degrees_f=81.3)
27.38888888888889
# (b) Way 1
def rnd_fahrenheit_to_degrees_celsius(degrees_f):
    degrees_c = (5 / 9) * (degrees_f - 32)
    return round(degrees_c, 1)
rnd_fahrenheit_to_degrees_celsius(81.3)
# Note the round() function has to be in the return statement as the rounded value is what we wish to be output!
# Way 2 - Assigning the rounded value to a variable
# def rnd1_fahrenheit_to_degrees_celsius(degrees_f):
#     degrees_c = (5/9) * (degrees_f - 32)
#     rounded_c = round(degrees_c, 1)
#     return rounded_c
# rnd1_fahrenheit_to_degrees_celsius(81.3)
27.4
3.2 Named parameters
We called the parameters “value_1” and “value_2” in the previous examples, where we could expect to pass two values to be added together, as the function states. You may see similar functions written like this:
def add_two_values(x, y):
    total = x + y
    return total
However, “x” and “y” are not very descriptive parameter names, especially if we were writing a more complicated function. As we have seen through the course, we can also use the parameter names when calling the function and pass the arguments using an = sign.
3.2.1 Examples
add_two_values(value_1=3, value_2=10)
13
As we’ve talked about previously, using named parameters means we can pass them in any order we please.
add_two_values(value_2=10, value_1=3)
13
3.3 Default Values
We can also set default values when we declare a function, so that when we run the function without passing any arguments it uses these preset values as its inputs. This is incredibly useful for more complex functions that go beyond mathematical operations (of course, if you wanted to always add 1 to something, maybe having a default value of 1 would be useful!).
3.3.1 Examples
Here I’ve set “value_1=10” and “value_2=5”, so unless we change them by assigning them ourselves, they will be these by default.
def add_two_values(value_1=10, value_2=5):
    total = value_1 + value_2
    return total
add_two_values()
15
Now if I just pass “value_2” it will use the default for “value_1” and add it to my specified “value_2”.
add_two_values(value_2=15)
25
3.3.2 Exercise
Create a function that has three arguments where we:
- Compute the square of the first number and add that to the second number
- Divide the result by the third number
Ensure to use named parameters and set the defaults as 1 for each parameter. Make sure rounding to 2dp is involved to deal with recurring decimals.
Expected output:
- function(2, 3, 4) = (4 + 3)/4 = 7/4 = 1.75
- function(2, 5, 3) = (4 + 5)/3 = 3
- function() = (1 + 1)/1 = 2
- function(val_2 = 5, val_3 = 2) = (1 + 5)/2 = 3
def square_sum_divide(val_1 = 1, val_2 = 1, val_3 = 1):
    expression = ((val_1 ** 2) + val_2) / val_3
    return round(expression, ndigits = 2)
# Use 1 - default
square_sum_divide()
2.0
# Use 2 - Two named parameters
square_sum_divide(val_2 = 5, val_3 = 2)
3.0
# Use 3 - Fully (Don't need to specify names but good practice dictates you should)
square_sum_divide(2, 3, 4)
1.75
3.3.3 Aside - Datatypes
One last aside that I wish to make is that the datatypes of what we pass into the function don’t matter, provided that the routine inside of the function is applicable for that datatype. For example, if we use lists as an input, we must be utilising functions, methods, attributes etc applicable to lists inside it.
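As a rough illustration of this point, the sketch below reuses the “add_two_values” function from earlier with different datatypes. The `+` operator works for two numbers, two strings or two lists, but not for a string and a number.

```python
def add_two_values(value_1, value_2):
    total = value_1 + value_2
    return total

print(add_two_values(1, 2))              # numbers are added
print(add_two_values("Hello", "World"))  # strings are concatenated
print(add_two_values([1, 2], [3]))       # lists are joined together

# The + operator is not defined between a string and an integer,
# so the routine inside the function fails for this combination
try:
    add_two_values("Hello", 5)
except TypeError as error:
    print("Incompatible datatypes:", error)
```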
3.3.4 Example 1 - For Loop
Let’s see an example where we pass a list into a function as its input and use a for loop to print the elements!
# Let's use a list inside the function
def print_list(ls):
    for element in ls:
        print(element)
# Notice the multiple indents when we use control flow inside the function! We also don't need the return keyword as we are just printing here!
print_list([1, 2, 3, 4])
1
2
3
4
I wonder if we can use list comprehensions? Of course we can!
3.3.5 Example 2 - List Comprehension
Let’s use a list comprehension to double each value in a list to create a new one, output by the function!
# Use list comprehension within a function
def create_double_my_list(ls):
    # A separate name for each element avoids reusing "ls" inside the comprehension
    double_list = [(element * 2) for element in ls]
    return double_list
# Try it out
create_double_my_list([1, 3, 5, 7])
[2, 6, 10, 14]
3.4 Scope
It’s important to notice here that the variables I create within my functions, like “total” are not available outside of my function. This has to do with scope, which is one of the most important concepts to understand when creating your own functions. There are two types, namely global scope and local scope.
Variables with global scope are visible and can be accessed anywhere (they are in the general environment), however ones with local scope are only visible and can only be used within the local area. When we create a new function we effectively create a new local scope, so the variables we declare within our functions are only available within that local scope.
Trying to access the value stored under “total” outside of the function will raise a “NameError”. Effectively “total” is created when the function is run, and then cleaned up and removed by Python after completion. This makes “return”ing variables very important, as we are effectively returning them from the local scope to the global one.
This also relates to why we often assign the outputs of functions to variables, storing them in the global environment under the name/label we choose.
#total
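A minimal sketch of local versus global scope, reusing “add_two_values” from earlier. The variable name `result` is chosen here purely for illustration.

```python
def add_two_values(value_1, value_2):
    total = value_1 + value_2   # "total" only exists in the function's local scope
    return total

# The returned value is stored in the global scope under the name "result"
result = add_two_values(1, 2)
print(result)

# "total" was never created in the global scope, so asking for it fails
try:
    print(total)
except NameError as error:
    print("NameError:", error)
```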
3.5 Docstrings
We heavily promote looking at docstrings (which means document strings), the inbuilt “help” documents in our courses. They’re very useful for finding out what a function or method does, what our parameters are called, and what we should expect to be passed as arguments.
Including our own docstrings in a function makes our code as accessible and readable as it can be, and as we often write multi-use functions whose use encompasses more than just us, this is an important practice.
3.5.1 Example
We start a docstring by using three sets of speech marks to surround the information we want to include.
For this function it is total overkill, but an excellent way to see this in practice!
def add_two_values(value_1, value_2):
    """
    This function will add together two values
    Parameters
    ----------
    value_1 : The first value to add
    value_2 : The second value to add
    Returns
    -------
    total: The sum / concatenation of the two values specified.
    Examples
    --------
    add_two_values(1, 2)
    returns 3
    add_two_values(4.7, 3.2)
    returns 7.9
    add_two_values("Hello", "World")
    returns "HelloWorld" (The + symbol concatenates strings)
    Errors will occur if adding together strings and numerics, unless you use str() around the numeric.
    """
    total = value_1 + value_2
    return total
We can now access our docstring in the same ways we do for inbuilt functions:
help(add_two_values)
Help on function add_two_values in module __main__:
add_two_values(value_1, value_2)
This function will add together two values
Parameters
----------
value_1 : The first value to add
value_2 : The second value to add
Returns
-------
total: The sum / concatenation of the two values specified.
Examples
--------
add_two_values(1, 2)
returns 3
add_two_values(4.7, 3.2)
returns 7.9
add_two_values("Hello", "World")
returns "HelloWorld" (The + symbol concatenates strings)
Errors will occur if adding together strings and numerics, unless you use str() around the numeric.
This may seem laborious, but you will be all the better coder for it, as it allows you to check that you know what your function is doing, what its parameters are, their datatypes and so on! Remember that we would never be able to understand the ins and outs of pandas functions like “groupby()” and its many parameters without these docstrings.
It is worth noting that many organisations have their own guidelines on how to write docstrings and readable, reproducible code in general. This section should be seen as a showcase of the technique, rather than the best way to structure one.
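For the curious, the docstring is stored on the function itself, in an attribute called `__doc__`. The short sketch below uses a deliberately brief one-line docstring (an assumption made to keep the example small) rather than the fuller layout shown earlier.

```python
def add_two_values(value_1, value_2):
    """Add together two values and return the total."""
    total = value_1 + value_2
    return total

# The docstring travels with the function object
print(add_two_values.__doc__)
```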
3.5.2 Exercise
In the previous exercise you created the “square_sum_divide” function. Recreate it here but with a docstring including the following:
- Definition
- Parameters (and their default values)
- What it returns
- One example of the default
- One example with no named parameters
- One example with two of the three parameters (with names)
def square_sum_divide(val_1 = 1, val_2 = 1, val_3 = 1):
    """
    This function will square the first value (raise to the power of 2) and add it to the second value. Then this resultant sum will be divided by the third value.
    Parameters
    ----------
    val_1: A numeric value with default value 1.
    val_2: A numeric value with default value 1.
    val_3: A numeric value with default value 1.
    Returns
    -------
    expression: The formula ((val_1 ** 2) + val_2)/val_3 is computed and rounded to 2 decimal places.
    Examples
    --------
    square_sum_divide()
    returns 2.0
    square_sum_divide(2, 3, 4)
    returns 1.75
    square_sum_divide(val_2 = 5, val_3 = 2)
    returns 3.0
    Note that we cannot use strings in this function as they are incompatible with exponentiation and division.
    """
    expression = ((val_1 ** 2) + val_2) / val_3
    return round(expression, ndigits = 2)
3.6 Helpful Indentation hint!
The indentation of code in functions and loops is incredibly important! If we’re copying code from other places (like Stack Overflow for example!) it can often be not indented correctly to run in our new function.
If you highlight the code you wish to “move” and press:
- tab to increase the indent one “level”
- shift + tab will decrease the indent one level.
These can really help you solve the problem quickly!
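To see what Python reports when the indentation is missing, without breaking a whole cell, one option is to hold the code in a string and hand it to the built-in `compile()` function. This is only a sketch for inspecting the error; the function name `say_hello` inside the string is made up.

```python
# Source code held as a string so the error can be caught and inspected.
# The body of the function is deliberately not indented.
badly_indented = 'def say_hello():\nprint("hello")\n'

try:
    compile(badly_indented, "<example>", "exec")
except IndentationError as error:
    print("IndentationError:", error.msg)
```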
3.6.1 Example
This cell will give an error, so you will need to uncomment the code to run it and observe this.
# This cell will give an error - our indentation is off!
#def whos_the_best(print_x_times):
#for each_num in range(print_x_times):
#print("Jake is a queen!")
# Use the function
#whos_the_best(10)
This will give us an error! However, if we highlight everything from the “for” line down and hit tab we get a working function which prints out a very important statement!
def whos_the_best(print_x_times):
    for each_num in range(print_x_times):
        print("Jake is a queen!")
# Use the function
whos_the_best(10)
Jake is a queen!
Jake is a queen!
Jake is a queen!
Jake is a queen!
Jake is a queen!
Jake is a queen!
Jake is a queen!
Jake is a queen!
Jake is a queen!
Jake is a queen!
3.7 Larger Exercise
Let’s return to the FizzBuzz exercise from Chapter 10 and generalise it, as it previously worked over a hard coded number of values (1-30). Here I’d like you to:
- Create a function so the user can find fizzbuzz values for any range of numbers.
  - If you’ve used a for loop an example could be 1 to 100,
  - If you’ve used a while loop your top number could be 100.
- Make sure you include a docstring (Don’t worry about a full example, just give individual numbers).
Remember to go back to Chapter 10 and check the End of Chapter Exercise for a reminder!
def fizz_buzz(my_iterable):
    """
    fizz_buzz takes an iterable and applies some control flow:
    if numbers are divisible by 3 and 5 print "FizzBuzz"
    if numbers are divisible by 3 print "Fizz"
    if numbers are divisible by 5 print "Buzz"
    else print out the number
    Parameters
    ----------
    my_iterable: An iterable of integers, such as a list, tuple etc.
    Returns
    -------
    Nothing is returned; an output is printed for each value in the iterable in accordance with the control flow established in the definition.
    Examples
    --------
    fizz_buzz([15, 6, 10, 2])
    prints
    FizzBuzz
    Fizz
    Buzz
    2
    """
    for each_number in my_iterable:
        if (each_number % 3 == 0) and (each_number % 5 == 0):
            print("FizzBuzz")
        elif (each_number % 3) == 0:
            print("Fizz")
        elif (each_number % 5) == 0:
            print("Buzz")
        else:
            print(each_number)
# Run the function - I'm passing an iterable using the range() function (which is exclusive!)
fizz_buzz(range(1, 31))
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz
4 Lambda Functions
Sometimes we have need of functions that are very small and perform one express purpose to a given argument. As such it is laborious and inefficient to create a large, well written function with a docstring for something as small as:
- Printing out a value
- Squaring a value
These operations can also take place inside of another function, as we will show here. It may not seem apparent yet why these are useful, but I will show some examples that illustrate the use of this method soon. For comparison purposes, their utility is similar to the comprehensions from Chapter 11: they allow you to encompass a large process in one line of code, but quickly become unmanageable for larger tasks (where a UDF would be preferred).
4.1 Structure of a Lambda Function
Syntax wise, the lambda function is built as follows:
lambda arguments : expression
We use the “lambda” keyword (with syntax highlighting) to begin the definition of the anonymous function.
We follow this keyword with the argument(s), which can be as simple as x or y here. A colon follows this “:”.
Lastly is the expression itself, where we specify what we will do to the argument(s).
This is often assigned to a throwaway variable like “x” (especially when they are used inside of another function).
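To connect this syntax to what we already know, here is a small side-by-side sketch. The names `square` and `square_udf` are made up for the comparison. A lambda holds a single expression, and the value of that expression is what gets returned, so no “return” keyword is written.

```python
# A lambda function assigned to a name...
square = lambda a : a ** 2

# ...behaves like this short user defined function
def square_udf(a):
    return a ** 2

print(square(4))
print(square_udf(4))
```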
4.1.1 Examples
Here let’s create a lambda function that prints the value given to it.
# Smallest lambda function - print the value
x = lambda a : print(a)
# Examples
x(4)
x("Jake")
x(9.7)
4
Jake
9.7
Saying this in words, just like with the comprehensions from the previous chapter, demystifies the syntax: “We are creating an object x, which is an anonymous function acting on a, where it prints a.”
What about using mathematical operations?
# Adding values in a lambda function
y = lambda a : a + 12
# Examples
y(10)
y(109)
121
We can also define multiple arguments and combine them together!
# Multiple values in a lambda function
z = lambda a, b : (a + b)/2
# Examples
z(10, 12)
z(20, 22)
21.0
4.2 Exercise
Remember our lovely problem from before? Squaring the first number, adding it to the second and then dividing the sum by the third? Let’s create a lambda function that does just that! Make sure it rounds to 2 decimal places as well.
Use the examples (1, 2, 3) and (23, 24, 25) to check it.
Don’t worry, I haven’t run out of ideas, I just see the value in consistency throughout these courses.
# Lambda function computing an expression
t = lambda a, b, c : round(((a ** 2) + b)/c, ndigits = 2)
# Examples
t(1, 2, 3)
1.0
t(23, 24, 25)
22.12
4.3 Why we use Lambda Functions
I promised to show you why these are so useful, and I intend to make good on that in this section. Much of this section is adapted from the excellent W3Schools tutorial on Lambda Functions. We often see lambda functions used as anonymous functions inside another UDF, as they can alter the behaviour of UDFs (oh my, this is horrible to write, thankfully we have the UDF shorthand!) based on what input we give. This then gives us a template to create functions that perform a certain task, which we can then tweak ourselves!
In essence, we can set up a UDF using “def” whose input is passed to a lambda function in the return statement, so the input alters the output that the lambda function gives. A great example of this is a function that multiplies a number by some arbitrary n (the input to the overall UDF). Giving 2 as the value of n creates a function that doubles all inputs, giving 3 as the value of n creates a function that triples all inputs, and so on.
This sounds very confusing, but becomes more clear with an example.
4.3.1 Example
Here we will construct a UDF that carries within it a lambda function that multiplies its value supplied by an arbitrary n, which is the input to our UDF.
Therefore, whenever we change the value of n and assign it to something, we are creating a new function that we can then use! This is very powerful and popular amongst those building pipelines.
# Create a UDF that contains a lambda function to allow control over its function
def my_func(n):
    return lambda a : a * n
# Notice that the lambda is part of the return, which is key here!
# Example 1
my_doubler = my_func(2) # This is now a function that doubles the input!
# See that it is a function!
print(type(my_doubler))
print(my_doubler(11))
# Example 2
my_tripler = my_func(3) # This is now a function that triples the input!
print(type(my_tripler))
print(my_tripler(22))
<class 'function'>
22
<class 'function'>
66
This is incredibly powerful as you can see, essentially allowing us to create templates with which to create functions from that perform particular tasks dependent on the input to the UDF.
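The same template idea can also be written without a lambda, by defining a named function inside the UDF and returning it. This is a sketch of an alternative spelling rather than the approach used above; the names `make_multiplier` and `multiply` are made up for illustration.

```python
def make_multiplier(n):
    # The inner function remembers the value of n it was created with
    def multiply(a):
        return a * n
    return multiply

my_doubler = make_multiplier(2)
my_tripler = make_multiplier(3)

print(my_doubler(11))
print(my_tripler(22))
```

A named inner function can be a little easier to read and leaves room for a docstring, while the lambda version is more compact.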
5 Resources for Further Learning
This chapter is an introduction to UDFs and Lambda Functions and how they work, however there’s a whole lot more to be explored; including returning multiple values, optional arguments (a big one that is worth exploring!) and more.
Some additional information about functions can be found:
W3Schools Functions page is fantastic. In fact their entire repertoire of tutorials comes highly recommended for the way it breaks down concepts with examples. They even provide consoles for you to practice that open in separate webpages!
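As a small taste of one of the topics mentioned above, returning multiple values, here is a minimal sketch. The function name `min_and_max` and its inputs are made up. Separating values with a comma in the return statement packs them into a tuple, which can be unpacked into separate variables when the function is called.

```python
def min_and_max(values):
    # The two values are packed into a single tuple
    return min(values), max(values)

# The tuple can be unpacked into two variables
lowest, highest = min_and_max([3, 1, 7])
print(lowest, highest)
```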
6 Functions for DataFrames
So far we’ve seen how functions work and how to create them, but how can we apply them to our DataFrames?
Say this is my cleaning routine for multiple dataframes:
- Remove spaces from my column names and change to underscores
- Change column names to lower case
- Select just the columns of the “object” data type.
If I have a few of them to run this on the process can mean a lot of lines of code used which becomes less readable as more is added. We can use functions to wrap up these processes into a cleaning routine function.
6.1 Example
Let’s make a copy of our animals data frame and see what the routine looks like applied to it. This is great revision of the content in Chapter 4 - Working with DataFrames, which you should revisit if you find the concepts harder to pick back up.
# Make a copy
animals_new = animals.copy()
Now the cleaning routine:
# replace column spaces with "_"
animals_new.columns = animals_new.columns.str.replace(" ", "_")
# lower case the columns names
animals_new.columns = animals_new.columns.str.lower()
animals_new_objects = animals_new.select_dtypes(["O"])
animals_new_objects
| | incidentnumber | datetimeofcall | finyear | typeofincident | finaldescription | animalgroupparent | propertytype | specialservicetypecategory | specialservicetype | borough | stngroundname | animalclass | code | london |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 139091 | 01/01/2009 03:01 | 2008/09 | Special Service | DOG WITH JAW TRAPPED IN MAGAZINE RACK,B15 | Dog | House - single occupancy | Other animal assistance | Animal assistance involving livestock - Other ... | Croydon | Norbury | Mammal | 00AH | Outer London |
1 | 275091 | 01/01/2009 08:51 | 2008/09 | Special Service | ASSIST RSPCA WITH FOX TRAPPED,B15 | Fox | Railings | Other animal assistance | Animal assistance involving livestock - Other ... | Croydon | Woodside | Mammal | 00AH | Outer London |
2 | 2075091 | 04/01/2009 10:07 | 2008/09 | Special Service | DOG CAUGHT IN DRAIN,B15 | Dog | Pipe or drain | Animal rescue from below ground | Animal rescue from below ground - Domestic pet | Sutton | Wallington | Mammal | 00BF | Outer London |
3 | 2872091 | 05/01/2009 12:27 | 2008/09 | Special Service | HORSE TRAPPED IN LAKE,J17 | Horse | Intensive Farming Sheds (chickens, pigs etc) | Animal rescue from water | Animal rescue from water - Farm animal | Hillingdon | Ruislip | Mammal | 00AS | Outer London |
4 | 3553091 | 06/01/2009 15:23 | 2008/09 | Special Service | RABBIT TRAPPED UNDER SOFA,B15 | Rabbit | House - single occupancy | Other animal assistance | Animal assistance involving livestock - Other ... | Havering | Harold Hill | Mammal | 00AR | Outer London |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5747 | 138718-29092018 | 29/09/2018 14:57 | 2018/19 | Special Service | ASSIST RSPCA WITH CAT STUCK ON WINDOW LEDGE 3 ... | Cat | Purpose Built Flats/Maisonettes - 4 to 9 storeys | Animal rescue from height | Animal rescue from height - Domestic pet | Hammersmith And Fulham | Chiswick | Mammal | 00AN | NaN |
5748 | 138738-29092018 | 29/09/2018 15:13 | 2018/19 | Special Service | CAT STUCK ON 2ND FLOOR OF PROPERTY ASSIST... | Cat | Animal harm outdoors | Animal rescue from height | Animal rescue from height - Domestic pet | Waltham Forest | Walthamstow | Mammal | 00BH | NaN |
5749 | 138800-29092018 | 29/09/2018 16:49 | 2018/19 | Special Service | CAT TRAPPED IN BUSHES ON LAKE | Cat | Lake/pond/reservoir | Animal rescue from water | Animal rescue from water - Domestic pet | Bromley | Bromley | Mammal | 00AF | NaN |
5750 | 138957-29092018 | 29/09/2018 21:10 | 2018/19 | Special Service | DOG WITH PAWS TRAPPED IN METAL TABLE LEG | Dog | House - single occupancy | Other animal assistance | Assist trapped domestic animal | Ealing | Southall | Mammal | 00AJ | NaN |
5751 | 139509-30092018 | 30/09/2018 21:39 | 2018/19 | Special Service | CAT STUCK ON ROOF OF HOUSE | Cat | Bungalow - single occupancy | Animal rescue from height | Animal rescue from height - Domestic pet | Barnet | Southgate | Mammal | 00AC | NaN |
5752 rows × 14 columns
Without a function, I’d have to go in and change the object names for each DataFrame I wanted to do this on. Let’s make a function to generalise this process, allowing us to utilise it on any DataFrame of our choosing.
def clean_frame_give_objects(dataframe):
    """
    Function that cleans the column names of the input frame, selecting only those columns with object data at the same time.
    Parameters
    ----------
    dataframe: A pandas dataframe
    Returns
    -------
    dataframe_objects: The cleaned columns from dataframe with only object datatypes selected.
    """
    dataframe.columns = dataframe.columns.str.replace(" ", "_")
    dataframe.columns = dataframe.columns.str.lower()
    dataframe_objects = dataframe.select_dtypes(include = ["O"])
    return dataframe_objects
Our function does the same as our script did.
clean_frame_give_objects(animals)
| | incidentnumber | datetimeofcall | finyear | typeofincident | finaldescription | animalgroupparent | propertytype | specialservicetypecategory | specialservicetype | borough | stngroundname | animalclass | code | london |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 139091 | 01/01/2009 03:01 | 2008/09 | Special Service | DOG WITH JAW TRAPPED IN MAGAZINE RACK,B15 | Dog | House - single occupancy | Other animal assistance | Animal assistance involving livestock - Other ... | Croydon | Norbury | Mammal | 00AH | Outer London |
1 | 275091 | 01/01/2009 08:51 | 2008/09 | Special Service | ASSIST RSPCA WITH FOX TRAPPED,B15 | Fox | Railings | Other animal assistance | Animal assistance involving livestock - Other ... | Croydon | Woodside | Mammal | 00AH | Outer London |
2 | 2075091 | 04/01/2009 10:07 | 2008/09 | Special Service | DOG CAUGHT IN DRAIN,B15 | Dog | Pipe or drain | Animal rescue from below ground | Animal rescue from below ground - Domestic pet | Sutton | Wallington | Mammal | 00BF | Outer London |
3 | 2872091 | 05/01/2009 12:27 | 2008/09 | Special Service | HORSE TRAPPED IN LAKE,J17 | Horse | Intensive Farming Sheds (chickens, pigs etc) | Animal rescue from water | Animal rescue from water - Farm animal | Hillingdon | Ruislip | Mammal | 00AS | Outer London |
4 | 3553091 | 06/01/2009 15:23 | 2008/09 | Special Service | RABBIT TRAPPED UNDER SOFA,B15 | Rabbit | House - single occupancy | Other animal assistance | Animal assistance involving livestock - Other ... | Havering | Harold Hill | Mammal | 00AR | Outer London |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5747 | 138718-29092018 | 29/09/2018 14:57 | 2018/19 | Special Service | ASSIST RSPCA WITH CAT STUCK ON WINDOW LEDGE 3 ... | Cat | Purpose Built Flats/Maisonettes - 4 to 9 storeys | Animal rescue from height | Animal rescue from height - Domestic pet | Hammersmith And Fulham | Chiswick | Mammal | 00AN | NaN |
5748 | 138738-29092018 | 29/09/2018 15:13 | 2018/19 | Special Service | CAT STUCK ON 2ND FLOOR OF PROPERTY ASSIST... | Cat | Animal harm outdoors | Animal rescue from height | Animal rescue from height - Domestic pet | Waltham Forest | Walthamstow | Mammal | 00BH | NaN |
5749 | 138800-29092018 | 29/09/2018 16:49 | 2018/19 | Special Service | CAT TRAPPED IN BUSHES ON LAKE | Cat | Lake/pond/reservoir | Animal rescue from water | Animal rescue from water - Domestic pet | Bromley | Bromley | Mammal | 00AF | NaN |
5750 | 138957-29092018 | 29/09/2018 21:10 | 2018/19 | Special Service | DOG WITH PAWS TRAPPED IN METAL TABLE LEG | Dog | House - single occupancy | Other animal assistance | Assist trapped domestic animal | Ealing | Southall | Mammal | 00AJ | NaN |
5751 | 139509-30092018 | 30/09/2018 21:39 | 2018/19 | Special Service | CAT STUCK ON ROOF OF HOUSE | Cat | Bungalow - single occupancy | Animal rescue from height | Animal rescue from height - Domestic pet | Barnet | Southgate | Mammal | 00AC | NaN |
5752 rows × 14 columns
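One thing to be aware of is that assigning to “dataframe.columns” inside the function also renames the columns of the DataFrame that was passed in. If that side effect is unwanted, one possible variation is to work on a copy. The sketch below assumes a hypothetical function name, `clean_frame_give_objects_copy`, and a small made-up DataFrame in place of the animals data.

```python
import pandas as pd

def clean_frame_give_objects_copy(dataframe):
    # Work on a copy so the caller's DataFrame keeps its original column names
    cleaned = dataframe.copy()
    cleaned.columns = cleaned.columns.str.replace(" ", "_", regex=False).str.lower()
    return cleaned.select_dtypes(include=["object"])

# Small made-up frame, used here instead of animals.csv
toy = pd.DataFrame({
    "Animal Name": pd.Series(["Dog", "Cat"], dtype="object"),
    "Call Count": [3, 5],
})
toy_objects = clean_frame_give_objects_copy(toy)

print(list(toy_objects.columns))  # cleaned names, object columns only
print(list(toy.columns))          # the original frame is left untouched
```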
We could also expand this with a loop to perform the function on multiple DataFrames. This would allow us to clean multiple DataFrames at once and store them in a “list” object, for example.
6.1.1 Example - Looping through a list of DataFrames
# We can then loop through the list of frames and process them one after another
# Collect our list of frames
list_of_frames = [animals, titanic]
# Create a new blank list to store our frames in
new_frames = []
# Use a for loop to apply our function
for each_frame in list_of_frames:
    clean_frame = clean_frame_give_objects(each_frame)
    new_frames.append(clean_frame)
We can then sub-select each DataFrame from our list like so:
# selects the dataframe at position 0 in our list
new_frames[0].head()
| | incidentnumber | datetimeofcall | finyear | typeofincident | finaldescription | animalgroupparent | propertytype | specialservicetypecategory | specialservicetype | borough | stngroundname | animalclass | code | london |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 139091 | 01/01/2009 03:01 | 2008/09 | Special Service | DOG WITH JAW TRAPPED IN MAGAZINE RACK,B15 | Dog | House - single occupancy | Other animal assistance | Animal assistance involving livestock - Other ... | Croydon | Norbury | Mammal | 00AH | Outer London |
1 | 275091 | 01/01/2009 08:51 | 2008/09 | Special Service | ASSIST RSPCA WITH FOX TRAPPED,B15 | Fox | Railings | Other animal assistance | Animal assistance involving livestock - Other ... | Croydon | Woodside | Mammal | 00AH | Outer London |
2 | 2075091 | 04/01/2009 10:07 | 2008/09 | Special Service | DOG CAUGHT IN DRAIN,B15 | Dog | Pipe or drain | Animal rescue from below ground | Animal rescue from below ground - Domestic pet | Sutton | Wallington | Mammal | 00BF | Outer London |
3 | 2872091 | 05/01/2009 12:27 | 2008/09 | Special Service | HORSE TRAPPED IN LAKE,J17 | Horse | Intensive Farming Sheds (chickens, pigs etc) | Animal rescue from water | Animal rescue from water - Farm animal | Hillingdon | Ruislip | Mammal | 00AS | Outer London |
4 | 3553091 | 06/01/2009 15:23 | 2008/09 | Special Service | RABBIT TRAPPED UNDER SOFA,B15 | Rabbit | House - single occupancy | Other animal assistance | Animal assistance involving livestock - Other ... | Havering | Harold Hill | Mammal | 00AR | Outer London |
# Selects the DataFrame at position 1 in our list
new_frames[1].head()
| | name | sex | ticket | cabin | embarked | home.dest | boat |
---|---|---|---|---|---|---|---|
0 | Allen, Miss. Elisabeth Walton | female | 24160 | B5 | S | St Louis, MO | 2 |
1 | Allison, Master. Hudson Trevor | male | 113781 | C22 C26 | S | Montreal, PQ / Chesterville, ON | 11 |
2 | Allison, Miss. Helen Loraine | female | 113781 | C22 C26 | S | Montreal, PQ / Chesterville, ON | NaN |
3 | Allison, Mr. Hudson Joshua Creighton | male | 113781 | C22 C26 | S | Montreal, PQ / Chesterville, ON | NaN |
4 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 113781 | C22 C26 | S | Montreal, PQ / Chesterville, ON | NaN |
6.2 Better Practice with regards to Functions
There are a few points to make about the concepts discussed here. The first is that looping through a list of DataFrames may not always be the most efficient way to accomplish something when the DataFrames themselves are particularly large and there is a large number of them. An explicit loop that builds up a new list is also more verbose than it needs to be.
Thankfully, there is an alternative in the form of the “map()” function (those of you who use R will be reminded of the apply family of functions). The “map()” function applies a function of our choice to every element of an iterable (a list or tuple, for example) without us writing an explicit loop. It returns what is called a “map” object, which is evaluated lazily: each result is only computed when it is requested. Any speed gain over a for loop is usually modest, because most of the time is spent inside the function being applied, but the code is shorter and the intent is clearer.
A “map” object may sound like it could be a problem, but thankfully we have the “list()” function to convert it back to a list, so we must remember to convert the datatype as a final step. I personally think this is well worth the extra step, as the result is more concise and the pattern carries over to many other tasks. It is often said that, for this kind of application, an explicit loop should be a last resort, used only when a function such as “map()” is not applicable.
The syntax for map is as follows:
map(function, iterable)
Where the function is a UDF (User Defined Function) or a built-in function and the iterable is the list, tuple, etc. of our choice!
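As a minimal sketch of this syntax, the example below applies a small UDF to a list of numbers. The function “add_ten” and the list “numbers” are made up purely for illustration and are not used elsewhere in this chapter.

```python
# Hypothetical UDF used only to illustrate the map syntax
def add_ten(number):
    """Return the input number plus ten."""
    return number + 10

numbers = [1, 2, 3]

# map() returns a lazy "map" object, not a list
mapped = map(add_ten, numbers)

# Convert the map object into a list as the final step
result = list(mapped)
print(result)  # [11, 12, 13]
```

Note that a “map” object can only be consumed once, so it is usually simplest to wrap the call in “list()” straight away.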
6.2.1 Example
# Use the map function to circumvent the need for a loop!
frame_list = [animals.copy(), titanic.copy()]
# Use the map function
new_frames = list(map(clean_frame_give_objects, frame_list))
Let’s check!
new_frames[0].head()
| | incidentnumber | datetimeofcall | finyear | typeofincident | finaldescription | animalgroupparent | propertytype | specialservicetypecategory | specialservicetype | borough | stngroundname | animalclass | code | london |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 139091 | 01/01/2009 03:01 | 2008/09 | Special Service | DOG WITH JAW TRAPPED IN MAGAZINE RACK,B15 | Dog | House - single occupancy | Other animal assistance | Animal assistance involving livestock - Other ... | Croydon | Norbury | Mammal | 00AH | Outer London |
1 | 275091 | 01/01/2009 08:51 | 2008/09 | Special Service | ASSIST RSPCA WITH FOX TRAPPED,B15 | Fox | Railings | Other animal assistance | Animal assistance involving livestock - Other ... | Croydon | Woodside | Mammal | 00AH | Outer London |
2 | 2075091 | 04/01/2009 10:07 | 2008/09 | Special Service | DOG CAUGHT IN DRAIN,B15 | Dog | Pipe or drain | Animal rescue from below ground | Animal rescue from below ground - Domestic pet | Sutton | Wallington | Mammal | 00BF | Outer London |
3 | 2872091 | 05/01/2009 12:27 | 2008/09 | Special Service | HORSE TRAPPED IN LAKE,J17 | Horse | Intensive Farming Sheds (chickens, pigs etc) | Animal rescue from water | Animal rescue from water - Farm animal | Hillingdon | Ruislip | Mammal | 00AS | Outer London |
4 | 3553091 | 06/01/2009 15:23 | 2008/09 | Special Service | RABBIT TRAPPED UNDER SOFA,B15 | Rabbit | House - single occupancy | Other animal assistance | Animal assistance involving livestock - Other ... | Havering | Harold Hill | Mammal | 00AR | Outer London |
new_frames[1].head()
| | name | sex | ticket | cabin | embarked | home.dest | boat |
---|---|---|---|---|---|---|---|
0 | Allen, Miss. Elisabeth Walton | female | 24160 | B5 | S | St Louis, MO | 2 |
1 | Allison, Master. Hudson Trevor | male | 113781 | C22 C26 | S | Montreal, PQ / Chesterville, ON | 11 |
2 | Allison, Miss. Helen Loraine | female | 113781 | C22 C26 | S | Montreal, PQ / Chesterville, ON | NaN |
3 | Allison, Mr. Hudson Joshua Creighton | male | 113781 | C22 C26 | S | Montreal, PQ / Chesterville, ON | NaN |
4 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 113781 | C22 C26 | S | Montreal, PQ / Chesterville, ON | NaN |
This opens you up to the world of functional programming, which is its own discipline within programming and could be a series of courses in itself! This style is at the heart of the programming language R, and Python supports it well too. For more information on the map function, consult this excellent article on the Python Map Function.
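As a small taste of this style, the sketch below combines “map()” with a lambda function to tidy a list of column names, and shows the equivalent list comprehension alongside it. The list “column_names” is invented for illustration only.

```python
# Hypothetical list of untidy column names, for illustration only
column_names = [" Name ", "AGE", "Home.Dest"]

# map() with a lambda: strip surrounding spaces and convert to lower case
tidy_names = list(map(lambda name: name.strip().lower(), column_names))

# The same result written as a list comprehension
same_names = [name.strip().lower() for name in column_names]

print(tidy_names)  # ['name', 'age', 'home.dest']
```

Both forms avoid building the list by hand with “append()”; which one reads better is largely a matter of taste.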
One last point I wish to make is with regards to the cleaning function we wrote before, which performs two tasks in one:
- Cleaning column names
- Selecting object datatypes
This was done for convenience here, and to show a good example of using functions for DataFrames, but it is not good practice. The Best Practice in Programming and Modular Programming in Python courses on the Learning Hub would cry if they saw this, as we should always strive to make functions as simple as possible, each performing only one task unless the tasks are impossible to separate.
As such, we should create a smaller function to clean the column names, outputting a dataframe that can then be passed to an object selector function and so on in a pipeline. This is food for thought for now but I would advise taking these courses and learning more about the good practice of function writing should you need to do this in your role.
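A minimal sketch of what this split could look like is given below. The function names “clean_column_names” and “select_object_columns”, the specific cleaning rules and the small example DataFrame are all assumptions made for illustration; they are not the exact steps used by our earlier cleaning function. The pandas “.pipe()” method is used to chain the two functions together.

```python
import pandas as pd

# Hypothetical function: performs one task only (cleaning column names)
def clean_column_names(dataframe):
    """Return a copy with stripped, lower-case column names."""
    cleaned = dataframe.copy()
    cleaned.columns = cleaned.columns.str.strip().str.lower()
    return cleaned

# Hypothetical function: performs one task only (selecting object columns)
def select_object_columns(dataframe):
    """Return only the columns that have the object datatype."""
    return dataframe.select_dtypes(include="object")

# Small made-up DataFrame; the dtype is set explicitly so that the
# example behaves the same way across pandas versions
example = pd.DataFrame({
    " Name ": pd.Series(["Rex", "Tom"], dtype="object"),
    "AGE": [3, 5],
})

# Chain the two single-purpose functions into a pipeline
tidy = example.pipe(clean_column_names).pipe(select_object_columns)
print(tidy.columns.tolist())  # ['name']
```

Because each function does one job, either can be reused or tested on its own, and further steps can be added to the pipeline without touching the existing ones.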
7 Chapter Summary
Fantastic function writing! This is not only the end of the chapter but the end of Core Part 2 and hence the Introduction to Python course as a whole! You should be so proud of what you have achieved throughout this course as there has been so much to learn from this wonderful programming language we all now know and (hopefully!) love.
As a reminder, the best follow-up to this course is the Best Practice in Programming course, after which there is a multitude of courses to choose from. These include Visualization and Statistics in Python and, for those who want the next step in their programmatic toolset, Modular Programming in Python!
See you in the next course!