Basic Programming Styles

Author

Lead Developers: Jonathon Mellor
Mia Hatton

1 Styles For Analysis

There are two main styles we will discuss in this section to help us think about how to structure our code.

Here is an example piece of code we will be converting into the different styles. As it’s a small piece of code it is quite simple, so writing functions for it may be a bit of overkill, but this is done to show the differences in approaches.

The code:

takes a list of strings
converts the strings to floats
finds the sum of the numbers
counts how many numbers are in the list (the length)
divides the sum by the length of the list

Of course, in reality you would use the inbuilt functions sum() and len() (and their R equivalents) to perform this calculation. By building these simple functions from scratch you can see how the functions (and how the functions are called) vary between the different programming styles.
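For comparison, here is a minimal sketch of the same calculation using R's built-in functions. The variable name builtin_mean is only illustrative and is not used elsewhere in this course.

```r
# the same input data used throughout this section
initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

# as.numeric() converts the whole character vector in one call
numbers <- as.numeric(initial_strings)

# sum() and length() replace the hand-written loops
builtin_mean <- sum(numbers) / length(numbers)

print(builtin_mean)
```

The built-in mean() function would give the same value in a single call.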

Try running it yourself to see what it does, or even write it yourself.

1.1 Example Code

In R, iteration is generally avoided, so programming styles that rely heavily on it, like the example below, are rarely seen. More efficient code would apply functions across all elements in a vector rather than iterating through one element at a time (see the Functional Code example below).

initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

# convert strings to floats
numbers <- c()
for (each_string in initial_strings) {
  numbers <- append(numbers, as.numeric(each_string))
}

# find sum of numbers
total <- 0
for (each_num in numbers) {
  total <- total + each_num
}

# find the length of the list
length <- 0
for (each_num in numbers) {
  length <- length + 1
}

# calculate the mean
mean <- total / length

print(mean)

[1] 6.3625

1.2 Style Examples

1.2.1 Procedural Style

Procedural code is a set of instructions run one after another. Each instruction is wrapped inside a function. The functions are called step by step.

Iteration is a common feature of procedural code, more so in Python than R.

At each step the result of the previous step is given to the next function.

This style is common in Python and R code.

All the instructions are written as functions that define what will happen when variables are given to them. These functions are defined first in the file.

Below the functions is the code that “runs” the script. Each line executes one after the other, with data being passed from one function to the next.

To see what the program does we just need to look at the end of the code, where the functions are called in order. Each function describes how it changes the data.

If we wanted to change the behaviour of any step, we would only change the relevant function. If we wanted to perform the steps in a different order, we would change the order in which they are run at the end of the script.

Advantages of Procedural Style

The code is easy to adapt, as to change the behaviour of any given step we only need to change the relevant function.
It’s easy for a collaborator or client to understand the flow of the program and what is being done at each step
Because procedural style is very popular with new programmers and easy to learn, there are lots of resources available with example code that can help you get started.
Procedural code has a “top-down” structure which suits programmers who prefer to work their way through a program without a lot of prior planning.

Disadvantages of Procedural Style

Since each function is designed to perform a specific step in a sequence, code in procedural style is not very re-useable even within the same program.
In procedural style code is broken down into smaller pieces and functions, which can make it difficult to track down errors when debugging.

1.3 Example Code

Breaking code up into functions as instructions is a key part of programming in R. This is true even if we don’t typically use iteration frequently. Notice how in the final lines of the script, the output of a function is passed to the next function.

## define functions to achieve each step

# convert each string to numeric
convert_to_num <- function(strings){
  numbers <- c()
  for (each_string in strings) {
    numbers <- append(numbers, as.numeric(each_string))
  }
  return(numbers)
}


# find the total
sum_numbers <- function(numbers){
  sum <- 0
  for(each_num in numbers) {
    sum <- sum + each_num
  }
  return(sum)
}


# count the elements
find_length <- function(numbers) {
  len <- 0
  for (each_num in numbers) {
    len <- len + 1
  }
  return(len)
}


# calculate the mean from the total and count
calculate_mean <- function(total, length){
  return(total / length)
}


## call functions on our data one after the other
initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

numbers <- convert_to_num(initial_strings)
total <- sum_numbers(numbers)
len <- find_length(numbers)
mean <- calculate_mean(total, len)
print(mean)

[1] 6.3625

Note that the sequence of function calls at the end of the file presents the entire algorithm - what you are trying to achieve. As the function names are self-explanatory (and the functions only do what their names say), it is straightforward to follow what is being attempted.

1.4 Functional Style

Functional programming is an approach to solving problems using key principles:

Immutability

Data isn’t changed once it is created - a new variable can be made but existing ones are not altered.

Higher-order functions

Functions that can take other functions as arguments are used to break up parts of a problem. We will see an example using the sapply() and Reduce() functions below.

Function purity

Purity means that a function doesn’t interact with the rest of the program. The function has no “side effects” - it doesn’t alter other variables or objects outside itself.

In addition, a pure function always gives the same output when given the same input.

Function purity and related concepts are discussed further in the Introduction to Unit Testing course. Government analysts can access the Unit Testing course on the GSS Learning Hub.
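The difference between pure and impure functions, and the idea of immutability, can be sketched as follows. All of the names here (running_total, add_to_total, add_values, double_first) are hypothetical and chosen only for illustration.

```r
# Impure: relies on and modifies a variable outside the function
running_total <- 0
add_to_total <- function(value) {
  running_total <<- running_total + value  # side effect on an outside variable
  return(running_total)
}

# Pure: the result depends only on the arguments, nothing outside is changed
add_values <- function(total, value) {
  return(total + value)
}

first_call <- add_to_total(5)   # returns 5
second_call <- add_to_total(5)  # same input, different output: 10

pure_first <- add_values(0, 5)
pure_second <- add_values(0, 5)  # same input, same output

# Immutability: the function works on its own copy of the argument
original <- c(1, 2, 3)
double_first <- function(values) {
  values[1] <- values[1] * 2  # changes the local copy only
  return(values)
}
doubled <- double_first(original)

print(original)  # the caller's vector is unchanged
print(doubled)
```

Calling add_to_total() twice with the same argument gives two different answers, which is exactly what makes impure functions harder to test and debug.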

Advantages of Functional Style

The principles of functional programming help when we convert our scripts into functions and then modules.
Code organised into pure functions is more reliable - you can easily write unit tests for the functions, and having pure functions with no “side effects” makes it easier to debug our code.
Writing code in the functional style provides clearly organised, pure functions and immutable (can’t be edited) variables which make your code easier to understand.
Because functional code is organised into functions, it is highly re-useable and easy to adapt - you only need to change the function in question to change a step.
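As a rough illustration of why pure functions are easy to test, the sketch below checks a small function using only base R's stopifnot(). The function simple_mean is a hypothetical stand-in, and a real project would more likely use a dedicated testing package.

```r
# a pure function: the output depends only on the input
simple_mean <- function(numbers) {
  return(sum(numbers) / length(numbers))
}

# stopifnot() raises an error if any check is not TRUE
stopifnot(simple_mean(c(1, 2, 3)) == 2)
stopifnot(simple_mean(c(10)) == 10)
# all.equal() allows for small floating point differences
stopifnot(isTRUE(all.equal(simple_mean(c(5.5, 7.4)), 6.45)))

print("all checks passed")
```

Because the function has no side effects, each check can be run in any order and will always give the same result.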

Disadvantages of Functional Style

Immutability requires new variables to be assigned for every step, which can lead to functional code requiring a lot of memory.
In some cases, functional code can actually be less readable than other styles, for example when recursion is used instead of for loops.
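To illustrate the readability point, here is a sketch of the same sum written with a for loop and with recursion. The function names sum_with_loop and sum_with_recursion are assumptions made for this example.

```r
# iterative version: add each number to a running total
sum_with_loop <- function(numbers) {
  total <- 0
  for (each_num in numbers) {
    total <- total + each_num
  }
  return(total)
}

# recursive version: the first element plus the sum of the rest
sum_with_recursion <- function(numbers) {
  if (length(numbers) == 0) {
    return(0)  # base case: an empty vector sums to zero
  }
  return(numbers[1] + sum_with_recursion(numbers[-1]))
}

numbers <- c(5.5, 7.4, 9.3)
print(sum_with_loop(numbers))
print(sum_with_recursion(numbers))
```

Both versions give the same total, but the recursive one asks the reader to keep the base case and the shrinking input in mind, which many people find harder to follow.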

Note: there is a distinction between “functional programming” - a defined style of writing code and “writing programs that use functions”. We can write code using functions with a procedural style, but our code is only “functional” if it follows functional principles. Annoying naming conventions!

1.5 Example Code

Notice that we do not define a function to convert each string in the vector to a float or to add together all of the numbers in the vector. Instead we define a function to perform a particular task once, and use sapply() to apply that function to all of the values in a vector.

We use the Reduce() function to combine the elements of a vector pairwise using a given function, reducing the vector down to a single value.

## Define functions to perform each action

# convert a single string to numeric
convert_to_num <- function(string_num){
  return(as.numeric(string_num))
}

# add two numbers together
sum_numbers <- function(num1, num2){
  return(num1 + num2)
}

# calculate the mean of a list of numbers
calculate_mean <- function(number_list){
  total <- Reduce(sum_numbers, number_list)
  return (total / length(number_list))
}

## Apply the functions across each element of the data
initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

numbers <- sapply(initial_strings, convert_to_num)
print(calculate_mean(numbers))

[1] 6.3625

In R too we can use the ideas of map, filter and reduce to perform efficient operations. There is a wide range of actual function names to achieve these in R, largely due to the range of packages. This tutorial explains the use of the functions in more detail.
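As one possible illustration, base R provides Map(), Filter() and Reduce(). The sketch below uses them on the data from this section; the threshold of 6 and the variable names are arbitrary choices for the example.

```r
initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

# map: apply a function to every element (Map() returns a list, so unlist it)
numbers <- unlist(Map(function(s) as.numeric(s), initial_strings),
                  use.names = FALSE)

# filter: keep only the elements for which the function returns TRUE
large_numbers <- Filter(function(x) x > 6, numbers)

# reduce: combine the elements pairwise into a single value
total_large <- Reduce(function(a, b) a + b, large_numbers)

print(large_numbers)
print(total_large)
```

Packages such as purrr offer their own versions of these ideas under different function names.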

1.6 Comparison

Each style has its appropriate uses, often in combination with each other.

Throughout this course we will try to design our code as functions following functional principles. Each of the new code styles shown relies on us breaking our scripts up into functions appropriately.

1.7 Example Code

The code does what we want but isn’t structured to be reusable.

initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

# convert strings to floats
numbers <- c()
for (each_string in initial_strings) {
  numbers <- append(numbers, as.numeric(each_string))
}

# find sum of numbers
total <- 0
for (each_num in numbers) {
  total <- total + each_num
}

# find the length of the list
length <- 0
for (each_num in numbers) {
  length <- length + 1
}

# calculate the mean
mean <- total / length

print(mean)

Each function completes a task for us, and we call the functions in order. We use iteration to solve problems.

## define functions to achieve each step

# convert each string to numeric
convert_to_num <- function(strings){
  numbers <- c()
  for (each_string in strings) {
    numbers <- append(numbers, as.numeric(each_string))
  }
  return(numbers)
}


# find the total
sum_numbers <- function(numbers){
  sum <- 0
  for(each_num in numbers) {
    sum <- sum + each_num
  }
  return(sum)
}


# count the elements
find_length <- function(numbers) {
  len <- 0
  for (each_num in numbers) {
    len <- len + 1
  }
  return(len)
}


# calculate the mean from the total and count
calculate_mean <- function(total, length){
  return(total / length)
}


## call functions on our data one after the other
initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

numbers <- convert_to_num(initial_strings)
total <- sum_numbers(numbers)
len <- find_length(numbers)
mean <- calculate_mean(total, len)
print(mean)

Each function is applied to all elements of our data, following principles of functional programming.

## Define functions to perform each action

# convert a single string to numeric
convert_to_num <- function(string_num){
  return(as.numeric(string_num))
}

# add two numbers together
sum_numbers <- function(num1, num2){
  return(num1 + num2)
}

# calculate the mean of a list of numbers
calculate_mean <- function(number_list){
  total <- Reduce(sum_numbers, number_list)
  return (total / length(number_list))
}

## Apply the functions across each element of the data
initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

numbers <- sapply(initial_strings, convert_to_num)
print(calculate_mean(numbers))

In R we are able to use “vectorised” operations on vectors. This means to solve problems we do not often need to iterate with loops through every element of a vector explicitly. Using this property in R is very useful for keeping our code functional.

When a data structure contains values that are all of the same data type (integer, float, character), as in a vector, R is able to perform operations on each element significantly faster. It does this by running the operation in a compiled language such as C “under the hood”.

We can then design functions which perform operations on a single element of a vector, and R will quickly run that function on all elements of the vector for us. This is a more efficient approach.
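A short sketch of vectorised operations on the data from this section is shown below. The variable names doubled and above_six are illustrative only.

```r
initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

# as.numeric() is vectorised: it converts every element in one call
numbers <- as.numeric(initial_strings)

# arithmetic and comparison operators are vectorised too
doubled <- numbers * 2
above_six <- numbers > 6

# the equivalent explicit loop, shown only for comparison
doubled_with_loop <- c()
for (each_num in numbers) {
  doubled_with_loop <- append(doubled_with_loop, each_num * 2)
}

print(doubled)
print(above_six)
```

The vectorised lines do the same work as the loop without any explicit iteration in our own code.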

For more information on vectorisation, check out this e-book on R for Data Science.

1.8 Exercise

The imperative script below contains steps for calculating the standard deviation of a given list of numbers (which are provided as strings). Recreate the calculation using procedural and functional styles.
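For reference, the steps in the script (find the mean, square the differences from the mean, divide by the count, take the square root) correspond to the population standard deviation. Written as a formula, with N for the count of numbers and x-bar for their mean (symbols chosen here for illustration):

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```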

Consider how in the functional style you could re-use one function in different parts of the calculation.

1.9 Example Code

Rewrite the code below in a procedural style, then a functional style.

Note: writing a function to find the length of an iterable is beyond the scope of the course; doing so is considered an extension exercise.

initial_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

# convert strings to floats
list_of_numbers <- c()
for (each_string in initial_strings) {
  list_of_numbers <- append(list_of_numbers, as.numeric(each_string))
}

# find sum of numbers
total <- 0
for (each_number in list_of_numbers) {
  total <- total + each_number
}

# find the length of the list
length_of_list <- 0
for (each_number in list_of_numbers) {
  length_of_list <- length_of_list + 1
}

# calculate the mean
mean <- total / length_of_list

# subtract the mean from each number
diffs <- c()
for (each_num in list_of_numbers){
  diffs <- append(diffs, each_num - mean)
}

# square each difference
diff_sq <- c()
for (each_diff in diffs){
  diff_sq <- append(diff_sq, each_diff^2)
}

# add the squared differences together
diff_sq_total <- 0
for (each_diff in diff_sq){
  diff_sq_total <- diff_sq_total + each_diff
}

# divide by the count of numbers
var <- diff_sq_total / length_of_list

# square root
sd <- var^0.5

print(sd)

[1] 2.330203

Re-writing the imperative code in a procedural style. This answer is an example; you could split up your code into functions differently.

# convert each string to numeric
convert_to_num <- function(strings){
  numbers <- c()
  for (each_string in strings) {
    numbers <- append(numbers, as.numeric(each_string))
  }
  return(numbers)
}


# find the total
sum_numbers <- function(numbers){
  sum <- 0
  for(each_num in numbers) {
    sum <- sum + each_num
  }
  return(sum)
}


# count the elements
find_length <- function(numbers) {
  len <- 0
  for (each_num in numbers) {
    len <- len + 1
  }
  return(len)
}

# calculate the mean from the total and count
calculate_mean <- function(numbers){
  total <- sum_numbers(numbers)
  length <- find_length(numbers)
  return(total / length)
}


# calculate sum of squared differences of numbers vector
sum_squared_diffs <- function(numbers) {
  mean <- calculate_mean(numbers)
  squared_differences <- c()
  for (value in numbers) {
    difference <- value - mean
    difference_squared <- difference ^ 2
    squared_differences <- append(squared_differences, difference_squared)
  }
  return(sum_numbers(squared_differences))
}

# calculate standard deviation of list of numbers
standard_deviation <- function(numbers) {
  variance <- sum_squared_diffs(numbers) / find_length(numbers)
  sd <- variance ^ 0.5
  return(sd)
}



input_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

numerical_values <- convert_to_num(input_strings)

sd <- standard_deviation(numerical_values)

print(sd)

[1] 2.330203

Re-writing the imperative code in a functional style. This answer is an example; you could split up your code into functions differently.

We have chosen to use the length function itself rather than rewrite it using a functional style. Working out how to write a length function using this style is left as an extension.

# there are no longer any for loops in the code

# convert a single string to numeric
convert_to_num <- function(string_num){
  return(as.numeric(string_num))
}

# add two numbers together
sum_numbers <- function(num1, num2){
  return(num1 + num2)
}

# calculate the mean of a list of numbers
calculate_mean <- function(number_list){
  total <- Reduce(sum_numbers, number_list)
  return(total / length(number_list))
}

# calculate the sum of the squared differences
sum_squared_diffs <- function(numbers) {
  mean <- calculate_mean(numbers)
  squared_differences <- sapply(numbers, function(value) (value - mean)^2)
  sum_squared_differences <- Reduce(sum_numbers, squared_differences)
  return(sum_squared_differences)
}

# calculate standard deviation of vector of numbers
standard_deviation <- function(numbers) {
  variance <- sum_squared_diffs(numbers) / length(numbers)
  sd <- variance ^ 0.5
  return(sd)
}



input_strings <- c("5.5", "7.4", "9.3", "4.6", "5.4", "10.1", "2.5", "6.1")

numerical_values <- convert_to_num(input_strings)

sd <- standard_deviation(numerical_values)

print(sd)

[1] 2.330203
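As a cross-check on the result above, the same value can be reproduced with vectorised built-ins. Note that the exercise divides by the count of numbers (the population standard deviation), whereas R's built-in sd() divides by the count minus one, so sd() returns a slightly larger value. The variable names below are illustrative.

```r
numbers <- c(5.5, 7.4, 9.3, 4.6, 5.4, 10.1, 2.5, 6.1)
n <- length(numbers)

# population standard deviation: divide by n, as in the exercise
population_sd <- sqrt(sum((numbers - mean(numbers))^2) / n)

# sd() divides by n - 1 (sample standard deviation)
sample_sd <- sd(numbers)

print(population_sd)
print(sample_sd)
```

If you compare your own answer against sd() directly, expect this small difference rather than an exact match.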

Congratulations, you’ve completed the Modular Programming in R course.