How to Write Clean Code?

Author

Government Analysis Function and ONS Data Science Campus

1 How to Write Clean Code?

Writing clean code means applying a small set of guiding principles as you write.

1.1 Use Descriptive Names

The names of variables, functions and classes should reveal their intent, be pronounceable, and reflect what a function does or what a variable holds. If a name requires a comment, then the name does not reveal its intent. It is also recommended that you keep naming conventions consistent across your code. You can find more information in PEP 8 for Python or in an R style guide.

As a general rule keep the names of the variables and classes as nouns and the names of the functions as verbs.

1.1.1 Poor Practice

n = 10 # no of patients 

no_pat = 10  # no of patients 
n <- 10  # no of patients 

no_pat <- 10    # no of patients 

1.1.2 Best Practice

Variable Name

number_of_patients = 10

Function Name

get_number_of_patients()

Class Name

PatientService

Variable Name

number_of_patients <- 10

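Note: number.of.patients is widely used among R users and is even considered best practice by some. However, if you move between languages this style is restrictive and can cause syntax conflicts, because in many languages, including Python, the dot is an operator for accessing attributes.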

Function Name

get_number_of_patients()

Class Name

PatientService
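
The three kinds of name above can be seen working together in one short Python sketch. The `Patient` record and the body of `PatientService` are assumptions made for illustration; only the names `number_of_patients`, `get_number_of_patients` and `PatientService` come from the examples above.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record type, introduced only for this sketch.
    patient_id: int
    name: str


class PatientService:
    """Class name is a noun: it describes a thing."""

    def __init__(self, patients):
        self.patients = list(patients)

    def get_number_of_patients(self):
        # Function name starts with a verb: it describes an action.
        return len(self.patients)


patient_service = PatientService([Patient(1, "Joe"), Patient(2, "Rita")])

# Variable name is a noun phrase: it describes what the value is.
number_of_patients = patient_service.get_number_of_patients()
print(number_of_patients)
```

Read aloud, the last assignment says what it does without needing a comment to explain it.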

1.2 Avoid Disinformation

You must avoid leaving false clues that obscure the meaning of the code. Do not refer to something as a list if it is not a list. The word has a specific meaning to programmers, so using it loosely gives the reader false information.

1.2.1 Poor Practice

gender = (1, 2, 1, 1, 2, 2, 2)

student_table =["Joe", "Charlie", "Rita", "Alison", "Dave"]
gender <- c(1, 2, 1, 1, 2, 2, 2)

student_table <- c("Joe", "Charlie", "Rita", "Alison", "Dave")

1.2.2 Best Practice

gender = ("male", "female", "male", "male", "female", "female", "female")

student_list = ["Joe", "Charlie", "Rita", "Alison", "Dave"]
gender <- c("male", "female", "male", "male", "female", "female", "female")

# c() creates a character vector, not an R list, so the name avoids the word "list"
student_names <- c("Joe", "Charlie", "Rita", "Alison", "Dave")
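
When the data arrives as numeric codes, one way to avoid passing the disinformation on is to translate the codes once, through a lookup whose name says what it maps. The sketch below assumes that 1 means "male" and 2 means "female", which is inferred from comparing the two `gender` examples above; `GENDER_LABEL_BY_CODE`, `gender_codes` and `gender_labels` are hypothetical names.

```python
# Assumed coding, inferred from the poor-practice and best-practice examples.
GENDER_LABEL_BY_CODE = {1: "male", 2: "female"}

# The name says these are codes, so nobody mistakes them for labels.
gender_codes = (1, 2, 1, 1, 2, 2, 2)

# Translate each code into its readable label.
gender_labels = tuple(GENDER_LABEL_BY_CODE[code] for code in gender_codes)
print(gender_labels)
```

The dictionary is not called a list or a table, because it is neither; the `_by_code` suffix tells the reader what the keys are.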

1.3 Make Meaningful Distinctions

When writing code, you should not write solely to satisfy the compiler or interpreter. It is not best practice to tell names apart with a number series, such as cancer1 and cancer2, in variable, class or method names. You need to distinguish names in such a way that the reader knows what the differences are.

1.3.1 Poor Practice

import pandas as pd

cancer1 = {'patient_id': [1, 2, 3, 4], 'cancer': [1, 1, 2, 3]}
cancer2 = pd.DataFrame(cancer1)

cancer2
cancer<- c(1, 1, 2, 3) 
patientId<- c(1, 2, 3, 4)

cancer2<-cbind(patientId, cancer)
cancer2

1.3.2 Best Practice

import pandas 

patient_cancer_list = [
  {'patient_id': 1, 'cancer_stage': 1},
  {'patient_id': 2, 'cancer_stage': 1},
  {'patient_id': 3, 'cancer_stage': 2},
  {'patient_id': 4, 'cancer_stage': 4}]
  
patient_cancer_dataframe = pandas.DataFrame(patient_cancer_list)

patient_cancer_dataframe
library(plyr)

patientCancerList <- list(
  list('patientId' = 1, 'cancerStage' = 1),
  list('patientId' = 2, 'cancerStage' = 1),
  list('patientId' = 3, 'cancerStage' = 2),
  list('patientId' = 4, 'cancerStage' = 4))

patientCancerDataframe <- ldply(patientCancerList, data.frame)
patientCancerDataframe
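
The same idea applies when several objects are derived from one dataset: each name can state what makes that object different. The sketch below uses only the standard library and the records from the example above; `cancer_stage_by_patient_id` and `stage_one_patient_ids` are hypothetical names chosen for illustration.

```python
patient_cancer_list = [
    {"patient_id": 1, "cancer_stage": 1},
    {"patient_id": 2, "cancer_stage": 1},
    {"patient_id": 3, "cancer_stage": 2},
    {"patient_id": 4, "cancer_stage": 4},
]

# A lookup from patient id to stage: the name describes keys and values.
cancer_stage_by_patient_id = {
    record["patient_id"]: record["cancer_stage"] for record in patient_cancer_list
}

# A filtered subset: the name describes the filter that was applied.
stage_one_patient_ids = [
    record["patient_id"]
    for record in patient_cancer_list
    if record["cancer_stage"] == 1
]

print(cancer_stage_by_patient_id)
print(stage_one_patient_ids)
```

Had these been called `cancer1`, `cancer2` and `cancer3`, the reader would have to inspect each one to learn how it differs from the others.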

1.4 Comments

You write comments with the intention of explaining what code does. The issue with comments is that they are not always kept up to date. Code is frequently updated or moved from one place to another, while the old comments remain unchanged. As a result, the comments no longer reflect the code. In addition, the code itself, not the comments, should tell the reader what it is doing. Instead of writing comments to explain code, focus on making the code readable.

“A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment.” - Robert C Martin, Clean Code: A Handbook of Agile Software Craftsmanship

1.4.1 Poor Practice

#this function returns the sum of all even vector elements 
def ad_ev_num(x):
  ev_nu = sum([i for i in x if i % 2 == 0]) #sums every element, if the element is even 
  return(ev_nu)
#this function returns the sum of all even vector elements 
f <- function(x){
  s <- sum(x[x %% 2 ==0])#find the even numbers and gets their sum
  return(s)
}

1.4.2 Best Practice

def get_sum_of_even_numbers(numbers):
  even_numbers = [each_number for each_number in numbers if each_number % 2 == 0]
  sum_of_even_numbers = sum(even_numbers)
  return sum_of_even_numbers
getSumOfEvenNumbers <- function(numbers){
  evenNumbers <- numbers[numbers %% 2 == 0]
  sumOfEvenNumbers <- sum(evenNumbers)
  return(sumOfEvenNumbers)
}

1.5 Don’t Repeat Yourself (DRY)

The Don’t Repeat Yourself (DRY) principle says that each piece of logic should exist in only one place, so that duplicated code is avoided or reduced. Having the same code in different places makes maintenance harder: if you change the code, you have to update it in several places instead of just one.

Duplicated code also adds complexity and makes the code base unnecessarily large. When you find yourself duplicating code, that is often a sign that the code belongs in a function.

1.5.1 Poor Practice

person1 = [80,1.65]
person2 = [44, 1.45]

bmi_person1 = int(person1[0]/ person1[1]**2)
print ("bmi of person1 is ", bmi_person1)

bmi_person2 = int(person2[0]/ person2[1]**2)
print ("bmi of person1 is ", bmi_person2)
person1 <-  c(80,1.65)
person2 <- c(44, 1.45)

bmi_person1 <- (person1[1]/ person1[2]**2)
cat ("bmi of person1 is ", bmi_person1)

bmi_person2 <- (person2[1]/ person2[2]**2)
cat ("bmi of person1 is ", bmi_person2)

1.5.2 Best Practice

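def calculate_bmi(name, weight_kg, height_m):
    # int() drops the decimal part of the BMI
    bmi = int(weight_kg / height_m ** 2)
    # return one formatted string rather than a tuple of fragments
    return f"bmi of {name} is {bmi}"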
calculate_bmi <- function(name, weight_kg, height_m){
  bmi <- weight_kg / height_m^2
  cat("bmi of", name, "is", bmi)
}
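
To see how the function removes the duplication, both people from the poor-practice example can go through the same call. This is a minimal sketch: it assumes the Python function returns a formatted string, and the `people` list and `bmi_messages` name are introduced here for illustration.

```python
def calculate_bmi(name, weight_kg, height_m):
    # int() drops the decimal part, as in the example above.
    bmi = int(weight_kg / height_m ** 2)
    return f"bmi of {name} is {bmi}"


# (name, weight in kg, height in m), values taken from the poor-practice example.
people = [("person1", 80, 1.65), ("person2", 44, 1.45)]

# One call site handles every person, so the formula exists in one place only.
bmi_messages = [
    calculate_bmi(name, weight_kg, height_m)
    for name, weight_kg, height_m in people
]

for message in bmi_messages:
    print(message)
```

Because the label is built from the `name` argument, a copy-and-paste slip such as printing "person1" next to the second person's result can no longer happen.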

1.6 Let One Function Perform Only One Task

Each function should just do one task.

That task should be done well, and in a robust way.

If a function does more than one task, it should be split into multiple functions, where each function does one sub-task.

A function should not do other things in the background, just the single task it’s written for.

“The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.” - Robert C Martin, Clean Code: A Handbook of Agile Software Craftsmanship

The idea is to keep together the code that would change for the same reason. When you change something, you want to affect as few other things as possible.

Avoid keeping code that is not used. It is confusing, and because it is rarely maintained it usually falls behind the rest of the code until it can no longer be used. The same goes for carrying code around without using it.

1.6.1 Poor Practice

def get_sum_of_even_numbers(numbers):
  even_numbers = [each_number for each_number in numbers if each_number % 2 == 0]
  even_number_count = len(even_numbers)
  sum_of_even_numbers = sum(even_numbers) 
  return(sum_of_even_numbers)
getSumOfEvenNumbers <- function(numbers){
  evenNumbers <- numbers[numbers %% 2 ==0]
  evenNumbersCount <- length(evenNumbers)
  sumOfEvenNumbers <- sum(evenNumbers)
  return(sumOfEvenNumbers)
}

1.6.2 Best Practice

def get_sum_of_even_numbers(numbers):
  even_numbers = get_even_numbers(numbers)
  sum_of_even_numbers = sum(even_numbers)
  return sum_of_even_numbers
  
  
def get_even_numbers(numbers):
  even_numbers = [each_number for each_number in numbers if is_even(each_number)]
  return even_numbers
getSumOfEvenNumbers<-function(numbers){
  evenNumbers <- getEvenNumbers(numbers)
  sumOfEvenNumbers <- sum(evenNumbers)
  return(sumOfEvenNumbers)
}

getEvenNumbers<-function(numbers){
  evenNumbers <- numbers[areEven(numbers)]
  return(evenNumbers)
}
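
The point that a function should not do other things in the background can also be sketched with the BMI calculation from the previous section. Here the calculation and the message are separate tasks, so each gets its own function. `format_bmi_message` is a hypothetical helper, and rounding to one decimal place is an assumption of this sketch.

```python
def calculate_bmi(weight_kg, height_m):
    """One task: compute the body mass index."""
    return weight_kg / height_m ** 2


def format_bmi_message(name, bmi):
    """One task: build the message for the reader."""
    return f"bmi of {name} is {bmi:.1f}"


bmi = calculate_bmi(80, 1.65)
message = format_bmi_message("person1", bmi)
print(message)
```

With this split, the number can be reused in further calculations without any text being produced, and the wording of the message can change without touching the formula.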

1.7 Keep It Simple

Nobody likes debugging, maintaining, or making changes to complex code. According to the keep it simple principle, you need to keep your code as simple as possible. If your function has too many arguments, perhaps the function is doing too much and you need to split it.

Remember, the simpler the code, the simpler it is to understand.

1.7.1 Poor Practice

def count_even_numbers(first_number, second_number, third_number, fourth_number, fifth_number):
  given_numbers = (first_number, second_number, third_number, fourth_number, fifth_number)
  even_numbers = get_even_numbers(given_numbers)
  even_numbers_count = len(even_numbers)
  return even_numbers_count
  
  
def get_even_numbers(numbers):
  even_numbers = [each_number for each_number in numbers if is_even(each_number)]
  return even_numbers
countEvenNumbers <- function(firstNumber, secondNumber, thirdNumber, fourthNumber, fifthNumber){
  givenNumbers <- c(firstNumber, secondNumber, thirdNumber, fourthNumber, fifthNumber)
  evenNumbers <- getEvenNumbers(givenNumbers)
  evenNumbersCount <- length(evenNumbers)
  return (evenNumbersCount)
}


getEvenNumbers<-function(numbers){
  evenNumbers <- numbers[areEven(numbers)]
  return(evenNumbers)
}

1.7.2 Best Practice

def count_even_numbers(numbers):
  even_numbers = get_even_numbers(numbers)
  even_numbers_count = len(even_numbers)
  return even_numbers_count
  
def get_even_numbers(numbers):
  even_numbers = [each_number for each_number in numbers if is_even(each_number)]
  return even_numbers
countEvenNumbers <- function(numbers){
  evenNumbers <- getEvenNumbers(numbers)
  evenNumbersCount <- length(evenNumbers)
  return (evenNumbersCount)
}

getEvenNumbers<-function(numbers){
  evenNumbers <- numbers[areEven(numbers)]
  return(evenNumbers)
}
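
A practical effect of the simpler signature is that the function no longer fixes how many numbers the caller must supply. The sketch below repeats the Python best-practice functions in compact form, together with `is_even` from a later section, so that it runs on its own; the input lists are made up for illustration.

```python
def is_even(number):
    return number % 2 == 0


def get_even_numbers(numbers):
    return [each_number for each_number in numbers if is_even(each_number)]


def count_even_numbers(numbers):
    return len(get_even_numbers(numbers))


# The same function accepts three numbers or seven without any change.
three_number_count = count_even_numbers([2, 3, 4])
seven_number_count = count_even_numbers([1, 2, 3, 4, 5, 6, 8])
print(three_number_count, seven_number_count)
```

The five-argument version would need a new parameter, or a new function, for every input size.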

1.8 No To Nesting

Deeply nested conditionals and loops are difficult to read and understand. You should keep conditional statements as flat and easy to understand as possible. It is good practice to move the logic into separate functions instead of nesting it.

1.8.1 Poor Practice

first_element = (-1)
second_element = (-2)
third_element = (-3)

if (first_element < 0):
  if (second_element < 0):
    if (third_element < 0): 
      print('all negatives')

or

if (first_element < 0 and second_element < 0 and third_element < 0): 
  print('all negatives')
firstElement <- (-1)
secondElement <- (-2)
thirdElement <- (-3)

if (firstElement < 0){
    if (secondElement < 0){
      if (thirdElement < 0){
        print('all negatives')}}}     

or

if (firstElement < 0 & secondElement < 0 & thirdElement < 0){
  print('all negatives')
}

1.8.2 Best Practice

def all_elements_negative(elements):
  non_negative_elements = get_non_negative_elements(elements)
  number_of_non_negative_elements = len(non_negative_elements)
  return number_of_non_negative_elements == 0
  
  
def get_non_negative_elements(elements):
  # zero is not negative, so it has to be caught here as well
  non_negative_elements = [each_element for each_element in elements if each_element >= 0]
  return non_negative_elements
  
  
all_elements = [first_element, second_element, third_element]
if all_elements_negative(all_elements):
  print('all negatives')
allElementsNegative <- function(elements){
  nonNegativeElements <- getNonNegativeElements(elements)
  numberOfNonNegativeElements <- length(nonNegativeElements)
  return (numberOfNonNegativeElements == 0)
}


getNonNegativeElements <- function(elements){
  # zero is not negative, so it has to be caught here as well
  nonNegativeElements <- elements[elements >= 0]
  return(nonNegativeElements)
}

allElements <- c(firstElement, secondElement, thirdElement)
if (allElementsNegative(allElements)){
  print('all negatives')
}
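
As an alternative to collecting and counting the elements that fail the check, Python's built-in `all` expresses the same test as one flat expression. This is a sketch of a different technique from the one shown above, offered for comparison rather than as a replacement.

```python
def all_elements_negative(elements):
    # all() stops at the first element that is not negative.
    return all(each_element < 0 for each_element in elements)


first_element = -1
second_element = -2
third_element = -3

all_elements = [first_element, second_element, third_element]
if all_elements_negative(all_elements):
    print('all negatives')
```

Note that `all` returns `True` for an empty sequence, so callers who treat an empty input as a special case need to check for it separately.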

1.9 Code Should Read Like a Book

Your code should be structured and organised in such a way that anyone who looks at it can grasp its purpose without spending hours digging into it, just as a book is structured through chapters and paragraphs.

Functions should be written in the order in which they are called, so the reader meets the high-level function first and its helpers afterwards.

1.9.1 Poor Practice

def get_even_numbers(numbers):
  even_numbers = [each_number for each_number in numbers if is_even(each_number)]
  return even_numbers
  
  
def get_sum_of_even_numbers(numbers):
  even_numbers = get_even_numbers(numbers)
  sum_of_even_numbers = sum(even_numbers)
  return sum_of_even_numbers

def is_even(number):
  return number % 2 == 0
getEvenNumbers<-function(numbers){
  evenNumbers <- numbers[areEven(numbers)]
  return(evenNumbers)
}

getSumOfEvenNumbers <- function(numbers){
  evenNumbers <- getEvenNumbers(numbers)
  sumOfEvenNumbers <- sum(evenNumbers)
  return(sumOfEvenNumbers)
}


areEven <- function(numbers){
  return(numbers %% 2 == 0)
}

1.9.2 Best Practice

def get_sum_of_even_numbers(numbers):
  even_numbers = get_even_numbers(numbers)
  sum_of_even_numbers = sum(even_numbers)
  return sum_of_even_numbers
  
def get_even_numbers(numbers):
  even_numbers = [each_number for each_number in numbers if is_even(each_number)]
  return even_numbers
  
  
def is_even(number):
  return number % 2 == 0
getSumOfEvenNumbers<-function(numbers){
  evenNumbers <- getEvenNumbers(numbers)
  sumOfEvenNumbers <- sum(evenNumbers)
  return(sumOfEvenNumbers)
}

getEvenNumbers<-function(numbers){
  evenNumbers <- numbers[areEven(numbers)]
  return(evenNumbers)
}

areEven <- function(numbers){
  return(numbers %% 2 == 0)
}
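
Ordering functions from caller to helper works in Python because a name used inside a function body is only looked up when the function is called, not when it is defined. The sketch below shows this; the compact function bodies and the input list are chosen for illustration.

```python
def get_sum_of_even_numbers(numbers):
    # get_even_numbers is defined further down. That is fine, because the
    # name is resolved only when this function is actually called.
    return sum(get_even_numbers(numbers))


def get_even_numbers(numbers):
    return [each_number for each_number in numbers if is_even(each_number)]


def is_even(number):
    return number % 2 == 0


# By the time this call runs, all three functions have been defined.
sum_of_even_numbers = get_sum_of_even_numbers([1, 2, 3, 4, 5, 6])
print(sum_of_even_numbers)
```

The only constraint is that the first call happens after all the definitions have run, which is why the call sits at the bottom of the file.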

Reuse

Open Government Licence 3.0