Python and command-line basics

Background to Python

Python was created by Guido van Rossum and first released in 1991. It was named after the British comedy series Monty Python’s Flying Circus.

It is a general-purpose programming language that has four key aims:

  • An easy and intuitive programming language that is just as powerful as its major competitors.
  • Open source (so anyone can contribute to the development).
  • Understandable in plain English (this is often referred to as ‘readable’).
  • Suitable for everyday tasks, which allows for short development times when compared with other languages.

Python is extensible: we can use a broad range of additional packages to enhance our code and make processes easier. These are also (mostly) open source and often well supported by their developers, for example pandas and plotly.

In practical terms, Python is a programming language that you use to write instructions for a computer. These instructions are typically organized in the following ways:

Scripts

A script is a file containing Python code that is meant to be run directly. Scripts are often used for automation, data analysis, or any task you want to execute from start to finish. Any file ending with .py is a Python file. These files can contain scripts, reusable functions, classes, or any Python code. You can run them directly or import them into other Python files.
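A common pattern lets a single .py file work both as a script and as an importable file. Below is a minimal sketch; the file name greet.py and the greet() function are hypothetical examples, not part of the course materials.

```python
# greet.py  (hypothetical file name)

def greet(name):
    """Return a greeting; other files could reuse this via `import greet`."""
    return f"Hello, {name}!"

# __name__ equals "__main__" only when this file is run directly (python greet.py);
# when another file imports it, this block is skipped
if __name__ == "__main__":
    print(greet("world"))
```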

Modules

A module is simply a single Python file (.py) that can be imported into other Python code. Modules help organize code into reusable pieces.

Package

A package is a collection of Python modules grouped together in a folder (usually one that contains an __init__.py file).
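Two standard-library examples make the difference concrete: math is a single module, while json is a package whose folder holds several modules (json.decoder, json.encoder and others). The sketch relies on the fact that packages carry a __path__ attribute that plain modules lack.

```python
import math          # a module
import json          # a package (a folder of modules)
import json.decoder  # one of the modules inside the json package

print(math.sqrt(16))                 # 4.0, using a function from the math module
print(hasattr(json, "__path__"))     # True: packages have a __path__
print(hasattr(math, "__path__"))     # False: a plain module does not
print(json.decoder.__name__)         # json.decoder
```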

Variables, Data Types and Data Structures

Variables

When we create an object in Python, we assign it to a variable. A variable is a name that acts like a label: a reference to an object that lives in memory. Without this label we can’t “find” the object in memory again, so we wouldn’t be able to use it in our analysis.

In Python we assign variables using the equals sign (=): the label (or variable name) goes on the left and the object we want to store goes on the right. Unlike statically typed languages such as C or Java, Python does not require us to state the data type of the variable we are storing.

x = 4 + 3
x
7
x = 4 + 3
y = "Hello"
print(x,y)
7 Hello
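Two consequences of variables being labels are worth seeing early: the same name can later point at an object of a different type, and two names can refer to the very same object, so a change made through one is visible through the other. This sketch uses a list, which is introduced under Data Structures below.

```python
# Dynamic typing: x first refers to an int, then to a str
x = 7
print(type(x))     # <class 'int'>
x = "seven"
print(type(x))     # <class 'str'>

# Two labels, one object: b is not a copy of a, it points at the same list
a = [1, 2]
b = a
b.append(3)
print(a)           # [1, 2, 3]
print(a is b)      # True

# .copy() creates a separate object instead
c = a.copy()
c.append(4)
print(a, c)        # [1, 2, 3] [1, 2, 3, 4]
```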

Naming Conventions

Naming your variables can be one of the trickier parts of coding. Choosing sensible names saves time and energy later, when you try to remember what you’ve called something or need to refer to an object many times in your code (after all, it can become tiring to keep typing out long variable names!).

Clear naming lets you work out what an object contains without having to inspect it first, a practice widely adopted in production code.

Generally, a variable name:

  • Must start with a letter or an underscore.
  • Can’t begin with a number.
  • Only contains alphanumeric characters and underscores.
  • Is case sensitive (MY_VARIABLE, my_variable and My_vArIaBle are treated as three different variables in Python).
  • Must not contain hyphens, as Python treats these as minus signs.
  • Must not be a reserved keyword (such as True, False or def).
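Python can check these rules for you. In this sketch, str.isidentifier() reports whether a string is a syntactically valid name, and keyword.iskeyword() flags reserved words such as False, which look valid but cannot be used as variable names. The candidate names are made-up examples.

```python
import keyword

candidates = ["my_variable", "_private", "2nd_place", "my-variable", "False"]

for name in candidates:
    # valid = allowed characters AND not a reserved keyword
    valid = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r:15} valid name: {valid}")
```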
Exercise
  1. Assign your name to a variable called name.
  2. Assign your age to a variable called age.
  3. Assign your favorite color to a variable with a sensible name.

Data Types

Numerical

We’ll deal with two main numeric data types in Python.

  • int (integers) are whole numbers: positive, negative or zero.
  • float (floating point numbers) are numbers with a decimal point, such as 4.5674.

The handy type() function in Python allows us to check the type of whatever we put within the brackets.

type(4)
int
type(4.5674)
float
type(y)
str
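A quick sketch of how the two numeric types interact: mixing an int with a float gives a float, and / always produces a float, even when the division is exact. The // and % operators give the whole-number quotient and the remainder.

```python
print(type(3 + 2.0))   # <class 'float'>: int + float gives a float
print(8 / 2)           # 4.0: true division always returns a float
print(7 // 2)          # 3: floor division keeps whole numbers
print(7 % 2)           # 1: the remainder
print(int(4.9))        # 4: converting float -> int drops the decimal part
print(float(4))        # 4.0
```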

Strings

Strings are sequences of characters (text data). The type in Python is called str. Strings are enclosed in either ‘single’ or “double” quotation marks; whichever you choose, use it consistently throughout your code.

If your string contains an apostrophe or a single quote mark, we recommend opening and closing it with double quotes.

name = "Alice"
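Both quote styles create exactly the same kind of object. The sketch below also shows why double quotes help when the text contains an apostrophe, plus two everyday string operations: joining with + and measuring length with len().

```python
print('Alice' == "Alice")        # True: the quote style makes no difference

# Double quotes let the apostrophe sit inside the string without ending it
message = "It's Alice's turn"
print(message)

greeting = "Hello, " + "Alice"   # + joins (concatenates) strings
print(greeting)                  # Hello, Alice
print(len(greeting))             # 12 characters, including the comma and space
```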

Boolean

Boolean values are sometimes called logical values in other languages and consist of exactly two values, True and False.

In Python they must be spelt out in full with a capital first letter. They are not text values, so they do not need quote marks. True and False are reserved words (special words in Python that cannot be used as variable names), which is why many code editors highlight them in a distinct style, for example bold green text. We will see many more examples of these reserved keywords later.

As we’ll see later, Python often evaluates expressions in a Boolean context: something is either True or False. Booleans even have parallels in the integers: True has the value 1 and False has the value 0.

print(type(True), type(False))
<class 'bool'> <class 'bool'>
True + False 
1
4>5
False
3==3
True
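Booleans really are a special kind of integer in Python (bool is a subclass of int), which is why True + False gives 1. A handy consequence is that sum() counts the True values in a list. The built-in bool() shows how other values behave in a Boolean context: zero and empty values count as False.

```python
print(isinstance(True, int))              # True: bool is a subclass of int
print(sum([True, False, True, True]))     # 3: counts the True values

# "Truthiness": how bool() treats other values
print(bool(0), bool(42))                  # False True
print(bool(""), bool("hello"))            # False True
print(bool([]), bool([0]))                # False True: only the empty list is False
```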
Exercise

What is the data type of the following?

"10"

10

True

"ten"

"True"

False

"false"

Data Structures

In this section we are going to explore some of the common data structures in Python. So far we have stored only one piece of information in each variable, whereas we will usually want to store many. Data structures give us particular ways of organising data so that it can be accessed efficiently. How you store data will often depend on how you want to use it later, so the choice is an important one.

Lists

A list is a type of container: it holds a collection of items. The items in a list are kept in order, and each item’s position is known as its index (counting starts at 0). Lists:

  • Are the most versatile of the built-in data structures
  • Can hold any sequence of objects
  • Can hold mixed objects (like strings, integers and Booleans together)
  • Are mutable (they can be changed: we can add, remove or replace items)

We create lists in Python using square brackets [ ] and separate each item with a comma.

fruits = ["apple", "banana", "cherry"]
fruits
['apple', 'banana', 'cherry']
ages = [25, 30, 35]
ages
[25, 30, 35]
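Because lists are ordered and mutable, we can look items up by their index (counting from 0, with negative numbers counting back from the end) and change the list in place. A short sketch using the fruits list above:

```python
fruits = ["apple", "banana", "cherry"]

print(fruits[0])      # apple: the first item is at index 0
print(fruits[-1])     # cherry: negative indexes count from the end
print(fruits[0:2])    # ['apple', 'banana']: a slice stops BEFORE index 2

fruits.append("date")        # add to the end
fruits[1] = "blueberry"      # replace an item
fruits.remove("apple")       # delete by value
print(fruits)                # ['blueberry', 'cherry', 'date']
print(len(fruits))           # 3
```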
Exercise

Store at least three hobbies within a list. Choose an appropriate variable name.

Tuples

Tuples are similar to lists but have two major differences that place them in their own object category, with their own niche uses:

  • Tuples are immutable (unchangeable)
  • Tuples are created with round brackets ( )

We’ll often see lists and tuples as inputs or outputs for the programming we do. If a process outputs values in round brackets, those values are a tuple.

days_of_the_week = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
days_of_the_week
('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')
location = (9.05785, 7.49508)
location   
(9.05785, 7.49508)
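Immutability means a tuple cannot be altered once created: assigning to one of its items raises a TypeError. Tuples are also easy to unpack into separate variables, which is one reason functions often return them (the built-in divmod(), for example, returns the quotient and remainder as a tuple). A minimal sketch:

```python
location = (9.05785, 7.49508)

# Unpacking: one variable per item
latitude, longitude = location
print(latitude)            # 9.05785

# Tuples cannot be changed after creation
try:
    location[0] = 10.0
except TypeError as err:
    print("TypeError:", err)   # 'tuple' object does not support item assignment

print(divmod(7, 2))        # (3, 1): a built-in function returning a tuple
```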

Dictionaries

The third object type we’ll look at is the dictionary, which also stores a collection of objects, like lists and tuples. Dictionaries are:

  • Accessed by key rather than by position (there is no numerical index as with tuples and lists, although since Python 3.7 dictionaries do remember the order in which keys were added)
  • Mutable (unlike tuples)
  • Able to contain lists and other (nested) dictionaries

To create a dictionary in Python, we use the one type of bracket we have not used yet: curly braces { }. Dictionaries contain key-value pairs (these are different from tuples!). Keys must be an immutable data type, usually strings or integers, while values can be any type of object. Each key is separated from its value by a colon, written as key: value, and pairs are separated by commas.

gdp = {
    "Nigeria": 2229,
    "Wales": 2183,
    "Malawi": 1801
}

print(gdp)
{'Nigeria': 2229, 'Wales': 2183, 'Malawi': 1801}
student = {
    "name": "Bob",
    "grades": {"math": 90, "science": 85},
    "John": ["123-4567", "john@email.com"]
}

print(student)
{'name': 'Bob', 'grades': {'math': 90, 'science': 85}, 'John': ['123-4567', 'john@email.com']}
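Values are looked up by key inside square brackets, and because dictionaries are mutable we can add or update pairs in place. The .get() method is a safer lookup that returns a default instead of raising a KeyError when a key is missing. A sketch using versions of the gdp and student dictionaries above; the updated figure and the age are made-up values.

```python
gdp = {"Nigeria": 2229, "Wales": 2183, "Malawi": 1801}
student = {"name": "Bob", "grades": {"math": 90, "science": 85}}

print(gdp["Wales"])                  # 2183: look up a value by its key
print(student["grades"]["math"])     # 90: chain keys to reach nested values

gdp["Malawi"] = 1850                 # update an existing key (made-up figure)
student["age"] = 17                  # add a new key-value pair (made-up value)

print(gdp.get("France", "not recorded"))   # not recorded: missing key, default returned
print(list(gdp.keys()))              # ['Nigeria', 'Wales', 'Malawi']
print("Wales" in gdp)                # True: `in` checks the keys
```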
Exercise

Create a dictionary called student that stores the following information:

  • Name of the student
  • Age of the student
  • Grade of the student

Pandas Objects

import pandas as pd

Series

Pandas gives us two new object types: the Series and the DataFrame. The DataFrame is by far the more widely used of the two.

A Series:

  • Is a one-dimensional array
  • Acts like a column in a spreadsheet
  • Holds a single data type (such as int64, float64 or object); if you mix types, pandas falls back to the general object type
  • Has an index, which by default starts at 0
  • Has a large number of special methods and procedures associated with it, which we’ll explore in this course.

In the following example we create a Series from scratch, although this is not something we often do in practice.

numbers = pd.Series([10, 20, 30, 40])
print(numbers)
0    10
1    20
2    30
3    40
dtype: int64
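A Series does not have to use the default 0, 1, 2... index. This sketch gives it text labels (the temperatures and day labels are made-up), looks a value up by label, calls one of the built-in Series methods, and shows the fallback to the object dtype when types are mixed.

```python
import pandas as pd

# A Series with a custom index of labels instead of 0, 1, 2...
temps = pd.Series([12.5, 15.0, 9.5], index=["Mon", "Tue", "Wed"])

print(temps["Tue"])      # 15.0: look up a value by its index label
print(temps.mean())      # about 12.33: a built-in Series method
print(temps.dtype)       # float64

# Mixing types does not raise an error: pandas uses the general 'object' dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)       # object
```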

DataFrames

DataFrames are:

  • A two-dimensional version of the Series object
  • Like a whole spreadsheet, with both rows and columns.

Essentially, a DataFrame is a collection of Series objects (one Series per column), where:

  • Each column can only have one data type
  • Different columns can have different data types

The dimensions are labelled in a similar way to a Series:

  • index refers to the row labels, which by default start at 0
  • columns refers to the column labels, or headers.

DataFrames share many methods with Series (for example .head(), .mean() and .sort_values()) and also have some of their own. Most of the methods used heavily in data analysis are available on both, and in practice we will mostly work with DataFrames.

names = pd.Series(["Alice", "Bob", "Charlie"])
scores = pd.Series([85, 92, 78])

df = pd.DataFrame({
    "Name": names,
    "Score": scores
    })

print(df)
      Name  Score
0    Alice     85
1      Bob     92
2  Charlie     78
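The link between the two objects is easy to see in practice: selecting a single column of a DataFrame gives back a Series. This sketch rebuilds the df above from plain lists and inspects its shape and labels, then filters rows with a Boolean condition.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [85, 92, 78],
})

print(df.shape)            # (3, 2): (rows, columns)
print(list(df.columns))    # ['Name', 'Score']: the column labels
print(list(df.index))      # [0, 1, 2]: the default row labels

scores = df["Score"]       # selecting one column gives back a Series
print(type(scores))
print(scores.mean())       # 85.0

# Keep only the rows where the score is above 80
print(df[df["Score"] > 80])
```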

Reading in and Exporting Data

If I want to tell you where a file is, I can give its absolute file path. Let’s say, for example, that I have saved the “Intro_to_Python” folder on my C drive and I want to access the file “animals.csv”.

The full or absolute location of this file is:

  • “C:/Users/username/Intro_to_Python/Data/animals.csv”

This is clear and explicit about where the data is stored. However, to use this path yourself you would need to change parts of it; for example, your username is not “username”.

Because my working directory is automatically set to “C:/Users/ianbanda/Intro_to_Python/notebooks”, I can use what’s called a relative path.

A relative path is the location relative to the working directory, i.e., we specify the filepath starting from where we currently are in the folder structure.

For example, I can load the same file as above using the path

  • “../data/animals.csv”

This will work for any user, as long as their working directory is set to the “notebooks” folder in the “Intro_to_Python” parent folder. Notice the two full stops (..) at the start of the path, which we haven’t seen before: they mean “move up one level” in the folder structure, here from “notebooks” to “Intro_to_Python”.
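To see where a relative path leads, we can join it onto the working directory and let Python tidy up the "..". This sketch uses the example folder from above as an assumed working directory; posixpath is used so the result has forward slashes on any operating system, and Path.cwd() shows your real working directory.

```python
import posixpath
from pathlib import Path

# Relative paths are resolved from the current working directory
print(Path.cwd())

# Assumed working directory: the notebooks folder from the example above
working_dir = "C:/Users/username/Intro_to_Python/notebooks"

# Join the relative path on, then normalise: ".." removes "notebooks"
full = posixpath.normpath(posixpath.join(working_dir, "../data/animals.csv"))
print(full)   # C:/Users/username/Intro_to_Python/data/animals.csv
```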

We highly recommend using forward slashes “/” within file paths. However, a file path copied from Windows Explorer will usually contain the backslash character “\” instead.

This causes issues in three ways. Firstly, backslashes are a Windows-only convention, as macOS and Linux use the forward slash “/”. Secondly, the backslash is the escape character in Python strings (for example, “\n” means a new line), so some paths are misread or cause errors. Thirdly, although Python will often accept backslashes, other commonly used languages, like the statistical programming language R, will not accept single backslashes in file paths. It’s worth getting into the good practice of using forward slashes.

Lastly, if you absolutely must use backslashes, prefix the string with the letter “r” (for example r"C:\Users\username\Intro_to_Python"). This makes it a raw string, so Python keeps each backslash exactly as typed rather than treating it as the start of a special character.
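A minimal sketch of the difference a raw string makes, and of one way to convert a copied Windows path to forward slashes using pathlib's PureWindowsPath (which works on any operating system). The path itself is the example path from above.

```python
from pathlib import PureWindowsPath

# In an ordinary string, "\n" is ONE newline character, not a backslash and an "n"
print(len("\n"))        # 1
print(len(r"\n"))       # 2: a raw string keeps the backslash as typed

# A Windows-style path written as a raw string, so the backslashes survive
windows_path = r"C:\Users\username\Intro_to_Python\Data\animals.csv"

# PureWindowsPath understands backslash separators; as_posix() swaps them for "/"
print(PureWindowsPath(windows_path).as_posix())
# C:/Users/username/Intro_to_Python/Data/animals.csv
```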

Reading a CSV file

titanic_df = pd.read_csv("./data/titanic_clean.csv")
titanic_df.head()
   pclass  survived                                name_of_passenger  sex_of_passenger  age_of_passenger  sibsp  parch  ticket      fare    cabin  embarked
0       1         1                    Allen, Miss. Elisabeth Walton            female           29.0000      0      0   24160  211.3375       B5         S
1       1         1                   Allison, Master. Hudson Trevor              male            0.9167      1      2  113781  151.5500  C22 C26         S
2       1         0                     Allison, Miss. Helen Loraine            female            2.0000      1      2  113781  151.5500  C22 C26         S
3       1         0             Allison, Mr. Hudson Joshua Creighton              male           30.0000      1      2  113781  151.5500  C22 C26         S
4       1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)            female           25.0000      1      2  113781  151.5500  C22 C26         S
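Since this section also covers exporting, here is a self-contained sketch of both directions. It reads a tiny, made-up CSV held in memory (io.StringIO lets pd.read_csv() treat a string like a file) and writes it back out with DataFrame.to_csv(); with no file path given, to_csv() returns the CSV text instead of saving a file.

```python
import io
import pandas as pd

# Made-up example data, held in a string rather than a file
csv_text = """animal,legs,can_fly
cat,4,False
sparrow,2,True
spider,8,False
"""

animals = pd.read_csv(io.StringIO(csv_text))
print(animals.head())
print(animals.dtypes)    # legs is read as integers, can_fly as Booleans

# Exporting: index=False stops pandas writing the 0, 1, 2... row labels as a column.
# Passing a path, e.g. animals.to_csv("animals_copy.csv", index=False), saves a file.
out = animals.to_csv(index=False)
print(out)
```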

Reading in Excel files

As well as CSV files, the “pd.read_” family of functions can read in Excel files. These have the file extension “.xlsx” and differ from comma separated value files in that a single workbook can contain several sheets. (To read .xlsx files, pandas needs the openpyxl package to be installed.)

The function for reading in these files is pd.read_excel(), which, like pd.read_csv(), takes at least one argument: the location of the file, including the file extension.

Exercise

Import the data police_data.xlsx. We want the second sheet, which is named “Table P1”, so you will need to specify an additional parameter. Look in the help documentation to find out which one.

Hint: if referencing the sheet by index position, remember that Python starts counting at 0!

Reading in JSON Files

JSON (JavaScript Object Notation) files are text files that store data in a structured, human-readable format using key-value pairs, lists, and nested objects. The structure is similar to Python dictionaries and lists.

Why use JSON files in Python?

  • JSON is a common format for data exchange between applications, especially web APIs.
  • It is easy to read and write for both humans and machines.
  • Python’s built-in json module makes it simple to convert between JSON and Python data types (like dictionaries and lists).
  • JSON is language-independent, so it’s widely used for sharing data between different programming languages and systems.

Typical uses in Python:

  • Saving and loading configuration files.
  • Storing and exchanging data with web services.
  • Reading and writing structured data for data analysis.
from pathlib import Path
import json

path = Path("./data/titles_soc.json")
data = json.loads(path.read_text())
print(json.dumps(data, indent=4))
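The same two json functions work on plain strings, which makes the mapping between JSON and Python easy to see: JSON objects become dictionaries, arrays become lists, true becomes True and null becomes None. A sketch with a made-up JSON document:

```python
import json

# A small, made-up JSON document held in a Python string
json_text = '{"name": "Bob", "grades": {"math": 90, "science": 85}, "active": true, "tags": ["a", "b"]}'

data = json.loads(json_text)     # JSON text -> Python objects
print(type(data))                # <class 'dict'>
print(data["grades"]["math"])    # 90
print(data["active"])            # True (JSON true becomes Python True)
print(data["tags"])              # ['a', 'b'] (a JSON array becomes a list)

# Python objects -> JSON text; None becomes null, indent=4 pretty-prints it
print(json.dumps({"name": "Alice", "score": None}, indent=4))
```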

Functions

A function is a reusable block of code that performs a task when it is called. It can take various inputs, called arguments, and return outputs.

Functions can help us to write code that is consistent, readable, maintainable and reproducible.

Functions are especially useful for reducing repetition. Repetitive code is harder to read and harder to maintain.

Functions within Python generally fall into three categories:

Built in Functions

These are built into Python and always available for use, for example print() and help().

User Defined Functions

Created by users to carry out specific tasks. Declared using the def keyword.

Anonymous Functions – “lambda functions”

Also user defined, but created with the lambda keyword and without a name: generally short, one-line functions used within a larger piece of code.
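A minimal sketch of a lambda. The first line gives it a name only so it can be called for illustration; the more typical use is passing a throwaway function to another function, here telling sorted() to order some made-up (name, age) pairs by their second item.

```python
# A lambda is an unnamed, single-expression function
square = lambda n: n ** 2   # named here only so we can call it directly
print(square(4))            # 16

# Typical use: a one-off function passed to another function
pets = [("Rex", 7), ("Tom", 3), ("Kit", 5)]
print(sorted(pets, key=lambda pet: pet[1]))
# [('Tom', 3), ('Kit', 5), ('Rex', 7)]
```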

Example

# Function to add two values
def add_two_values(value_1, value_2):
    total = value_1 + value_2
    return total

add_two_values(1, 2)
3

Our functions start with the keyword def (short for define), which most editors highlight, followed by the function name. This name:

  • Must start with a letter or an underscore,
  • Should be lowercase, with words separated by underscores (by Python convention),
  • Should be short and descriptive,
  • Can’t have the same name as a Python keyword,
  • Shouldn’t have the name of an existing function (it will overwrite, or “shadow”, it locally).
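Here is what shadowing looks like in practice. In this sketch a user-defined function called max hides Python's built-in max(); the original is still reachable through the builtins module, and deleting our version restores the usual behaviour.

```python
import builtins

# Defining our own function called 'max' hides ("shadows") the built-in max()
def max(values):
    return "not the built-in max!"

print(max([3, 9, 4]))            # not the built-in max!

# The original is still reachable via the builtins module...
print(builtins.max([3, 9, 4]))   # 9

# ...and deleting our version makes the name 'max' refer to the built-in again
del max
print(max([3, 9, 4]))            # 9
```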
Exercise

Assess whether each name follows good naming conventions (clarity, consistency, verbs vs. nouns, etc.).

  • def create_age_sex_pivot_tables()
  • def PrintData()
  • def x()
  • def disease_prevalance()
  • def get_number_of_patients()
  • def False()
  • def import_excel_file()
  • def process1()
  • def clear_temp_directory()
  • def calc()
  • def calculate_bmi()
  • def pivot_and_save_to_excel()
  • def process_data_and_return_result()
  • def process_data_and_return_result_or_error_if_fails()
  • def process_text()

As well as a name, functions can have arguments. Arguments are information, such as data, that are passed into the function.

In the example above, we passed numbers into the function add_two_values(), but arguments can also be other data types, such as strings.

A function can have multiple arguments.

In the function body, the argument names stand in for whatever values are passed in when the function is called.

# Function to print a date
def print_date(day, month, year):
    print("The date is", day, month, year)   

print_date(30, "March", 2026)
The date is 30 March 2026
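Arguments can also be passed by name (keyword arguments), in which case their order no longer matters, and a parameter can be given a default value that is used when the caller leaves it out. A sketch using a hypothetical format_date() that returns the text rather than printing it; the default year of 2026 is an arbitrary choice for illustration.

```python
def format_date(day, month, year=2026):   # year defaults to 2026 if not given
    return f"The date is {day} {month} {year}"

print(format_date(30, "March"))                      # The date is 30 March 2026
print(format_date(month="April", day=1, year=2025))  # The date is 1 April 2025
```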

All of the code inside the function body must be indented (by convention, with four spaces); Python uses this indentation to tell where the function body ends.

The return statement ends the function and sends a value back to the caller. It can return any data type.

def get_list():
    my_list = [1, 2, 3, 4, 5]
    return my_list    

numbers = get_list()
print(numbers)
[1, 2, 3, 4, 5]
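Two related behaviours are worth knowing. Returning several values separated by commas actually returns a single tuple, which can be unpacked straight away, and a function with no return statement returns None. A sketch with hypothetical function names:

```python
def min_and_max(values):
    # Two values separated by a comma are returned as one tuple
    return min(values), max(values)

result = min_and_max([4, 1, 9])
print(result)           # (1, 9)
low, high = result      # unpack the tuple into two variables

def say_hello():
    print("Hello")      # no return statement

print(say_hello())      # prints Hello, then None
```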
Exercise

Convert the code below to a function.

# Convert this code into a function
height_m = height_cm / 100

bmi = weight_kg / (height_m ** 2)

Docstrings

Throughout our courses we strongly encourage reading docstrings (documentation strings), the built-in “help” documents. They’re very useful for finding out what a function or method does, what its parameters are called, and what we should expect to pass as arguments.

Docstrings commonly describe:

  • what the function or class does
  • what parameters the function or class takes as arguments and their types
  • what the code returns
  • what common errors can occur and the exceptions they’ll raise
  • links to or descriptions of the methodology the function implements
  • example usage of the function

But in general, there is scope to add any information that you consider relevant to an end-user of this particular function.

A docstring written in the NumPy style:

def add_two_values(value_1, value_2):
    """
    Add together two values.

    Parameters
    ----------
    value_1 : int, float or str
        The first value to add.
    value_2 : int, float or str
        The second value to add.

    Returns
    -------
    int, float or str
        The sum (or, for strings, the concatenation) of the two values.

    Raises
    ------
    TypeError
        If the two values cannot be added together.
    """
    total = value_1 + value_2
    return total

The same docstring written in the Google style:

def add_two_values(value_1, value_2):
    """
    Add together two values.

    Args:
        value_1 (int, float or str): The first value to add.
        value_2 (int, float or str): The second value to add.

    Returns:
        int, float or str: The sum (or, for strings, the concatenation) of the two values.

    Raises:
        TypeError: If the two values cannot be added together.
    """
    total = value_1 + value_2
    return total

help(add_two_values)
Help on function add_two_values in module __main__:

add_two_values(value_1, value_2)
    Add together two values.

    Args:
        value_1 (int, float or str): The first value to add.
        value_2 (int, float or str): The second value to add.

    Returns:
        int, float or str: The sum (or, for strings, the concatenation) of the two values.

    Raises:
        TypeError: If the two values cannot be added together.
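help() is not the only way to read a docstring. It is stored on the function object itself in the __doc__ attribute, and inspect.getdoc() returns it with the indentation tidied up, for your own functions and for built-ins alike. A sketch with a hypothetical greet() function:

```python
import inspect

def greet(name):
    """Return a friendly greeting for name."""
    return f"Hello, {name}!"

# A docstring is stored on the function object in its __doc__ attribute
print(greet.__doc__)          # Return a friendly greeting for name.

# inspect.getdoc() fetches it with indentation tidied; it works for built-ins too
print(inspect.getdoc(len))
```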
Exercise

Add a docstring to your function that explains what it does and describes its parameters and return value.

def calculate_bmi(height_cm, weight_kg):
    
    if height_cm <= 0:
        raise ValueError("height_cm must be greater than 0.")
    if weight_kg <= 0:
        raise ValueError("weight_kg must be greater than 0.")

    height_m = height_cm / 100
    
    bmi = weight_kg / (height_m ** 2)
    
    return bmi

Python Virtual Environments

What is a Virtual Environment?

A virtual environment is a folder containing its own Python interpreter and its own set of installed libraries, kept separate from your main Python installation. It helps you avoid conflicts between different projects’ dependencies.

Why Use Virtual Environments?

  • Keeps your project dependencies isolated.
  • Prevents version conflicts between packages.
  • Makes your project easier to share and reproduce.

Creating a Virtual Environment

Open your command line (Terminal, Command Prompt, or PowerShell).

Navigate to your project folder (optional):

cd path/to/your/project

Create a virtual environment named venv (on macOS/Linux you may need to type python3 instead of python):

python -m venv venv

This creates a folder called venv in your project directory.

Activating a Virtual Environment

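# Windows (Command Prompt)
venv\Scripts\activate
# Windows (PowerShell)
venv\Scripts\Activate.ps1
# macOS/Linux
source venv/bin/activate

Once activated, the environment’s name, for example (venv), usually appears at the start of your prompt. To leave the environment, run deactivate.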

Installing Packages

Once activated, use pip to install packages:

pip install pandas
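To check from inside Python which environment you are in and whether a package is available, you can compare sys.prefix with sys.base_prefix (they differ inside a virtual environment) and ask importlib.util.find_spec() whether a package can be found. A minimal sketch; the package names checked are just examples.

```python
import sys
import importlib.util

# Inside an active virtual environment, sys.prefix points at the venv folder,
# while sys.base_prefix still points at the Python installation it was created from
in_venv = sys.prefix != sys.base_prefix
print("Running inside a virtual environment:", in_venv)
print("Interpreter being used:", sys.executable)

# find_spec() returns None when a package cannot be imported in this environment
for package in ["json", "pandas"]:
    installed = importlib.util.find_spec(package) is not None
    print(f"{package} available: {installed}")
```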
Exercise

Create and activate a virtual environment. Install a package and check it’s available.