Summary & Further Resources - Python
In this course we have worked through:
- why we should structure our analysis programs
- what different programming styles look like
- how to convert scripts into functions based on code similarity
- how to convert whole scripts into functions calling other functions
- why project file structure is important
- how to separate scripts into different files (modules)
- a case study to practice working on an analysis pipeline
By using the ideas and skills in the course your analysis will be:
- easier to reproduce
- easier to fix
- easier to extend
This course has focussed on moving from scripts to functions and modules. You can avoid this conversion step by writing new projects as functions, split across different files, from the start. This takes practice, but planning and experience will make it easier.
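As a reminder of what starting with functions can look like, here is a minimal sketch. The function names and the example data are hypothetical and are not taken from the course case study.

```python
def load_values(lines):
    """Convert lines of text into a list of floats, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def summarise(values):
    """Return the count, minimum and maximum of the values."""
    return {"count": len(values), "min": min(values), "max": max(values)}


def main():
    """Run the whole analysis by calling the smaller functions in order."""
    raw_lines = ["3.5", "", "1.0", "7.25"]  # stand-in for reading a file
    values = load_values(raw_lines)
    return summarise(values)


if __name__ == "__main__":
    print(main())
```

Each step can be tested and reused on its own, and `main()` reads as an outline of the analysis.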
1 Further Reading
Structuring your code in projects is an important part of building reproducible analysis.
The concepts introduced in this material are a starting point to help break down your scripts into functions and files.
You can further improve your projects by taking into account other concepts contained within the Quality Assurance of Code for Analysis and Research.
Below is a brief summary of related concepts which will help you further structure your code. These topics will be covered in later material once it is created; for now, links to relevant resources are included.
1.1 Documentation
When we split our project across multiple files it becomes even more important that we document our code well. This helps us understand what is happening across functions and modules without having to learn every single detail.
Minimal function documentation has been added to the case study analysis.
Better functions would contain full docstrings that explain the parameters, behaviour and outputs of each function in detail.
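As an illustration, a fuller docstring might look like the sketch below. The function and its arguments are hypothetical, and the layout follows the NumPy docstring convention only as one possible choice; use whatever format your team already follows.

```python
def calculate_mean_by_group(records, group_key, value_key):
    """Calculate the mean of a value for each group in a list of records.

    Parameters
    ----------
    records : list of dict
        One dictionary per observation.
    group_key : str
        Key whose value identifies the group an observation belongs to.
    value_key : str
        Key whose value is the number to average.

    Returns
    -------
    dict
        Maps each group to the mean of its values.

    Raises
    ------
    KeyError
        If a record is missing ``group_key`` or ``value_key``.
    """
    totals = {}
    counts = {}
    for record in records:
        group = record[group_key]
        # Accumulate the running total and count for this group
        totals[group] = totals.get(group, 0) + record[value_key]
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}
```

A reader can now tell what to pass in, what comes back and what can go wrong without reading the function body.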
The best docstring format is the one already being used in a team/project. The next best is following a style guide such as:
In this course we have grouped functions together into files based on what they do. For each file (module) we want to document what code is in that file, and what it does.
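One way to do this is a module-level docstring at the very top of the file. The sketch below assumes a hypothetical cleaning module; the file contents and function names are invented for illustration.

```python
"""Cleaning functions for a (hypothetical) survey analysis.

This module groups together the steps that tidy raw input before analysis:

- standardise_column_name: lower-case a column heading and replace spaces
  with underscores
- remove_missing: drop records with no value for a given key
"""


def standardise_column_name(name):
    """Return the name in lower case with spaces replaced by underscores."""
    return name.strip().lower().replace(" ", "_")


def remove_missing(records, key):
    """Return only the records where `key` is present and not None."""
    return [record for record in records if record.get(key) is not None]
```

After importing the module, calling `help()` on it displays this docstring, so the summary stays alongside the code it describes.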
1.2 Packaging Code
Soon there will be a new course created to work through the process of creating packages of code.
The materials below cover this process at a basic level.
Python:
1.3 Environment Management
The code you write depends on the version of packages you have used. In order to have others run your code they need to have compatible, ideally identical package versions.
Recording which versions of packages were used to run the code allows others to run it later on.
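As a rough sketch of recording versions from within Python, the standard library module `importlib.metadata` can look up the installed version of a package. The helper name `record_versions` and the package names passed to it are assumptions for illustration only.

```python
from importlib.metadata import PackageNotFoundError, version


def record_versions(package_names):
    """Return 'name==version' lines for installed packages, plus missing names."""
    pinned = []
    missing = []
    for name in package_names:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # The package is not installed in the current environment
            missing.append(name)
    return pinned, missing


if __name__ == "__main__":
    # Example package names; replace with the ones your project imports
    pinned, missing = record_versions(["pandas", "numpy"])
    print("\n".join(pinned))
    print("Not installed:", missing)
```

The resulting `name==version` lines can be saved alongside the project so that others can install matching versions.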
We can ensure the correct environments are loaded and recorded using “virtual environments”, which allow you to keep different, separate sets of installed packages. You can then pick which set you want to use at any time. Note that the local libraries created for virtual environments do not affect the versions of packages installed in your global library, so using an older version of a package in a specific virtual environment will not affect your other projects.
There are a range of virtual environment approaches for Python. Two common ones are linked to below. The first is provided in base Python (venv), the second is a popular third-party approach (poetry).
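Virtual environments with venv are usually created from the command line with `python -m venv <directory>`. The same thing can be done from Python through the standard library `venv` module, as in the minimal sketch below. The helper name and the directory name are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir):
    """Create a virtual environment in env_dir and return its config file path."""
    # with_pip=False keeps the sketch quick; use with_pip=True to also install pip
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    # A temporary directory is used here so the example cleans up after itself
    with tempfile.TemporaryDirectory() as tmp:
        config = create_environment(Path(tmp) / "example-env")
        print(config.exists())
```

In a real project the environment would normally live in a fixed folder next to the code rather than in a temporary directory.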
If you are using an Anaconda-based Python distribution you may want to use conda environments instead.