How to Set Up a Google Cloud VM, pull a GitHub repo, install Docker, and run a script at regular intervals using cron (in 22 steps)


This is a short guide on how to set up a Google Cloud Platform (GCP) VM, pull a GitHub repository, and let cron run a script at regular intervals. The guide probably contains some bad practices, such as using the base Python environment rather than a virtual environment or conda environment. However, the aim was to create a single-use VM for a specific task, which we could leave chugging away.

Step 1: To set up a VM on GCP, first go to Compute Engine, then VM instances, then Create Instance. Name the VM instance. For this example, all default settings should be fine, except:
  • set region to europe-west2 (London)
  • set zone to europe-west2-b
  • you may want to increase the boot disk size
  • allow HTTP and HTTPS traffic on the Firewall
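The same instance can, in principle, be created from the command line rather than the console. The sketch below only assembles a `gcloud compute instances create` command as text so that it can be reviewed before it is run. The helper name `build_create_cmd`, the instance name and the default disk size are illustrative assumptions, and the `http-server` and `https-server` tags only open traffic if the matching default firewall rules exist in your project.

```bash
# Assemble (but do not run) a gcloud command mirroring the console settings above.
# build_create_cmd is a hypothetical helper; name and disk size are placeholders.
build_create_cmd() {
  name="$1"
  disk_size="${2:-10GB}"   # assumed default; raise it if you need a larger boot disk
  printf 'gcloud compute instances create %s --zone=europe-west2-b --boot-disk-size=%s --tags=http-server,https-server\n' \
    "$name" "$disk_size"
}
```

Calling `build_create_cmd my-vm 50GB` prints the command, which can then be pasted into any terminal where the gcloud CLI is installed and initialised.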
Step 2: Click the down arrow next to SSH and choose View gcloud command. Copy the command into a terminal window and run it.
Step 3: Run gcloud init in your SSH’d terminal window and sign in with your personal Google (GCP) account (option 2). Follow the instructions and set the required project and region.
Step 4: Steps 4 and 5 were sourced from here. Install a few crucial packages for Debian.
```bash
$ sudo apt-get update 
$ sudo apt-get install bzip2 git libxml2-dev
```
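Before moving on, it can be worth confirming that the tools later steps rely on are actually on the PATH. The sketch below is one way to do that; `missing_commands` is a hypothetical helper, not something shipped with apt or Debian, and it only checks commands (so a library package such as libxml2-dev is out of its scope).

```bash
# Print the name of every listed command that is not found on PATH.
# missing_commands is a hypothetical helper, not part of apt or Debian.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

For example, `missing_commands git bzip2` should print nothing once both packages are installed.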
Step 5: Install Miniconda, then the Python packages you need.
```bash
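# Download the installer (repo.anaconda.com is the current home of the old repo.continuum.io URL)
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ rm Miniconda3-latest-Linux-x86_64.sh
# Reload the shell configuration so that conda is on the PATH
$ source ~/.bashrc
$ conda install scikit-learn pandas jupyter ipython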
```
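The cron job later in this guide calls the Miniconda Python by its absolute path, so it helps to confirm where the interpreter ended up. The sketch below assumes the installer's default prefix of `$HOME/miniconda3`; `conda_python` is a hypothetical helper name, and the prefix should be adjusted if you chose a different install location.

```bash
# Print the Miniconda Python path if it exists and is executable; fail otherwise.
# Assumes the installer's default prefix ($HOME/miniconda3) unless one is given.
conda_python() {
  py="${1:-$HOME/miniconda3}/bin/python"
  if [ -x "$py" ]; then
    printf '%s\n' "$py"
  else
    printf 'not found: %s\n' "$py" >&2
    return 1
  fi
}
```

Running `conda_python` with no arguments prints the path to use in the crontab entry, or an error if the install prefix differs.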
Step 6: Set your global git credentials
```bash
$ git config --global user.name 'User Name' 
$ git config --global user.email 'User Email'
```
Step 7: Steps 7 to 9 were sourced from here. Generate a new SSH key for GitHub by running the command below in your terminal window. Accept the default location and leave the passphrase empty.
```bash
$ ssh-keygen -t rsa -b 4096 -C 'User Email'
```
Step 8: Start the ssh-agent in the background.
```bash
$ eval "$(ssh-agent -s)"
```
Step 9: Add your SSH private key to the ssh-agent. If you created your key with a different name, or if you are adding an existing key that has a different name, replace id_rsa in the command with the name of your private key file.
```bash
$ ssh-add ~/.ssh/id_rsa
```
Step 10: Steps 10 and 11 were sourced from here. Print your public SSH key to the terminal, then copy it to the clipboard.
```bash
$ cat ~/.ssh/id_rsa.pub
```
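A quick sanity check before pasting the key into GitHub can catch a wrong file name or an accidentally copied private key. The sketch below assumes the default RSA key location used above; `check_keypair` is a hypothetical helper and only inspects the files, it does not contact GitHub.

```bash
# Succeed only if both halves of an RSA key pair exist and the public half
# starts with the "ssh-rsa" type prefix. check_keypair is a hypothetical helper.
check_keypair() {
  key="${1:-$HOME/.ssh/id_rsa}"
  [ -f "$key" ] || { printf 'missing private key: %s\n' "$key" >&2; return 1; }
  [ -f "$key.pub" ] || { printf 'missing public key: %s.pub\n' "$key" >&2; return 1; }
  case "$(cat "$key.pub")" in
    ssh-rsa\ *) return 0 ;;
    *) printf 'unexpected public key format in %s.pub\n' "$key" >&2; return 1 ;;
  esac
}
```

Once the key has been added to GitHub, `ssh -T git@github.com` is the usual way to confirm that authentication works.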
Step 11: Go to https://github.com/settings/ssh/new and add a title for your key, then paste the clipboard text into the key box. Click Add Key and then enter your password if prompted.
Step 12: Make a GitProjects directory and then cd to it.
```bash
$ mkdir GitProjects 
$ cd GitProjects
```
Step 13: Clone your repository, replacing repo with the repository's SSH URL.
```bash
$ git clone repo
```
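Because the key from the earlier steps is an SSH key, the clone needs the repository's SSH URL rather than its HTTPS one. The sketch below converts one form to the other; `to_ssh_url` is a hypothetical helper, and the owner and repository names in the usage example are placeholders.

```bash
# Convert an HTTPS GitHub URL to its SSH form so git uses the key added above.
# to_ssh_url is a hypothetical helper; owner and repository names are placeholders.
to_ssh_url() {
  path="${1#https://github.com/}"   # drop the scheme and host
  path="${path%/}"                  # drop a trailing slash, if any
  path="${path%.git}"               # drop an existing .git suffix
  printf 'git@github.com:%s.git\n' "$path"
}
```

For instance, `git clone "$(to_ssh_url https://github.com/owner/project)"` would clone over SSH.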
Step 14: Install Docker (steps 14 to XX were sourced from here). First, install the packages necessary to add a new repository over HTTPS.
```bash
$ sudo apt update
$ sudo apt install apt-transport-https ca-certificates curl software-properties-common gnupg2
```

Step 15: Import the repository’s GPG key using the following curl command:

```bash
$ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
```

Step 16: Add the stable Docker APT repository to your system’s software repository list.

```bash
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
```
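The `$(lsb_release -cs)` part of that command is replaced by the shell with your Debian release codename. To make the substitution concrete, the sketch below prints the resulting source line for a given codename; `docker_apt_line` is a hypothetical helper, and `bookworm` in the usage example is just one possible codename.

```bash
# Print the APT source line that the add-apt-repository command above registers.
# docker_apt_line is a hypothetical helper; pass the codename `lsb_release -cs` prints.
docker_apt_line() {
  codename="$1"
  printf 'deb [arch=amd64] https://download.docker.com/linux/debian %s stable\n' "$codename"
}
```

Calling `docker_apt_line bookworm` shows the line you would expect to see registered on a Debian bookworm VM.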

Step 17: Update the apt package list and install the latest version of Docker CE (Community Edition).

```bash
$ sudo apt update
$ sudo apt install docker-ce
```

Step 18: Once the installation is complete, the Docker service will start automatically. To verify that it is running, type:

```bash
$ sudo systemctl status docker
```

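Step 19: Run a Docker image, replacing port:port with the host and container ports to map and <image_name> with the name of your image.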

```bash
$ sudo docker run -p port:port <image_name>
```
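In the `-p` flag, the first number is the port on the VM and the second is the port inside the container. The sketch below assembles the command as text to make that order explicit; `docker_run_cmd` is a hypothetical helper, and the image name and port numbers in the usage example are placeholders.

```bash
# Assemble (but do not run) a docker run command with an explicit port mapping.
# docker_run_cmd is a hypothetical helper; image name and ports are placeholders.
docker_run_cmd() {
  image="$1"
  host_port="$2"
  container_port="${3:-$2}"   # default: same port inside and outside the container
  printf 'sudo docker run -p %s:%s %s\n' "$host_port" "$container_port" "$image"
}
```

For example, `docker_run_cmd my-image 8080 80` prints a command that exposes the container's port 80 on the VM's port 8080.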
Step 20: Create a cron job. When prompted, select nano as the editor (option 1).
```bash
$ crontab -e
```
Step 21: Paste into your crontab the cron jobs you want to run. The example below changes to the project folder and runs example.py at the start of every hour, every day. Output and errors are appended to cron.out.
```text
0 * * * * cd ~/GitProjects/project && /home/username/miniconda3/bin/python ~/GitProjects/project/example.py >> ~/GitProjects/project/cron.out 2>&1
```
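Long crontab entries are easy to mistype, so it can help to generate them. The sketch below builds an entry with the same shape as the example above from its parts; `cron_line` is a hypothetical helper, and the paths in the usage example are placeholders to replace with your own.

```bash
# Build a crontab entry that runs a script with a given Python and appends
# stdout and stderr to cron.out. cron_line is a hypothetical helper.
cron_line() {
  schedule="$1"      # five cron fields, e.g. "0 * * * *"
  project_dir="$2"   # absolute path to the cloned project
  python_bin="$3"    # absolute path to the Python interpreter
  script="$4"        # script file name inside the project
  printf '%s cd %s && %s %s/%s >> %s/cron.out 2>&1\n' \
    "$schedule" "$project_dir" "$python_bin" "$project_dir" "$script" "$project_dir"
}
```

Calling `cron_line "0 * * * *" /home/username/GitProjects/project /home/username/miniconda3/bin/python example.py` prints a line ready to paste into the editor. Quoting the schedule matters, since an unquoted `*` would be expanded by the shell.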
Step 22: Press ^X, then Y and Enter, to save the crontab and exit nano.

Your Python script should now run at XX:00 every hour. See crontab guru for more information about setting cron times.
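A cron time is made of five whitespace-separated fields: minute, hour, day of month, month and day of week. The sketch below checks only that a schedule has exactly five fields, which catches the most common slip of dropping or adding one; `has_five_fields` is a hypothetical helper and does not validate the values in each field.

```bash
# Return success if a schedule has exactly five whitespace-separated fields
# (minute, hour, day of month, month, day of week). has_five_fields is a
# hypothetical helper and does not validate the values themselves.
has_five_fields() {
  set -f            # stop the shell expanding * into file names
  set -- $1         # split the schedule on whitespace
  set +f
  [ "$#" -eq 5 ]
}
```

For example, `has_five_fields "*/15 * * * *"` succeeds (every 15 minutes), while `has_five_fields "0 * * *"` fails because one field is missing.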


By Michael Hodge, Senior Data Scientist at ONS Data Science Campus.