Setting up the environment for GPU

Introduction #

Setting up the environment for GPU can be challenging, as every computer has different hardware and software configurations. There are no universal instructions that will work for everyone, but in this chapter, we will discuss how to set up the environment using Google Colab as well as a GPU on a remote server. We will also highlight the steps to verify that the GPU is working properly.

Google Colab #

Google Colab is a cloud-based platform that allows you to run Jupyter Notebook in the Cloud with support for GPU-acceleration. Google Colab is free to use, and you do not need to install any software or setup CUDA manually. It comes with pre-installed libraries such as Pandas or NumPy, so you do not need to run pip install yourself.

How to start? #

The first step is to create a new notebook in Google Colab. You can do this by going to the Google Colab website, signing in with your Google account, and creating a new notebook.

Figure 1: Create a new notebook in Google Colab.

Enabling GPU support #

After creating a new notebook, you need to enable GPU support. To do this, go to the Runtime menu and select Change runtime type. Then, set the Hardware accelerator to GPU.

>

Figure 2: Select Change runtime type in the Runtime menu.

After you change your runtime type, your notebook will restart, which means information from the previous session will be lost.

Figure 3: Select GPU in the Hardware accelerator menu.

You may also see the TPU option in the drop-down menu. TPU stands for a Tensor Processing Unit, which is a specialised chip designed to accelerate machine learning workloads. TPU is more powerful than GPU, but it is also more expensive.

Verify GPU access #

After you enable GPU support, you can verify that the GPU is working properly by running the code below in your notebook.

!nvidia-smi

nvidia-smi (NVIDIA System Management Interface) is a tool to query, monitor and configure NVIDIA GPUs. It ships with and is installed along with the NVIDIA driver, and it is tied to that specific driver version. It is a tool written using the NVIDIA Management Library (NVML), which you can also find the Python bindings in the PyPI package pynvml.

nvidia-smi outputs a summary table (see Figure 4 for an example output), where you will find the following useful information:

  • Name: The name of the GPU.
  • Memory: The amount of memory available on the GPU.
  • Utilisation: The percentage of time the GPU is being used.
  • Power usage: The power usage of the GPU.
  • Processes: List of processes executing on the GPUs.

Figure 4: NVIDIA System Management Interface output.

In this example, we got a Tesla T4 GPU with 16GB of memory. The GPU is currently being used by the notebook, but we can also see that there are no other processes running on the GPU. There is several other ways to verify that the GPU is working, for instance, directly with the torch library by running the following code:

import torch
use_cuda = torch.cuda.is_available() #Check if GPU is available
if use_cuda:
    print('__CUDNN VERSION:', torch.backends.cudnn.version()) #Get the version of cudnn
    print('__Number CUDA Devices:', torch.cuda.device_count()) #Get the number of available GPUs
    print('__CUDA Device Name:',torch.cuda.get_device_name(0)) #Get the name of the GPU
    print('__CUDA Device Total Memory [GB]:',torch.cuda.get_device_properties(0).total_memory/1e9) #Get the total amount of memory available on the GPU

Best practices #

Google Colab is a great tool for running GPU-accelerated notebooks in the cloud. However, there are some limitations that you should be aware of.

Google Colab is not an unrestricted resource: Google Colab is free to use, but they limit the amount of time you can use the GPU. Google Colab will automatically disconnect the notebook if we leave it idle for more than 30 minutes, meaning that your training will be interrupted and you will have to restart it. Nevertheless, you can run the free of charge notebooks for at most 12 hours at a time as explained in the FAQ. If you want to use the GPU for more than 12 consecutive hours or require access to more memory, you can consider switching to the paid version of Colab.

Working with data: You can access your files that are stored on Google Drive directly from Google Colab. To do this, you need to mount your Google Drive to the notebook. You can do this by running the following code in a code block in your notebook. Use the bash command !cd the Files panel on the left to access your files.

from google.colab import drive
drive.mount('/content/drive')

You can also upload or download a file (or files) from/to your computer using the following code.

from google.colab import files
#Upload a file from your computer
files.upload()
#Download a file to your computer
files.download('path/to/your/file')

Ensure all files are completely copied to your Google Drive: As mentioned above, Colab can and will terminate your session due to inactivity. To ensure that your final model and data is saved to your Google Drive, call drive.flush_and_unmount() before you terminate your session. Furthermore, you can checkpoint your model during training process and save it to your Google Drive as well.

from google.colab import drive
model.fit(...) # Training your model
model.save('path/to/your/model') # Save your model
drive.flush_and_unmount()
print('All changes made in this Colab session should now be visible in Drive.')

Note that the completion of copying/writing files to /content/drive/MyDrive/ does not mean that all files are safely stored on Google Drive and that you can immediately terminate your Colab instance because the transfer of data between the Virtual Machine on which you run your analysis and Google’s infrastructure happens asynchronously, so performing this flushing will help ensure it is indeed safe to disconnect.

Disconnecting from the notebook: Be a responsible user and disconnect from the notebook when you are finished with your GPU. Google will notice and reward you with a better GPU the next time you request one! So after you have copied your saved model files to Google Drive or Google Storage, you can disconnect from the notebook by clicking the Disconnect button in the Runtime menu. Or you can run the following code in a code block in your notebook.

from google.colab import runtime
runtime.unassign()

Getting started: Remote server #

At UCL, Department of Geography, we have a dedicated GPU-server that we can use to run GPU-accelerated notebooks on a Tesla V100 GPU. In theory, you should be able to connect to the server from anywhere in the world and run your GPU analyses - although access to the server needs to be requested. Setting up the environment on the server is a bit more complicated than setting up on cloud-based solutions such as Google Colab. For the below to work you will need three things: access to the server, a correct installation of CUDA, and an installation of the conda package manager.

ssh into the server #

If you have access to a dedicated GPU-server, you can log into the server using the following bash command:

ssh username@server_ip port

After this command, you will be prompted to enter your password.

Installing libraries #

You can initialise a new conda environment using the following bash command.

conda create -n gpu -c rapidsai -c nvidia -c conda-forge rapids=22.06 python=3.9 cudatoolkit=11.5 jupyterlab

This will create a new conda environment called gpu and install the CUDA toolkit with some of the essential libraries that you will need.

Activate the environment #

You can activate the environment using the following bash command.

conda activate gpu

Start the JupyterLab server #

You can start the JupyterLab server using the following bash command.

jupyter lab --no-browser

This will start the JupyterLab server, and you will be able to access it from your browser.

Getting started: Local machine #

If your local machine has a supported GPU, you might be able to install the CUDA toolkit. Most of GPU-accelerated libraries build on top of the CUDA framework. Therefore, you need to have a GPU that supports CUDA. You can check the CUDA GPU support matrix to see if your GPU is supported. Of course, you should have a decent CPU, RAM and Storage to be able to leverage the power of the GPU.

For the minimum hardware requirements, we recommend the following:

  • CPU: Minimum 4 cores at 2.6GHz; at least 16GB of RAM.
  • GPU: NVIDIA GPU with at least 6GB of VRAM.

As installations and setting up instructions depend on your operating system and hardware, please refer to the official documentation.