How to use the IP2I GPU server

Author: Guillaume Baulieu.

Too long, did not read!

If you are in a hurry to test the GPU server, here are some commands to set up a connection.

    ssh -Y lyoui.in2p3.fr
    srun -p test --gres=gpu:1 --mem=20GB --pty bash
    export APPTAINERENV_CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES
    apptainer run --nv /gridgroup/calcul/apptainer/tensorflow.sif
    jupyter notebook --ip 0.0.0.0

The last command should give you an address that you can copy and paste into your web browser. All code from the notebooks will be executed on the GPU server.

Warning

Do NOT share this address with anyone! Anybody with this address will be able to run commands on the server using YOUR ID as long as your session is running.

Once you are done, type CTRL-C in the command line and log out from the servers.

If you need more information about what you have just done, read the following documentation!

Hardware Description

CPU: 2 * Intel Xeon Gold 6248R CPU @ 3.00GHz (total of 96 cores)
RAM: 192 GB
Storage: 1.8 TB on SSD in /scratch/
GPU: 3 * NVIDIA Quadro RTX 6000

Opening a session on the server

The server (lyowork029.in2p3.fr) is part of the SLURM cluster. To open a session, you need to request a shell with some GPU resources.

First, you need to connect to a SLURM User Interface:

    ssh -Y lyoui.in2p3.fr

From the SLURM User Interface servers, you can use the squeue-gpu command to check whether someone else is already using the GPUs, and then launch your jobs or open an interactive session on the GPU server.

Then, you can use the srun command to open the session on the GPU server:

    srun -p test --gres=gpu:1 --mem=20GB --pty bash

Details about the parameters:

-p test : use the test queue. Your session will be killed after 1 hour. Set this parameter to gpu once you are ready to work; the limit then becomes 24 hours.
--gres=gpu:1 : ask for 1 GPU. If you need to use several GPUs simultaneously, you can increase the number up to 3 (but you may have to wait longer to get the resources).
--mem=20GB : ask for 20 GB of available RAM. Please keep in mind that up to 3 users will share the memory: do not exceed 60 GB per GPU.

The command will block until the requested resources are available. You will then have access to a shell prompt and can start working!
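
The parameter rules above can be sketched as a small dry-run helper. This is only an illustration: the function name build_srun_cmd is hypothetical and not part of the IP2I tooling, and it prints the srun command instead of running it. It assumes the two queues named in this documentation (test and gpu), the limit of 3 GPUs and the 60 GB per GPU guideline.

```shell
#!/bin/sh
# Hypothetical helper (not part of the IP2I tooling): prints the srun command
# for a given queue, GPU count and memory request instead of running it.
build_srun_cmd() {
    partition=$1   # "test" (1 hour limit) or "gpu" (24 hour limit)
    gpus=$2        # number of GPUs, from 1 to 3
    mem_gb=$3      # requested RAM in GB

    case "$partition" in
        test|gpu) ;;
        *) echo "unknown queue: $partition" >&2; return 1 ;;
    esac
    if [ "$gpus" -lt 1 ] || [ "$gpus" -gt 3 ]; then
        echo "the server has 3 GPUs: request between 1 and 3" >&2
        return 1
    fi
    # The documentation asks users not to exceed 60 GB per GPU.
    if [ "$mem_gb" -gt $((gpus * 60)) ]; then
        echo "memory request exceeds 60 GB per GPU" >&2
        return 1
    fi
    echo "srun -p $partition --gres=gpu:$gpus --mem=${mem_gb}GB --pty bash"
}

# Prints: srun -p gpu --gres=gpu:2 --mem=100GB --pty bash
build_srun_cmd gpu 2 100
```

Printing the command rather than executing it lets you review the request before you copy it into your terminal.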

Warning

SLURM will have set the $CUDA_VISIBLE_DEVICES environment variable according to your allocated GPU card(s): please do NOT modify this variable! Your processes could be sent to an already used card and fail.
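
If you want to check what SLURM allocated, you can read the variable without changing it. The sketch below assumes the variable holds a comma-separated list of GPU identifiers; the function name count_visible_gpus is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: counts the entries of a CUDA_VISIBLE_DEVICES-style
# value (comma-separated list). It only reads the variable, never sets it.
count_visible_gpus() {
    # Use the first argument if given, otherwise the environment variable.
    devices=${1-${CUDA_VISIBLE_DEVICES-}}
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Replace commas with newlines and count the resulting lines.
    printf '%s\n' "$devices" | tr ',' '\n' | wc -l | tr -d ' '
}

echo "GPUs allocated to this session: $(count_visible_gpus)"
```

With --gres=gpu:1 the reported count should be 1.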

Installing Software

Now that you have access to the server, you will want to run your code using some software, libraries or modules. There are two ways of installing what you need:

miniconda (python)
apptainer

Using miniconda

Miniconda will let you install all the python packages you need in your personal directory. For further instructions, you can read the dedicated section on the CC GPU cluster tutorial.

Using apptainer

Apptainer (formerly Singularity) lets you use a pre-configured image to create a container. You can use an image based on any Linux distribution, containing any software, libraries or Python modules. While the GPU server's operating system is CentOS 7, your container can run Ubuntu 20 with its own dedicated packages.

For those used to Docker, Apptainer is an alternative that is better suited to multi-user environments from a security point of view.

List of images

You can find existing images in the /gridgroup/calcul/apptainer/ folder.

File Name	OS	Packages
cosmo.sif	Ubuntu 20.04.3 LTS (Focal Fossa)	Python packages for cosmo group
pytorch_2-0.sif	Ubuntu 18.04.6 LTS (Bionic Beaver)	PyTorch 2.0, bokeh, jupyter, matplotlib, numpy, pandas, scikit-learn, seaborn, uproot
mxnet.sif	Ubuntu 18.04.6 LTS (Bionic Beaver)	MXNet 1.9.0, bokeh, D2L, jupyter, matplotlib, numpy, pandas, root 6.24/08, scikit-learn, seaborn, uproot
tensorflow_2-11.sif	Ubuntu 20.04.5 LTS (Focal Fossa)	TensorFlow 2.11.0, bokeh, jupyter, keras_tuner, matplotlib, numpy, pandas, redis, scikit-learn, seaborn, uproot
tensorflow_2-14.sif	Ubuntu 20.04.5 LTS (Focal Fossa)	TensorFlow 2.14.0, tensorflow_probability, bokeh, jax, jax_cosmo, jupyter, keras_tuner, matplotlib, numpy, pandas, PyCBC, redis, scikit-learn, seaborn, uproot

You can use existing Docker images by converting them for Apptainer:

    export APPTAINER_TMPDIR=/scratch/
    export APPTAINER_CACHEDIR=/scratch/
    apptainer build /path/to/output/<image name>.sif docker://<registry server>/<image name>:<tag>

Please let us know if you have such images, so that we can share our environments!
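
As an illustration of how the placeholders in the build command are filled in, here is a dry-run sketch. The function name docker_build_cmd is hypothetical, and the Ubuntu 20.04 image on Docker Hub and the output path are only examples, not images used by the group. The function prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: prints the apptainer build command for a Docker image.
docker_build_cmd() {
    output=$1     # path of the .sif file to create
    registry=$2   # registry server, e.g. docker.io
    image=$3      # image name, e.g. library/ubuntu
    tag=$4        # image tag, e.g. 20.04
    echo "apptainer build $output docker://$registry/$image:$tag"
}

# Example values only: the official Ubuntu 20.04 image from Docker Hub.
# Prints: apptainer build /scratch/ubuntu_20-04.sif docker://docker.io/library/ubuntu:20.04
docker_build_cmd /scratch/ubuntu_20-04.sif docker.io library/ubuntu 20.04
```

The two export lines for APPTAINER_TMPDIR and APPTAINER_CACHEDIR are still needed before running the printed command.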

Launching a container

Once you have an Apptainer image, you can create a container using the following commands:

    export APPTAINERENV_CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES
    apptainer run --nv /path/to/image/tensorflow.sif

Warning

Do not forget the export command! It will ensure that you are using the correct GPU cards once inside your container!
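
A possible guard against forgetting the export is to compare the two variables before launching the container. This is a minimal sketch; the function name gpu_env_forwarded is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeeds only if the GPU selection made by SLURM
# has been copied into the variable that Apptainer passes to the container.
gpu_env_forwarded() {
    [ -n "${CUDA_VISIBLE_DEVICES-}" ] &&
        [ "${APPTAINERENV_CUDA_VISIBLE_DEVICES-}" = "$CUDA_VISIBLE_DEVICES" ]
}

if gpu_env_forwarded; then
    echo "GPU selection will be forwarded to the container"
else
    echo "run the export command before apptainer run" >&2
fi
```

The check also fails when CUDA_VISIBLE_DEVICES itself is empty, which usually means the shell was not opened through srun with --gres.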

Note

By default you will only have access to your home directory from inside the container. You can make additional directories available by using the -B parameter of the apptainer run command (e.g. apptainer run --nv -B /gridgroup /path/to/image/tensorflow.sif will map the /gridgroup directory into the container).
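
When several directories are needed, the -B parameter accepts a comma-separated list. The sketch below builds that list from its arguments; the function name bind_arg is hypothetical, and the final command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: turns a list of directories into a single -B option
# with comma-separated paths.
bind_arg() {
    result=""
    for dir in "$@"; do
        if [ -z "$result" ]; then
            result=$dir
        else
            result="$result,$dir"
        fi
    done
    # Print nothing when no directory was given.
    if [ -n "$result" ]; then
        echo "-B $result"
    fi
}

# Prints: apptainer run --nv -B /gridgroup,/scratch /path/to/image/tensorflow.sif
echo "apptainer run --nv $(bind_arg /gridgroup /scratch) /path/to/image/tensorflow.sif"
```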

Using a Jupyter Notebook

In many cases, it might be convenient to use a Jupyter Notebook in your browser to access the GPU server. It gives you the possibility to directly type your code in your browser, execute it on the GPU server and get the results (text, images, plots, ...) in your browser.

To launch a Jupyter server from your container:

    cd /to/your/chosen/folder/
    jupyter notebook --ip 0.0.0.0

It should output something like:

    [I 15:15:21.703 NotebookApp] Serving notebooks from local directory: /home/[...]
    [I 15:15:21.703 NotebookApp] Jupyter Notebook 6.1.5 is running at:
    [I 15:15:21.703 NotebookApp] http://lyowork029.in2p3.fr[...]
    [I 15:15:21.703 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [W 15:15:21.762 NotebookApp] No web browser found: could not locate runnable browser.

Copy and paste the address given on the third line into your web browser. It should give you an interface to create notebooks that will be executed on the GPU server!
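
If the address is hard to spot among the log lines, a small filter can pick it out. This is a sketch under assumptions: the function name extract_notebook_url is hypothetical, and the host name, port and token in the sample log are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: reads Jupyter log lines on standard input and prints
# the first http(s) address it finds.
extract_notebook_url() {
    grep -oE 'https?://[^[:space:]]+' | head -n 1
}

# Sample log with a made-up address (not a real server or token).
# Prints: http://gpu-host.example:8888/?token=abc123
extract_notebook_url <<'EOF'
[I 12:00:00.000 NotebookApp] Jupyter Notebook 6.1.5 is running at:
[I 12:00:00.000 NotebookApp] http://gpu-host.example:8888/?token=abc123
[I 12:00:00.000 NotebookApp]  or http://127.0.0.1:8888/?token=abc123
EOF
```

Treat the extracted address with the same care as the original log output, since it contains the access token.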

Warning

Do NOT share this address with anyone! Anybody with this address will be able to run commands on the server using YOUR ID as long as your session is running.

Note

As an alternative to the generated token included in the address, you can set a password for your Jupyter sessions using the command jupyter notebook password.

Note

The GPU server is only accessible from the laboratory's network. If you are not directly connected to the IP2I network, use a VPN connection.

Once you have finished your work, save your notebook from the web interface, stop the Jupyter notebook server (CTRL-C in the command line) and log out from the server.