Gricad cluster
Execute the code on the Gricad cluster.
Note that the cluster request (the oarsub command) and the Singularity container are configurable from the config file.
Preliminary step:
You need to first configure your SSH access to both bastions of the Gricad cluster:
ssh-copy-id PERSEUS_LOGIN@rotule.univ-grenoble-alpes.fr
ssh-copy-id PERSEUS_LOGIN@trinity.univ-grenoble-alpes.fr
Then do the same for the Gricad cluster you would like to use (here bigfoot, for example):
ssh-copy-id -o "ProxyCommand ssh -q PERSEUS_LOGIN@access-gricad.univ-grenoble-alpes.fr nc -w 60 %h %p" PERSEUS_LOGIN@bigfoot
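The three key-copy commands above differ only in the login and the target cluster, so they can be generated from two values. The following is a minimal dry-run sketch: the function name print_gricad_ssh_setup is hypothetical (it is not part of remi), and it only prints the commands for review instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of remi): print the ssh-copy-id commands
# of the preliminary step for a given login and target cluster.
print_gricad_ssh_setup() {
    login="$1"      # your PERSEUS login
    cluster="$2"    # e.g. bigfoot, dahu, luke
    proxy="ssh -q ${login}@access-gricad.univ-grenoble-alpes.fr nc -w 60 %h %p"

    # Both bastions first.
    for bastion in rotule trinity; do
        printf '%s\n' "ssh-copy-id ${login}@${bastion}.univ-grenoble-alpes.fr"
    done
    # Then the cluster itself, reached through the access-gricad proxy.
    printf '%s\n' "ssh-copy-id -o \"ProxyCommand ${proxy}\" ${login}@${cluster}"
}

# Nothing is executed here: review the printed lines, then run them by hand.
print_gricad_ssh_setup PERSEUS_LOGIN bigfoot
```

Called with PERSEUS_LOGIN and bigfoot, this prints the same three commands as listed above; replace the two arguments with your own login and cluster.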
Gricad is composed of several clusters (dahu, bigfoot for GPUs, luke, and froggy).
You can configure the one you want to use in the config file.
Available subcommands:
remi gricad setup
Set up your project on Gricad.
remi gricad push [-f | --force]
Sync the content of the project directory to the gricad cluster.
If no changes are detected locally, the file sync will not be attempted.
Options:
-f | --force: Run the sync command even if no local changes were detected.
remi gricad pull [-f | --force] [REMOTE_PATH]
Sync the content of the provided REMOTE_PATH directory from the gricad cluster to the local machine.
This can be used to sync back experimental output that results from a computation done remotely.
If no path is specified, output/ will be used as the default value.
Options:
-f | --force: Do not ask for confirmation before pulling.
Use with caution: conflicting local files may be overwritten.
remi gricad clean [-f | --force] [REMOTE_PATH]
Clean the content of the provided REMOTE_PATH directory on the gricad cluster.
If no directory is specified, output/ will be used as the default value.
Options:
-f | --force: Do not ask for confirmation before cleaning.
Use with caution.
remi gricad [script] [OPTIONS]
Run a bash script on the gricad cluster.
This is the default subcommand (and can thus be run using just remi gricad).
Options:
-s | --script: The path to a bash script to run.
Default: script.sh
-n | --job-name: A custom name for the cluster job (oarsub’s --name option).
Default: The project name.
-g | --gpu-model: GPU model (‘A100’, ‘V100’ or ‘T4’).
Default: The value defined in the gricad/oarsub section of the config file.
-c | --container: The name/path of the container image (.sif) that you want to use.
Default: The gricad.singularity_image property in config.yaml.
--no-push: Do not attempt to sync project files to the gricad cluster.
Examples:
remi gricad: Run script.sh on the cluster.
remi gricad -s training_script.sh: Run training_script.sh on the cluster.
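As a rough illustration of how the subcommands and options above combine, the sketch below pushes the project once, submits a script as a named job on a chosen GPU model, and then lists the jobs. The function submit_training and the REMI override variable are hypothetical conveniences, not part of remi; only the remi gricad invocations themselves come from this page.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of remi). REMI is an assumed override that
# lets you substitute "echo remi" for a dry run; it defaults to the real CLI.
submit_training() {
    remi_cmd="${REMI:-remi}"
    script="$1"      # bash script to run on the cluster (-s | --script)
    job_name="$2"    # custom job name (-n | --job-name)
    gpu_model="$3"   # A100, V100 or T4 (-g | --gpu-model)

    # Sync the project once.
    $remi_cmd gricad push || return 1
    # Files were just synced, so ask the run not to sync again.
    $remi_cmd gricad -s "$script" -n "$job_name" -g "$gpu_model" --no-push || return 1
    # Show the running/planned jobs.
    $remi_cmd gricad stat
}

# Dry run: the three commands are printed instead of executed.
REMI="echo remi" submit_training training_script.sh baseline A100
```

Dropping the REMI="echo remi" prefix would run the real commands.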
remi gricad command [OPTIONS] COMMAND
Run the specified COMMAND on the cluster.
Options:
-n | --job-name: A custom name for the cluster job (oarsub’s --name option).
Default: The project name.
-g | --gpu-model: GPU model (‘A100’, ‘V100’ or ‘T4’).
Default: The value defined in the gricad/oarsub section of the config file.
-c | --container: The name/path of the container image (.sif) that you want to use.
Default: The gricad.singularity_image property in config.yaml.
--no-push: Do not attempt to sync project files to the gricad cluster.
Example:
remi gricad command "./test.sh --number_steps=1000": Run the command ./test.sh --number_steps=1000 on the cluster.
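Because COMMAND is passed as a single quoted string, it is straightforward to submit several variants in a loop. The sketch below submits one job per value of --number_steps, reusing the ./test.sh example above. The function sweep_number_steps, the steps_N job names, and the REMI override are assumptions made for illustration, not part of remi.

```shell
#!/bin/sh
# Hypothetical sketch (not part of remi): one cluster job per step count.
# REMI is an assumed override for dry runs; it defaults to the real CLI.
sweep_number_steps() {
    remi_cmd="${REMI:-remi}"
    for steps in "$@"; do
        # Options go before COMMAND; the command line is one quoted argument.
        $remi_cmd gricad command -n "steps_${steps}" \
            "./test.sh --number_steps=${steps}" || return 1
    done
}

# Dry run: print the submissions instead of sending them to the cluster.
REMI="echo remi" sweep_number_steps 100 1000
```

Giving each job its own name with -n should make the jobs easier to tell apart in the output of remi gricad stat.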
remi gricad interactive [OPTIONS]
Start an interactive session on the cluster. This runs oarsub with the --interactive flag.
Options:
-n | --job-name: A custom name for the cluster job (oarsub’s --name option).
Default: The project name.
-g | --gpu-model: GPU model (‘A100’, ‘V100’ or ‘T4’).
Default: The value defined in the gricad/oarsub section of the config file.
-c | --container: The name/path of the container image (.sif) that you want to use.
Default: The gricad.singularity_image property in config.yaml.
--no-push: Do not attempt to sync project files to the gricad cluster.
Example:
remi gricad interactive --no-push: Start an interactive session on the cluster without pushing local changes.
remi gricad recap
Run recap.py on the cluster to list information about the compute nodes (CPU, GPU…).
remi gricad chandler
Run chandler on the cluster to list the occupancy of the compute nodes (free/busy).
remi gricad stat
Show information about your running and planned jobs, using oarstat.
remi gricad connect OAR_JOB_ID
Connect to a running job.
Example:
remi gricad connect 6267518
remi gricad kill OAR_JOB_ID
Kill one or more running jobs.
Example:
remi gricad kill 6267518 6267519 6267520
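Since kill accepts several job IDs at once, a small guard can catch typos before anything is sent to the cluster. The sketch below assumes that OAR job IDs are purely numeric, as in the examples above. The function kill_jobs and the REMI override are hypothetical, not part of remi.

```shell
#!/bin/sh
# Hypothetical guard (not part of remi): only call `remi gricad kill`
# when every argument looks like a numeric OAR job ID.
# REMI is an assumed override for dry runs; it defaults to the real CLI.
kill_jobs() {
    remi_cmd="${REMI:-remi}"
    if [ "$#" -eq 0 ]; then
        echo "usage: kill_jobs OAR_JOB_ID..." >&2
        return 2
    fi
    for id in "$@"; do
        case "$id" in
            # Reject empty strings and anything containing a non-digit.
            ''|*[!0-9]*) echo "not a job ID: $id" >&2; return 2 ;;
        esac
    done
    $remi_cmd gricad kill "$@"
}

# Dry run: print the kill command instead of executing it.
REMI="echo remi" kill_jobs 6267518 6267519 6267520
```

With an invalid argument such as 62675x8, the function returns a non-zero status without calling remi at all.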