Slurm-Hydra-Submitit
This repository is a minimal working example of how to:
- set up Hydra
- set up batches of SLURM jobs on top of Hydra via the Hydra Submitit launcher
Set up Hydra
⚠️ You need to install hydra-core for this step.
Hydra is fairly easy to set up:
- one .yaml configuration file containing the default config values
- a @hydra.main decorator on your main experiment function, which passes the configuration values to it as an argument.
By simply running python slurm_hydra_submitit/script.py, you'll see how the main function takes the arguments from the configuration file and passes them on to the underlying functions.
Launch jobs on a SLURM cluster with the Hydra Submitit launcher
Launch a job on the cluster
⚠️ You need to install hydra-submitit-launcher for this step.
Now that our Hydra config is set up, we want to run the job on a SLURM cluster instead of our local computer. For that, we need to:
- select the Hydra launcher that submits jobs to the SLURM cluster
- specify the hardware requirements of the SLURM job
Once you are connected to your SLURM cluster's scheduler node and have installed hydra-submitit-launcher, you can already launch jobs on the cluster with:
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm
To test locally before sending jobs to the cluster, switch the hydra/launcher argument to submitit_local.
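If you switch between the two launchers often, a small wrapper can pick one for you. The sketch below uses hypothetical helpers (pick_launcher, launch_command) that are not part of the repository: they build the same command line as above and, unless told otherwise, select submitit_slurm only when the sbatch command is found on the PATH, which is a simple heuristic for being on a SLURM node.

```python
import shlex
import shutil
from typing import Optional


def pick_launcher(local: Optional[bool] = None) -> str:
    # Heuristic (an assumption, not Hydra behavior): without an explicit
    # choice, use the SLURM launcher only if `sbatch` is available here.
    if local is None:
        local = shutil.which("sbatch") is None
    return "submitit_local" if local else "submitit_slurm"


def launch_command(overrides: list[str], local: Optional[bool] = None) -> list[str]:
    # Same command as in the README, with the launcher filled in.
    return [
        "python",
        "slurm_hydra_submitit/script.py",
        "--multirun",
        f"hydra/launcher={pick_launcher(local)}",
        *overrides,
    ]


print(shlex.join(launch_command(["project_name=P1"])))
```

The returned list can be passed to subprocess.run(cmd, check=True) to actually start the jobs without going through a shell.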
Adapt node parameters
You can adapt the SLURM parameters by overriding the SLURM launcher arguments (hydra.launcher.*).
For example, the following command requests 10 CPUs per task:
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm hydra.launcher.cpus_per_task=10
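When you override several resources at once, it can help to keep them in one place and turn them into hydra.launcher.* overrides. A minimal sketch: launcher_overrides is a hypothetical helper, the parameter names cpus_per_task, mem_gb and timeout_min are fields of the hydra-submitit-launcher SLURM launcher config, and the values are placeholders to adapt to your cluster.

```python
def launcher_overrides(params: dict) -> list[str]:
    # {"cpus_per_task": 10} -> ["hydra.launcher.cpus_per_task=10"]
    return [f"hydra.launcher.{key}={value}" for key, value in params.items()]


# Placeholder resources; adapt them to your cluster.
slurm_params = {"cpus_per_task": 10, "mem_gb": 32, "timeout_min": 60}

cmd = [
    "python",
    "slurm_hydra_submitit/script.py",
    "--multirun",
    "hydra/launcher=submitit_slurm",
    *launcher_overrides(slurm_params),
]
print(" ".join(cmd))
```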
Launch array of jobs on the SLURM cluster
Grid Search
You can launch multiple jobs at once by giving comma-separated values for one or more parameters in the launch command.
For example, the following command launches 4 jobs, one for each combination of the arguments (2 project names × 2 epoch values).
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm project_name=P1,P2 train.epochs=30,40
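The job count follows from taking every combination of the listed values. The snippet below only illustrates that expansion with itertools.product (grid_sweep is a hypothetical name); it is not Hydra's sweeper code, and Hydra may number the jobs in a different order.

```python
import itertools


def grid_sweep(sweep: dict[str, list]) -> list[dict]:
    # One job per element of the Cartesian product of all value lists.
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in itertools.product(*sweep.values())]


jobs = grid_sweep({"project_name": ["P1", "P2"], "train.epochs": [30, 40]})
for job in jobs:
    print(job)
```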
Specific Parameter Combinations
Alternatively, you can pass explicit sets of parameters to be tested together, so that only the listed combinations are run:
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm +compile="{project_name:P1,train.epochs:30}, {project_name:P2,train.epochs:40}"
To tidy up this command, you can move the parameter sets into a bash script like this:
#!/bin/bash
params=(
'{project_name:P1,train.epochs:10},'
'{project_name:P2,train.epochs:20}'
)
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm +compile="${params[*]}"