SLURM

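First, add an alias that makes squeue print a more verbose table, plus two helper functions: genSlurmDep builds a dependency option for sbatch, and filterSlurmNames lists the jobs in a partition whose names match a pattern.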

cat >> ~/.bashrc << 'EOF'
# More verbose squeue: job ID, partition, name, state, elapsed time, node count, nodelist/reason, dependency
alias squeue='squeue --format="%.10i %.9P %.40j %.8T %.10M %.6D %.30R %E"'
## Fetches JOBIDs for a specific partition (optionally only jobs whose name matches a regex)
## and prints a dependency option for sbatch, e.g. --dependency=afterany:101:102
function genSlurmDep() {
   local DEP
   if [[ -z "$1" ]]; then
       >&2 echo ">> Please specify partition"
       return 1
   fi
   if [[ -n "$2" ]]; then
        # Keep only the jobs whose "JOBID NAME" line matches the pattern in $2
        DEP=$(/opt/slurm/bin/squeue -p "$1" --noheader -o "%i %j" | grep -E -- "$2" | awk '{print $1}' | xargs | sed -e 's/ /:/g')
   else
        DEP=$(/opt/slurm/bin/squeue -p "$1" --noheader -o "%i" | xargs | sed -e 's/ /:/g')
   fi
   if [[ -n "${DEP}" ]]; then
        echo "--dependency=afterany:${DEP}"
   fi
}

## Shows (with the verbose format above) the jobs in a partition whose name matches a regex
function filterSlurmNames() {
   local IDS
   if [[ -z "$1" ]]; then
       >&2 echo ">> Please specify partition"
       return 1
   fi
   if [[ -z "$2" ]]; then
       >&2 echo ">> Please specify jobname pattern (regex)"
       return 1
   fi
   IDS=$(/opt/slurm/bin/squeue -p "$1" --noheader -o "%i %j" | grep -E -- "$2" | awk '{print $1}' | xargs | sed -e 's/ /,/g')
   # Only call squeue when something matched; an empty --jobs list is an error
   if [[ -n "${IDS}" ]]; then
       squeue --jobs="${IDS}"
   fi
}
EOF
source ~/.bashrc
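To see what genSlurmDep does with the squeue output, here is a minimal sketch of its text pipeline that reads "JOBID NAME" lines from standard input instead of calling squeue, so it can be tried without a cluster. The function name depFromList and the sample job IDs and names are hypothetical and only serve as an illustration.

```shell
#!/usr/bin/env bash
# Hypothetical offline version of genSlurmDep's pipeline: reads "JOBID NAME"
# lines (like `squeue --noheader -o "%i %j"`) from stdin instead of calling squeue.
depFromList() {
    local pattern="${1:-.}"   # "." matches every non-empty line
    local dep
    # Keep matching lines, take the job ID column, join the IDs with ':'
    dep=$(grep -E -- "$pattern" | awk '{print $1}' | xargs | sed -e 's/ /:/g')
    # Like genSlurmDep, print nothing when no job matched
    if [[ -n "$dep" ]]; then
        echo "--dependency=afterany:${dep}"
    fi
}

# Made-up sample input; only the job named exactly "sleep-inf" matches
printf '%s\n' "101 sleep-inf" "102 sleep-inf-exclusive" "103 train" | depFromList 'sleep-inf$'
# prints --dependency=afterany:101
```

The pattern is an unanchored regular expression matched against the whole line, so `sleep-inf` on its own would also select `sleep-inf-exclusive`; anchoring it with `$` keeps the two job names apart.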

We will use placeholder jobs that keep nodes allocated, together with job dependencies, to prevent instances from being automatically scaled down.

The first script runs a non-exclusive job, which allows other jobs to be scheduled on the same node.

mkdir -p ~/slurm
cat > ~/slurm/sleep.sbatch << EOF 
#!/bin/bash
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null
#SBATCH --job-name=sleep-inf
sleep inf
EOF

The second script requests --exclusive, so Slurm will not place any other job on its node.

cat > ~/slurm/sleep-exclusive.sbatch << EOF 
#!/bin/bash
#SBATCH --exclusive
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null
#SBATCH --job-name=sleep-inf-exclusive
sleep inf
EOF
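The two scripts differ only in the --exclusive line and the job name. If you need more variants, a small function can write them; writeSleepScript is a hypothetical helper, not part of the original setup, and is meant to produce the same content as the heredocs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a placeholder batch script that just sleeps forever.
# Usage: writeSleepScript OUTFILE JOBNAME [exclusive]
writeSleepScript() {
    local out="$1" name="$2" mode="${3:-shared}"
    {
        echo '#!/bin/bash'
        # Only the exclusive variant keeps other Slurm jobs off the node
        if [[ "$mode" == "exclusive" ]]; then
            echo '#SBATCH --exclusive'
        fi
        echo '#SBATCH --output=/dev/null'
        echo '#SBATCH --error=/dev/null'
        echo "#SBATCH --job-name=${name}"
        echo 'sleep inf'
    } > "$out"
}

# Recreate the two scripts from above:
# writeSleepScript ~/slurm/sleep.sbatch sleep-inf
# writeSleepScript ~/slurm/sleep-exclusive.sbatch sleep-inf-exclusive exclusive
```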

We’ll submit a job to spin up two C5n instances in non-exclusive mode, so that we can use them with different jobs.

sbatch -N2 -p cpu ~/slurm/sleep.sbatch

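The helper function genSlurmDep, defined above, prints a --dependency=afterany:<jobids> option listing every job currently queued or running in the given partition. Passing it to sbatch makes the new job wait until all of those jobs have terminated.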

squeue
echo sbatch $(genSlurmDep cpu) -N2 -p cpu ~/slurm/sleep.sbatch
sbatch $(genSlurmDep cpu) -N2 -p cpu ~/slurm/sleep.sbatch
squeue
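genSlurmDep looks jobs up by partition (and optionally name) after they have been submitted. When you submit the jobs yourself, you can instead capture the job ID at submission time: sbatch --parsable prints only the job ID (followed by ;cluster when a cluster name is present), which can be passed to --dependency directly. submitParsable is a hypothetical wrapper sketching that approach, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: submit with sbatch --parsable and print only the job ID.
submitParsable() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    # --parsable prints "jobid" or "jobid;cluster"; drop the cluster part
    echo "${out%%;*}"
}

# Example (needs a Slurm cluster): the second job waits until the first one terminates
# first=$(submitParsable -N2 -p cpu ~/slurm/sleep.sbatch)
# sbatch --dependency=afterany:"${first}" -N2 -p cpu ~/slurm/sleep.sbatch
```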

Before the next step, wait until the first job is in the RUNNING state, which means its instances are up.
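Instead of re-running squeue by hand, you can poll until the job reaches RUNNING. waitForJobState below is a hypothetical helper, and SQUEUE_BIN is an assumed override variable that defaults to the /opt/slurm/bin/squeue path used by the helper functions above; calling squeue through that path also bypasses the verbose alias.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll squeue until a job reaches a state (default RUNNING).
# Usage: waitForJobState JOBID [STATE] [MAX_TRIES] [INTERVAL_SECONDS]
# SQUEUE_BIN is an assumed override; it defaults to the squeue path used above.
waitForJobState() {
    local jobid="$1" want="${2:-RUNNING}" tries="${3:-60}" interval="${4:-10}"
    local state="" i
    for (( i = 0; i < tries; i++ )); do
        # %T prints the full state name, e.g. PENDING, CONFIGURING, RUNNING
        state=$("${SQUEUE_BIN:-/opt/slurm/bin/squeue}" --noheader --jobs="$jobid" -o "%T" 2>/dev/null)
        if [[ "$state" == "$want" ]]; then
            echo "job ${jobid} is ${want}"
            return 0
        fi
        sleep "$interval"
    done
    >&2 echo ">> job ${jobid} did not reach ${want} (last state: ${state:-unknown})"
    return 1
}

# Example (needs a Slurm cluster): waitForJobState <JOBID>
```

With the defaults this gives up after 60 checks spaced 10 seconds apart, which leaves room for the instances to boot.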

When we cancel the first job, the afterany dependency of the second job is satisfied and the second job gets started.

scancel $(squeue --states=RUNNING -u "$USER" -o "%i" --noheader) && squeue && sleep 2 && squeue