C5n

Community Images

To run this benchmark we fetch community images from gallery.ecr.aws/hpc.

We’ll use a GROMACS build compiled against MPICH with a predefined number of MPI ranks.

Two images with different tags are available:

  1. c5n_18xl_on: built for a c5n.18xlarge with hyperthreading on. This image will use 72 MPI ranks.
  2. c5n_18xl_off: built for a c5n.18xlarge with hyperthreading off. This image will use 36 MPI ranks.
export SARUS_C5_MNP_IMG=public.ecr.aws/hpc/spack/gromacs/2021.1/mpich:c5n_18xl_on
sarus pull ${SARUS_C5_MNP_IMG}
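
You can confirm the pull worked by listing the local images; the repository and tag from above should show up:

sarus images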

Custom Build Image

In case you built the image in a previous section, please paste your image name (as reported by sarus images).

read -p "Paste your image name (according to 'sarus images'): " SARUS_C5_MNP_IMG
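
A quick sanity check before the name gets baked into the job scripts below (a minimal sketch that only verifies the variable is non-empty):

echo "Using image: ${SARUS_C5_MNP_IMG:?SARUS_C5_MNP_IMG is not set}"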

4 OpenMP Threads

Our first multi-node run uses two nodes (144 vCPUs) with a decomposition of 36 MPI ranks and 4 OpenMP threads per rank.

mkdir -p ~/slurm/
cat > ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch << EOF
#!/bin/bash
#SBATCH --job-name=gromacs-sarus-c5n-mpich-18x4
#SBATCH --ntasks-per-node=18
#SBATCH --cpus-per-task=4
#SBATCH --exclusive
#SBATCH --output=/fsx/logs/%x_%j.out
#SBATCH --partition=c5n

module load intelmpi
unset I_MPI_PMI_LIBRARY
mkdir -p /fsx/jobs/\${SLURM_JOBID}

export INPUT=/fsx/input/gromacs/benchRIB.tpr

mpirun sarus run --mpi ${SARUS_C5_MNP_IMG} -c "gmx_mpi mdrun -s \${INPUT} -resethway -ntomp \${SLURM_CPUS_PER_TASK}"
EOF
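
Since the heredoc is unquoted, ${SARUS_C5_MNP_IMG} is expanded when the file is written. A quick look at the generated script verifies that the image name was substituted correctly:

grep 'sarus run' ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch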

2 OpenMP Threads

Another script uses 2 OpenMP threads per rank and 36 ranks per node.

mkdir -p ~/slurm/
cat > ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch << EOF
#!/bin/bash
#SBATCH --job-name=gromacs-sarus-c5n-mpich-36x2
#SBATCH --ntasks-per-node=36
#SBATCH --cpus-per-task=2
#SBATCH --exclusive
#SBATCH --output=/fsx/logs/%x_%j.out
#SBATCH --partition=c5n

module load intelmpi
unset I_MPI_PMI_LIBRARY
mkdir -p /fsx/jobs/\${SLURM_JOBID}

export INPUT=/fsx/input/gromacs/benchRIB.tpr

mpirun sarus run --mpi ${SARUS_C5_MNP_IMG} -c "gmx_mpi mdrun -s \${INPUT} -resethway -ntomp \${SLURM_CPUS_PER_TASK}"
EOF

1 OpenMP Thread

Another script uses 1 OpenMP thread per rank and 72 ranks per node.

mkdir -p ~/slurm/
cat > ~/slurm/gromacs-sarus-c5n-mpich-72x1.sbatch << EOF
#!/bin/bash
#SBATCH --job-name=gromacs-sarus-c5n-mpich-72x1
#SBATCH --ntasks-per-node=72
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
#SBATCH --output=/fsx/logs/%x_%j.out
#SBATCH --partition=c5n

module load intelmpi
unset I_MPI_PMI_LIBRARY
mkdir -p /fsx/jobs/\${SLURM_JOBID}

export INPUT=/fsx/input/gromacs/benchRIB.tpr

mpirun sarus run --mpi ${SARUS_C5_MNP_IMG} -c "gmx_mpi mdrun -s \${INPUT} -resethway -ntomp \${SLURM_CPUS_PER_TASK}"
EOF

Submit Jobs

Let’s submit two jobs for each configuration at increasing node counts.

sbatch -N2 --job-name=gromacs-sarus-c5n-mpich-72x2 ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch
sbatch -N2 --job-name=gromacs-sarus-c5n-mpich-72x2 ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch
sbatch -N4 --job-name=gromacs-sarus-c5n-mpich-144x2 ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch
sbatch -N4 --job-name=gromacs-sarus-c5n-mpich-144x2 ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch
sbatch -N8 --job-name=gromacs-sarus-c5n-mpich-288x2 ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch
sbatch -N8 --job-name=gromacs-sarus-c5n-mpich-288x2 ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch
sbatch -N2 --job-name=gromacs-sarus-c5n-mpich-36x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch
sbatch -N2 --job-name=gromacs-sarus-c5n-mpich-36x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch
sbatch -N4 --job-name=gromacs-sarus-c5n-mpich-72x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch
sbatch -N4 --job-name=gromacs-sarus-c5n-mpich-72x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch
sbatch -N8 --job-name=gromacs-sarus-c5n-mpich-144x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch
sbatch -N8 --job-name=gromacs-sarus-c5n-mpich-144x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch

sbatch -N2 --job-name=gromacs-sarus-c5n-mpich-144x1 ~/slurm/gromacs-sarus-c5n-mpich-72x1.sbatch
sbatch -N2 --job-name=gromacs-sarus-c5n-mpich-144x1 ~/slurm/gromacs-sarus-c5n-mpich-72x1.sbatch
sbatch -N4 --job-name=gromacs-sarus-c5n-mpich-288x1 ~/slurm/gromacs-sarus-c5n-mpich-72x1.sbatch
sbatch -N4 --job-name=gromacs-sarus-c5n-mpich-288x1 ~/slurm/gromacs-sarus-c5n-mpich-72x1.sbatch
sbatch -N8 --job-name=gromacs-sarus-c5n-mpich-576x1 ~/slurm/gromacs-sarus-c5n-mpich-72x1.sbatch
sbatch -N8 --job-name=gromacs-sarus-c5n-mpich-576x1 ~/slurm/gromacs-sarus-c5n-mpich-72x1.sbatch
squeue
sbatch -N16 --job-name=gromacs-sarus-c5n-mpich-288x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch
sbatch -N16 --job-name=gromacs-sarus-c5n-mpich-288x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch
sbatch -N16 --job-name=gromacs-sarus-c5n-mpich-576x2 ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch
sbatch -N16 --job-name=gromacs-sarus-c5n-mpich-576x2 ~/slurm/gromacs-sarus-c5n-mpich-36x2.sbatch
sbatch -N32 --job-name=gromacs-sarus-c5n-mpich-576x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch
sbatch -N32 --job-name=gromacs-sarus-c5n-mpich-576x4 ~/slurm/gromacs-sarus-c5n-mpich-18x4.sbatch

Afterwards you will see a bunch of jobs in the queue. Grab a coffee, as this will take a couple of minutes; in the submissions above the node count goes up to 32.
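
If you'd rather not refresh the queue by hand, a simple polling loop (plain bash, checking every 30 seconds) works as well:

# Print the queue every 30 seconds until no jobs are left.
while squeue --noheader | grep -q .; do squeue; sleep 30; done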

Results

After those runs are done, we grep the performance results.

grep -B2 Performance /fsx/logs/gromacs-s*
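
If you prefer a compact overview, a small loop can pull just the ns/day value out of each log. This is a sketch that assumes the standard GROMACS "Performance:" line, which lists ns/day in its first column:

# Summarize ns/day per job from the Slurm output files.
for f in /fsx/logs/gromacs-sarus-c5n-mpich-*.out; do
  nsday=$(grep -m1 '^Performance' "$f" | awk '{print $2}')
  echo "$(basename "$f" .out): ${nsday:-n/a} ns/day"
done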

This extends the table started in the gromacs-on-pcluster workshop with the multi-node decompositions.

Single-Node

| #  | scheduler | execution | spec                      | # * instance | Ranks x Threads | ns/day |
|----|-----------|-----------|---------------------------|--------------|-----------------|--------|
| 1  | slurm     | native    | gromacs@2021.1            | 1 * c5n.18xl | 18x4            | 4.7    |
| 2  | slurm     | native    | gromacs@2021.1            | 1 * c5n.18xl | 36x2            | 5.3    |
| 3  | slurm     | native    | gromacs@2021.1            | 1 * c5n.18xl | 72x1            | 5.5    |
| 4  | slurm     | native    | gromacs@2021.1 ^intel-mkl | 1 * c5n.18xl | 36x2            | 5.4    |
| 5  | slurm     | native    | gromacs@2021.1 ^intel-mkl | 1 * c5n.18xl | 72x1            | 5.5    |
| 6  | slurm     | native    | gromacs@2021.1 ~mpi       | 1 * c5n.18xl | 36x2            | 5.5    |
| 7  | slurm     | native    | gromacs@2021.1 ~mpi       | 1 * c5n.18xl | 72x1            | 5.7    |
| 8  | slurm     | native    | gromacs@2021.1 +cuda ~mpi | 1 * g4dn.8xl | 1x32            | 6.3    |
| 9  | slurm     | sarus     | gromacs@2021.1 ~mpi       | 1 * c5n.18xl | 36x2            | 5.45   |
| 10 | slurm     | sarus     | gromacs@2021.1 ~mpi       | 1 * c5n.18xl | 72x1            | 5.65   |

Multi-Node

| #  | scheduler | execution | spec                  | # * instance  | Ranks x Threads | ns/day |
|----|-----------|-----------|-----------------------|---------------|-----------------|--------|
| 11 | slurm     | sarus     | gromacs@2021.1 ^mpich | 2 * c5n.18xl  | 36x4            | 8.8    |
| 12 | slurm     | sarus     | gromacs@2021.1 ^mpich | 2 * c5n.18xl  | 72x2            | 9.0    |
| 13 | slurm     | sarus     | gromacs@2021.1 ^mpich | 2 * c5n.18xl  | 144x1           | 9.65   |
| 14 | slurm     | sarus     | gromacs@2021.1 ^mpich | 4 * c5n.18xl  | 72x4            | 15.3   |
| 15 | slurm     | sarus     | gromacs@2021.1 ^mpich | 4 * c5n.18xl  | 144x2           | 16.1   |
| 16 | slurm     | sarus     | gromacs@2021.1 ^mpich | 4 * c5n.18xl  | 288x1           | 16.85  |
| 17 | slurm     | sarus     | gromacs@2021.1 ^mpich | 8 * c5n.18xl  | 144x4           | 25.8   |
| 18 | slurm     | sarus     | gromacs@2021.1 ^mpich | 8 * c5n.18xl  | 288x2           | 28     |
| 19 | slurm     | sarus     | gromacs@2021.1 ^mpich | 8 * c5n.18xl  | 576x1           | 29     |
| 20 | slurm     | sarus     | gromacs@2021.1 ^mpich | 16 * c5n.18xl | 288x4           | 41.3   |
| 21 | slurm     | sarus     | gromacs@2021.1 ^mpich | 16 * c5n.18xl | 576x2           | 44.5   |
| 23 | slurm     | sarus     | gromacs@2021.1 ^mpich | 32 * c5n.18xl | 576x4           | 63.3   |

Please note that the containerized runs yield essentially the same performance as the native runs.