Create x86

Configure x86 Cluster

We will reuse the configuration we just created during the guided setup and refine it to spin up an x86 cluster.

read -p "Name of the cluster you want to create: " CLUSTER_NAME

Let's commit this to our environment variables and start refining the config.

echo "export CLUSTER_NAME=${CLUSTER_NAME}"| tee -a ~/.bashrc
cp pcluster-base.ini ${CLUSTER_NAME}.ini
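
A quick sanity check confirms that the variable is set and the copy succeeded:

echo ${CLUSTER_NAME}
ls -l ${CLUSTER_NAME}.ini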

Adjust Configuration

Head Node

We’ll pick an m5.2xlarge instance as the head node. The config file needs to change as follows:

[cluster default]
master_instance_type = m5.2xlarge

We’ll use wildq to overwrite the current setting.

cat ${CLUSTER_NAME}.ini \
    |wildq -i ini -M '."cluster default".master_instance_type = "m5.2xlarge"' \
    |sponge ${CLUSTER_NAME}.ini
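
To double-check that the setting landed in the file, a simple grep should show the new value:

grep master_instance_type ${CLUSTER_NAME}.ini
# expected output: master_instance_type = m5.2xlarge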

IAM Policy

To push images to and pull them from Amazon Elastic Container Registry (ECR), we attach an additional IAM policy to our ParallelCluster.

[cluster default]
additional_iam_policies = arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess

Let’s create the key in the config file.

cat ${CLUSTER_NAME}.ini \
    |wildq -i ini -M '."cluster default".additional_iam_policies = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess"' \
    |sponge ${CLUSTER_NAME}.ini
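
With this policy in place, the cluster nodes will be able to authenticate against ECR once the cluster is running. As a rough sketch of what that looks like (the region and account ID below are placeholders, not values from this workshop, and Docker is assumed to be available on the node):

aws ecr get-login-password --region <region> \
    | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com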

Instance Types

CPU

For the compute fleet we’ll pick c5n.18xlarge instances, so we are able to run MPI applications using EFA (Elastic Fabric Adapter). We need to add the following config:

[cluster default]
queue_settings = cpu

[queue cpu]
disable_hyperthreading = false
enable_efa = true
compute_resource_settings = c5n

[compute_resource c5n]
instance_type = c5n.18xlarge
max_count = 3

wildq helps us with that:

cat ${CLUSTER_NAME}.ini \
    |wildq -i ini -M '."cluster default".queue_settings = "cpu"' \
    |wildq -i ini -M '."queue cpu".disable_hyperthreading = "false"' \
    |wildq -i ini -M '."queue cpu".enable_efa = "true"' \
    |wildq -i ini -M '."queue cpu".compute_resource_settings = "c5n"' \
    |wildq -i ini -M '."compute_resource c5n".instance_type = "c5n.18xlarge"' \
    |wildq -i ini -M '."compute_resource c5n".max_count = "3"' \
    |sponge ${CLUSTER_NAME}.ini
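
A quick grep should print the queue section we just added (note the escaped brackets in the pattern):

grep -A 3 '\[queue cpu\]' ${CLUSTER_NAME}.ini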

GPU

We’ll add g4dn instances; depending on your account limits, you might not be able to start the bigger sizes.

If you are attending an AWS-led event, please ask your instructor whether you are able to run g4dn.8xlarge instances.

g4dn.2xlarge

g4dn.2xlarge instances come with 1 GPU and 8 vCPUs; everyone will be able to start those instances.

[cluster default]
queue_settings = cpu,gpu

[queue gpu]
disable_hyperthreading = false
enable_efa_gdr = false
compute_resource_settings = g4dn2xl

[compute_resource g4dn2xl]
instance_type = g4dn.2xlarge
max_count = 1

Let’s add that to the config:

cat ${CLUSTER_NAME}.ini \
    |wildq -i ini -M '."cluster default".queue_settings = "c5n,gpu"' \
    |wildq -i ini -M '."queue gpu".disable_hyperthreading = "false"' \
    |wildq -i ini -M '."queue gpu".enable_efa_gdr = "false"' \
    |wildq -i ini -M '."queue gpu".compute_resource_settings = "g4dn2xl"' \
    |wildq -i ini -M '."compute_resource g4dn2xl".instance_type = "g4dn.2xlarge"' \
    |wildq -i ini -M '."compute_resource g4dn2xl".max_count = "1"' \
    |sponge ${CLUSTER_NAME}.ini
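
Before moving on, it is worth reviewing the complete file once; the cluster, queue, and compute_resource sections should all be present now:

cat ${CLUSTER_NAME}.ini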

Create

After the refinement we’ll kick off the creation, using the following flags:

  • -c <file> defines the configuration file we just created;
  • --nowait will kick off the deployment and detach;
  • --norollback won’t roll back in case there is an error. That is helpful for debugging your post-install scripts.

pcluster create -c ${CLUSTER_NAME}.ini ${CLUSTER_NAME} --nowait --norollback

Since we detached from the deployment, we use the following command to follow the creation status:

pcluster status ${CLUSTER_NAME}

After roughly 10 minutes, pcluster will report that the cluster creation has finished.
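
Once the status shows CREATE_COMPLETE, you can log in to the head node. A minimal sketch, assuming the SSH key from your base configuration (the key path below is a placeholder):

pcluster ssh ${CLUSTER_NAME} -i ~/.ssh/<your-key>.pem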