Creating Intel Select Solution HPC clusters

You can use the Slurm-GCP workload manager to create clusters that are based on the HPC virtual machine (VM) image and comply with the Intel Select Solution for Simulation and Modeling criteria.

When you create an Intel Select Solution verified HPC environment on Google Cloud, the environment meets the following conditions:

  • It is optimized for HPC workloads.
  • It satisfies the software, system, and solution performance standards required for Intel Select Solution verification.
  • It is verified as compatible with the applications listed in the Intel HPC application catalog.

Create Intel Select Solution verified clusters using Slurm-GCP

  1. Clone the Intel Select branch of the Slurm-GCP GitHub repository. This branch is currently in preview.

    git clone --branch intel-select https://github.com/schedmd/slurm-gcp
    
  2. Create Slurm-GCP images for the cluster. The Slurm-GCP images are derived from the HPC VM image.

    cd foundry
    python3 foundry.py --intel_image
    

    This command creates a compute node image, which takes approximately 1 minute, and a controller node image, which takes approximately 7 minutes.

  3. Change to the tf/examples/basic/ directory and create a basic.tfvars file from the basic.tfvars.example file. In this Terraform configuration file, set the intel_select_solution option to full_config or software_only.

    • full_config: This option checks the machine type and the controller boot disk size. If you use this option, set the compute node's machine_type to c2-standard-60 and the controller node's controller_disk_size_gb to at least 215.
    • software_only: This option checks only for software requirements. It does not check the machine type or the controller boot disk size.
  4. Deploy the cluster by running the Slurm-GCP Terraform scripts as follows. The -var-file=basic.tfvars flag passes the settings from basic.tfvars to Terraform, which then checks that the cluster configuration meets the requirements of the Intel HPC Platform Specification and the Intel Select Solution for Simulation and Modeling.

    terraform init
    terraform apply -var-file=basic.tfvars
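
Before running terraform apply with the full_config option, the values from step 3 can be sanity-checked locally. The following is a minimal sketch, not part of Slurm-GCP: the helper names tfvar_value and check_full_config are hypothetical, and it assumes that each setting appears in basic.tfvars as a key = value line, which might not match the file layout in the intel-select branch. If several machine_type lines exist, only the first one is inspected.

```shell
# Hypothetical pre-flight check for basic.tfvars; not part of Slurm-GCP.
# Assumption: settings are written one per line as `key = value`.

# Print the value of the first uncommented `key = value` line, unquoted.
tfvar_value() {
  sed -n "s/^[[:space:]]*${1}[[:space:]]*=[[:space:]]*\"\{0,1\}\([^\"#[:space:]]*\)\"\{0,1\}.*/\1/p" "$2" | head -n 1
}

# Check the two values that the full_config option requires.
# The body runs in a subshell so its variables do not leak.
check_full_config() (
  file="$1"
  mode=$(tfvar_value intel_select_solution "$file")
  machine=$(tfvar_value machine_type "$file")
  disk=$(tfvar_value controller_disk_size_gb "$file")

  if [ "$mode" != "full_config" ]; then
    echo "skipped: intel_select_solution is '${mode}'"
    return 0
  fi
  if [ "$machine" != "c2-standard-60" ]; then
    echo "fail: machine_type is '${machine}', expected c2-standard-60"
    return 1
  fi
  # A non-numeric or missing size also counts as a failure.
  if ! [ "${disk:-0}" -ge 215 ] 2>/dev/null; then
    echo "fail: controller_disk_size_gb is '${disk}', expected at least 215"
    return 1
  fi
  echo "ok"
)
```

For example, running check_full_config basic.tfvars from the tf/examples/basic/ directory prints ok when both values meet the full_config requirements, and prints a skipped message when the option is set to software_only.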
    

Verify compliance using the Intel Cluster Checker

  1. Connect to the login node using SSH.

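  2. Load the environment configuration by adding the following lines to the .bashrc file on the login node. The settings take effect in new shell sessions, or in the current session after you run source ~/.bashrc.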

    export PATH=/apps/intelpython3/bin/:/sbin:/bin:/usr/sbin:/usr/bin:$PATH
    source /apps/clck/2019.10/bin/clckvars.sh
    source /apps/psxe_runtime/linux/bin/psxevars.sh
    
  3. Enable passwordless SSH.

    ssh-keygen -t rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 644 ~/.ssh/authorized_keys
    
  4. Run the Intel Cluster Checker and verify that the output contains Validation PASS. For more information, see the official Intel Cluster Checker documentation.

    salloc -N $num_of_node_to_check
    clck -F intel_hpc_platform_compat-hpc-2018.0
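
The pass check in step 4 can also be scripted rather than read by eye. The following is a minimal sketch under stated assumptions: validation_passed is a hypothetical helper, not part of Intel Cluster Checker, the file name clck_output.txt is arbitrary, and the sketch assumes that the text Validation PASS appears verbatim in the output that clck prints.

```shell
# Hypothetical helper; not part of Intel Cluster Checker.
# Reads checker output on stdin and succeeds only if it contains
# the literal text "Validation PASS" (assumed to appear verbatim).
validation_passed() {
  grep -q "Validation PASS"
}

# Example, run inside the salloc allocation on the login node:
#   clck -F intel_hpc_platform_compat-hpc-2018.0 2>&1 | tee clck_output.txt
#   if validation_passed < clck_output.txt; then
#     echo "compliant"
#   else
#     echo "not compliant"
#   fi
```

Saving the output to a file with tee keeps the full report available for review when the check does not pass.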
    

Clean up

To avoid incurring charges for the VM images that you created, delete the images by running the following command:

    gcloud compute images delete schedmd-slurm-hpc-intel-compute \
                 schedmd-slurm-hpc-intel-controller
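
If you are unsure whether both images still exist, the deletion can be wrapped in a small script. The following is a minimal sketch, assuming the two image names from the cleanup command in this section. The function name delete_intel_images and the DRY_RUN variable are hypothetical. The --quiet flag skips the gcloud confirmation prompt, so run with DRY_RUN=1 first to preview the commands.

```shell
# Image names taken from the cleanup command in this section.
INTEL_IMAGES="schedmd-slurm-hpc-intel-compute schedmd-slurm-hpc-intel-controller"

# Delete each image that can be found; skip the rest.
# With DRY_RUN=1, print the delete commands instead of running them.
delete_intel_images() {
  for image in $INTEL_IMAGES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "gcloud compute images delete $image --quiet"
    elif gcloud compute images describe "$image" >/dev/null 2>&1; then
      gcloud compute images delete "$image" --quiet
    else
      echo "skipping $image: not found or not accessible"
    fi
  done
}

# Example:
#   DRY_RUN=1 delete_intel_images   # preview
#   delete_intel_images             # delete
```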

What's next