DASH - DHTS Azure School of Medicine High Performance Computing
DASH has a predefined compute cluster that is shared by multiple users across many groups. The cluster is composed of several types of compute nodes, with a high-performance disk option with provisioned IOPS attached to each node. The cluster runs Slurm Workload Manager version 20.11.7.
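Once logged in, the partitions and node states described below can be inspected with standard Slurm commands. A sketch (the commands are standard Slurm; actual output depends on current cluster state):

```shell
# One line per node: partition, state, CPU count, and memory
sinfo -N -l

# Partition limits and defaults (time limits, default memory, etc.)
scontrol show partition

# Your own queued and running jobs
squeue -u "$USER"
```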
- Two Login Nodes (dash1-login-1 and dash1-login-2)
- One Slurm scheduler node
- An execute partition with 20 nodes: 2 always-on nodes (dash1-exec-[1-2]) and 18 autoscale nodes (dash1-exec-[3-20])
- A highmem partition with 5 nodes, all on autoscale (dash1-highmem-[1-5])
- 3 GPU nodes, all on autoscale (dash1-gpu-[1-3])
- Login nodes (Virtual Machine type)
- 8 vCPUs
- 32 GB of RAM
- 1023 GB local disk
- Scheduler node (Virtual Machine type)
- 4 vCPU
- 8 GB of RAM
- 32 GB of attached SSD Temp Storage
- 3,000 Mbps Network Bandwidth
- execute partition nodes (Virtual Machine type)
- 32 vCPU
- 256 GB of RAM (249036 MB actual)
- 1200 GB of attached SSD Temp Storage
- 16,000 Mbps Network Bandwidth
- highmem partition nodes (Virtual Machine type)
- 96 vCPU
- 672 GB of RAM (653721 MB actual)
- 3600 GB of attached SSD Temp Storage
- 35,000 Mbps Network Bandwidth
- GPU Nodes (NVIDIA A100)
- 24 vCPUs
- 220 GB of RAM
- 80 GB GPU RAM
- 1 GPU per node
- Active storage: 128 TiB Lustre filesystem on non-RAIDed HDDs, backed by a 5 PB Azure Blob container
- Up to 4 GB/s maximum throughput (~2 GB/s average)
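The node specifications above translate directly into Slurm resource requests. Below is a minimal batch-script sketch targeting the execute partition; the partition name and per-node limits come from the list above, while the job name, time limit, and program name are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=example          # placeholder job name
#SBATCH --partition=execute         # execute nodes: 32 vCPU, 256 GB RAM (249036 MB actual)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8           # stay within the 32 vCPU per node
#SBATCH --mem=64G                   # stay within usable node memory
#SBATCH --time=01:00:00             # placeholder time limit

srun ./my_program                   # my_program is a placeholder
```

For jobs needing more memory, substitute `--partition=highmem` (96 vCPU, 672 GB RAM per node); for the A100 nodes, a GPU is typically requested with a `--gres=gpu:1` directive, though the exact GRES name depends on site configuration.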
Autoscale means the virtual machine backing the node is provisioned only upon user request. It usually takes about 8 minutes for an autoscale node to come online. To use cluster resources efficiently, request resources from the always-on nodes before the autoscale nodes.
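One way to prefer the always-on nodes is to pin small jobs to them explicitly. A sketch using the node names above (whether explicit node selection is permitted depends on site policy):

```shell
# Start an interactive session on an always-on execute node,
# avoiding the ~8 minute autoscale provisioning delay
srun --partition=execute --nodelist=dash1-exec-1 --pty bash

# Submit a batch job pinned to the other always-on node
sbatch --partition=execute --nodelist=dash1-exec-2 job.sh
```

Note that `--nodelist` requires the job's allocation to include every listed node, so for a single-node job name only one node.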