DASH - DHTS Azure School of Medicine High Performance Computing
DASH has a predefined compute cluster that is shared by multiple users across many groups. The cluster is composed of several types of compute nodes, with a high-performance disk option with provisioned IOPS attached to each node. The cluster runs Slurm Workload Manager version 20.11.7.
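Once logged in, the partitions and node states described below can be inspected with standard Slurm commands. A sketch (the commands are standard Slurm; actual output depends on current cluster state):

```shell
# One line per node: partition, state, CPU count, and memory
sinfo -N -l

# Partition limits and defaults (time limits, default memory, etc.)
scontrol show partition

# Your own queued and running jobs
squeue -u "$USER"
```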
- Two Login Nodes (dash1-login-1 and dash1-login-2)
- One Slurm scheduler node
- An execute partition with 20 nodes: 2 always-on nodes (dash1-exec-[1-2]) and 18 autoscale nodes (dash1-exec-[3-20])
- A highmem partition with 5 nodes, all on autoscale (dash1-highmem-[1-5])
- 3 GPU nodes, all on autoscale (dash1-gpu-[1-3])
- Login nodes (Virtual Machine type)
- 8 vCPUs
- 32 GB of RAM
- 1023 GB local disk
- Scheduler node (Virtual Machine type)
- 4 vCPU
- 8 GB of RAM
- 32 GB of attached SSD Temp Storage
- 3,000 Mbps Network Bandwidth
- execute partition nodes (Virtual Machine type)
- 32 vCPU
- 256 GB of RAM (249036 MB actual)
- 1200 GB of attached SSD Temp Storage
- 16,000 Mbps Network Bandwidth
- highmem partition nodes (Virtual Machine type)
- 96 vCPU
- 672 GB of RAM (653721 MB actual)
- 3600 GB of attached SSD Temp Storage
- 35,000 Mbps Network Bandwidth
- GPU Nodes (NVIDIA A100)
- 24 vCPUs
- 220 GB of RAM
- 80 GB GPU RAM
- 1 GPU per node
- Active storage: 128 TiB Lustre filesystem on non-RAIDed HDDs, backed by a 5 PB Azure Blob container
- Up to 4 GB/s maximum throughput (~2 GB/s average)
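The node specifications above translate directly into Slurm resource requests. Below is a minimal batch-script sketch targeting the execute partition; the partition name and per-node limits come from the list above, while the job name, time limit, and program name are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=example          # placeholder job name
#SBATCH --partition=execute         # execute nodes: 32 vCPU, 256 GB RAM (249036 MB actual)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8           # stay within the 32 vCPU per node
#SBATCH --mem=64G                   # stay within usable node memory
#SBATCH --time=01:00:00             # placeholder time limit

srun ./my_program                   # my_program is a placeholder
```

For jobs needing more memory, substitute `--partition=highmem` (96 vCPU, 672 GB RAM per node); for the A100 nodes, a GPU is typically requested with a `--gres=gpu:1` directive, though the exact GRES name depends on site configuration.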
Autoscale means the virtual machine backing the node is provisioned only upon user request. It usually takes about 8 minutes for an autoscale node to come online. To use cluster resources efficiently, request resources from the always-on nodes before the autoscale nodes.
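One way to prefer the always-on nodes is to pin small jobs to them explicitly. A sketch using the node names above (whether explicit node selection is permitted depends on site policy):

```shell
# Start an interactive session on an always-on execute node,
# avoiding the ~8 minute autoscale provisioning delay
srun --partition=execute --nodelist=dash1-exec-1 --pty bash

# Submit a batch job pinned to the other always-on node
sbatch --partition=execute --nodelist=dash1-exec-2 job.sh
```

Note that `--nodelist` requires the job's allocation to include every listed node, so for a single-node job name only one node.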