DASH - DHTS Azure School of Medicine High Performance Computing

DASH has a predefined compute cluster that is shared by multiple users across many groups. The cluster is composed of three types of compute nodes (execute, highmem, and GPU) plus a high-performance disk with provisioned IOPS attached to each node. The cluster runs Slurm Workload Manager version 20.11.7.
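
From a login node, the partition layout and Slurm version can be confirmed with standard Slurm commands; the calls below use only stock sinfo options, and their output simply reflects the live state of the cluster described in the list that follows.

    sinfo --version    # reports the Slurm version (20.11.7)
    sinfo -s           # one-line summary of each partition and its node counts/states
    sinfo -N -l        # per-node detail: CPUs, memory, and current state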

  • Two Login Nodes (dash1-login-1 and dash1-login-2)
  • One Slurm scheduler node
  • An execute partition with 20 nodes (a sample job script follows this list):
    • 2 nodes (dash1-exec-[1-2]) are always on and the remaining 18 nodes (dash1-exec-[3-20]) are on autoscale
  • A highmem partition with 5 nodes (dash1-highmem-[1-5]), all on autoscale
  • 3 GPU nodes (dash1-gpu-[1-3]), all on autoscale
  • Login node Virtual Machine specifications
    • 8 vCPUs
    • 32 GB of RAM
    • 1023 GB of local disk
  • Scheduler node Virtual Machine specifications
    • 4 vCPUs
    • 8 GB of RAM
    • 32 GB of attached SSD Temp Storage
    • 3,000 Mbps Network Bandwidth
  • execute partition node Virtual Machine specifications
    • 32 vCPUs
    • 256 GB of RAM (249,036 MB actual)
    • 1200 GB of attached SSD Temp Storage
    • 16,000 Mbps Network Bandwidth
  • highmem partition node Virtual Machine specifications
    • 96 vCPUs
    • 672 GB of RAM (653,721 MB actual)
    • 3600 GB of attached SSD Temp Storage
    • 35,000 Mbps Network Bandwidth
  • GPU node (NVIDIA A100) specifications
    • 24 vCPUs
    • 220 GB of RAM
    • 80 GB GPU RAM
    • 1 GPU per node
  • Active storage: a 128 TiB Lustre filesystem on non-RAIDed HDDs, backed by a 5 PB Azure Blob container
    • Up to 4 GB/s maximum throughput (~2 GB/s average)
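
As a concrete illustration, here is a minimal Slurm batch script targeting the hardware above. The partition name execute is taken from the list, but the GPU partition name (gpu) and the gres string are assumptions based on the node names; confirm both with sinfo before relying on them.

    #!/bin/bash
    #SBATCH --job-name=example        # name shown in squeue
    #SBATCH --partition=execute       # execute partition: 32 vCPUs / 256 GB RAM per node
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8         # request 8 of the node's 32 vCPUs
    #SBATCH --mem=32G                 # request 32 GB of the node's RAM
    #SBATCH --time=01:00:00           # wall-clock limit

    # For a GPU job, the assumed equivalents would be:
    #   #SBATCH --partition=gpu
    #   #SBATCH --gres=gpu:1          # one A100 per node

    srun ./my_program                 # placeholder for the actual executable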

Autoscale means the virtual machine that backs a node is provisioned only upon user request; it usually takes about 8 minutes for an autoscale node to come online. To use cluster resources efficiently, always request resources from the always-on nodes before the autoscale nodes.
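
Under this guidance, a reasonable workflow (sketched below, assuming the partition is named execute as above) is to check which nodes are currently powered up and, when an always-on node is free, pin small jobs to it so that no autoscale node has to be provisioned.

    # Show execute-partition nodes and their states; with Slurm's cloud/autoscale
    # support, powered-down nodes are typically marked with '~' (e.g. idle~)
    sinfo -p execute -N -l

    # Pin a single-node job to an always-on node (job.sh is a placeholder batch script)
    sbatch --nodelist=dash1-exec-1 job.sh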