Using the DHTS Azure HPC Cluster

The DHTS Azure HPC service supports the creation of many clusters, most of which follow the operating system and software model of DASH.

Here are several useful topics to help you learn the basics of using this system:   

Slurm Job Scheduler


The cluster uses Slurm as its job scheduler. Slurm is an open-source scheduler used on many supercomputers around the world.

See the Using Slurm page for detailed examples of how to submit jobs to Slurm on the cluster.

Below are some of the common Slurm commands to get you started:

EXAMPLE:

# View information about Slurm nodes and partitions
user@ip-0A260C0B:~$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
exec*        up   infinite      8  idle~  dash1-exec-[3-10]
exec*        up   infinite      2  idle   dash1-exec-[1-2]
gpu          up   infinite      4  idle~  dash1-gpu-[1-3,5]
gpu          up   infinite      1  mix    dash1-gpu-4
highmem      up   infinite      3  idle~  dash1-highmem-[1-3]

# Request an interactive session with 500 MB of memory
# Compute nodes can only be accessed interactively via Slurm. You cannot SSH to compute nodes.
$ srun -p exec --mem 500 --pty bash
$ hostname
dash1-exec-1

# Request an interactive session on a specific node on the exec partition
$ srun -p exec -w dash1-exec-1 --pty bash

# Example sbatch job for script sequence.sh 
# Requesting one node (-N1) and one task (-n1), charging an account (-A), with email notification of job status
$ sbatch -N1 -n1 -A <account_name> --mail-user=<user_email> --mail-type=ALL sequence.sh

# sbatch is recommended for non-interactive batch jobs.
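
For reference, a batch script such as sequence.sh might look like the sketch below. The job name, resource values, and one-hour time limit are illustrative assumptions to adapt to your workload, not site requirements.

EXAMPLE:

$ cat sequence.sh
#!/bin/bash
#SBATCH --job-name=sequence      # job name shown in the queue
#SBATCH --nodes=1                # one node (equivalent to -N1)
#SBATCH --ntasks=1               # one task (equivalent to -n1)
#SBATCH --mem=500                # 500 MB of memory
#SBATCH --time=01:00:00          # one-hour wall-clock limit (placeholder)

echo "Running on $(hostname)"
# ... your pipeline commands go here ...

# After submitting, you can monitor and manage jobs with the standard Slurm commands:
$ squeue -u $USER          # list your queued and running jobs
$ scancel <job_id>         # cancel a job by its job ID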

Getting Data onto the Cluster


Before running a pipeline on the cluster, confirm that your genomic data is placed under /data/<your_lab>. See the Getting Data onto DASH page for details on data migration.
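
As a quick sketch, a standard tool such as rsync can copy data from your local machine into your lab's directory. The login node hostname below is a placeholder assumption; substitute your cluster's actual login address.

EXAMPLE:

# Copy a local directory into your lab's space on the cluster
# (dash1-login-1 is a placeholder hostname; <netid> is your username)
$ rsync -avh --progress ./my_data/ <netid>@dash1-login-1:/data/<your_lab>/my_data/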