Using the DHTS Azure HPC Cluster
The DHTS Azure HPC service supports the creation of many clusters, most of which follow the operating system and software model of DASH.
Here are several useful topics to help you learn the basics of using this system:
- Policy Group/Account Setup
- Connect to DASH Cluster
- Storage on DASH
- Getting Data onto DASH
- Shared Datasets
- To Copy Data to an AWS S3 Bucket
- Using Slurm
- Cloning Github and Gitlab Repository
- Conda Environments or Singularity Containers?
- Using Singularity
- Using Conda
- Migrating cross-platform conda environment
- Using Bowtie
- Using Jupyter Notebook
- Using RStudio Server
- Using VSCode
- Additional Training Resources
Slurm Job Scheduler
The cluster uses Slurm as its job scheduler. Slurm is an open source job scheduler used in many supercomputers across the world.
See the Using Slurm page for detailed examples of how to submit jobs to Slurm on the cluster.
Below are some of the common Slurm commands to get you started:
EXAMPLE:
# View information about Slurm nodes and partitions
user@ip-0A260C0B:~$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
exec*     up    infinite      8 idle~ dash1-exec-[3-10]
exec*     up    infinite      2 idle  dash1-exec-[1-2]
gpu       up    infinite      4 idle~ dash1-gpu-[1-3,5]
gpu       up    infinite      1 mix   dash1-gpu-4
highmem   up    infinite      3 idle~ dash1-highmem-[1-3]

# Request an interactive session with 500 MB of memory.
# Compute nodes can only be accessed interactively via Slurm; you cannot SSH to compute nodes.
$ srun -p exec --mem 500 --pty bash
$ hostname
dash1-exec-1

# Request an interactive session on a specific node on the exec partition
$ srun -p exec -w dash1-exec-1 --pty bash

# Example sbatch job for the script sequence.sh,
# requesting one node (-N1) and one task (-n1) with an account (-A),
# with job status notifications sent via email
$ sbatch -N1 -n1 -A <account_name> --mail-user=<user_email> --mail-type=ALL sequence.sh
# sbatch is recommended for non-interactive sessions.
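For sbatch submissions, the same job parameters can also be embedded in the script itself as #SBATCH directives, so the command line stays short. Below is a sketch of what a sequence.sh might look like; the partition, memory, and time values, and the command the script runs, are illustrative assumptions rather than site requirements.

```shell
#!/bin/bash
# Hypothetical sequence.sh batch script -- directive values are examples only.
#SBATCH -N1 -n1                 # one node, one task
#SBATCH -p exec                 # partition (assumed; adjust to your workload)
#SBATCH --mem=500               # memory in MB (assumed)
#SBATCH --time=01:00:00         # wall-clock limit (assumed)
#SBATCH -A <account_name>       # your Slurm account
#SBATCH --mail-user=<user_email>
#SBATCH --mail-type=ALL

# Replace the line below with your actual pipeline command.
echo "Job ${SLURM_JOB_ID:-<jobid>} running on $(hostname)"
```

With the directives inside the script, the job can be submitted with just `sbatch sequence.sh`.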
Get data onto the cluster
Before running a pipeline on the cluster, confirm that your genomic data is placed under /data/<your_lab>. See Getting Data onto DASH for data migration details.
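As a minimal illustration of staging data from a workstation (the Getting Data onto DASH page is the authoritative reference), a lab directory can be populated with a standard tool such as rsync. The login hostname and source path below are placeholders, not actual DASH endpoints.

```shell
# Hypothetical transfer from a local workstation to the lab share.
# <dash_login_host> and ./fastq/ are placeholders; use the hostname and
# destination path given in the Getting Data onto DASH page.
$ rsync -avP ./fastq/ <netid>@<dash_login_host>:/data/<your_lab>/fastq/
```

Run the transfer from the machine that holds the data, then confirm the files are visible under /data/<your_lab> before submitting jobs that read them.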