Quickstart Guide

To get started with DASH, here is a curated list of first steps. Follow the title links for more in-depth articles on each topic.

  1. Request DASH access by asking your PI to add you as a member of a Grouper policy group. If no Grouper group exists, follow the instructions here 
  2. You must be on the Duke University network, the Duke Health network, or on the DHE VPN if remote.
  3. Connect to DASH via SSH. This step requires an established connection to the DHE VPN if you are remote.

    ssh <NetID>@dash.duhs.duke.edu
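
    Optionally, a Host entry in your local ~/.ssh/config can shorten the login command (the alias name dash is an arbitrary choice):

      Host dash
          HostName dash.duhs.duke.edu
          User <NetID>

    With this in place, "ssh dash" is equivalent to the full command above.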
  4. Test-run a Slurm command on the cluster

    • Print "Hello World!" in the shell by submitting an srun command:

      srun --account=<GroupName> -p exec -N1 --mem=2G echo 'Hello World!'

      --account: always run Slurm commands with your group name specified via the --account flag. If you belong to multiple groups, one way to determine which group name to use is to consider where the job's data lives. For example, if the data is pulled from /data/reddylab, the group account name is reddylab; if it is pulled from /data/dcibioinformatics, it is dcibioinformatics. Attaching a group name to each job lets resource usage on the cluster be tracked via sacct.

      -p: specify a partition
      -N1: one node is requested to run the command
      --mem=2G: 2GB of memory is requested to execute the command

    • Run an interactive session with a bash shell. For interactive sessions, request nodes from dash1-exec-[1-2]:

      srun --account=<GroupAccountName> -p exec --mem=1G --pty /bin/bash
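
    • With jobs tagged by a group account, resource usage can later be reviewed with sacct. A sketch (the format fields below are illustrative; adjust to taste):

      sacct --accounts=<GroupAccountName> --format=JobID,JobName,Partition,Account,Elapsed,MaxRSS,State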
  5. Getting Data onto DASH
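    • A common way to copy data onto DASH is scp or rsync from your local machine; a sketch (paths and names below are placeholders):

      # Copy a single file into your group's data directory
      scp local_file.txt <NetID>@dash.duhs.duke.edu:/data/<GroupName>/
      # Or synchronize a directory; rsync is incremental and resumable
      rsync -avP local_dir/ <NetID>@dash.duhs.duke.edu:/data/<GroupName>/local_dir/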
  6. Using Conda
    • BEFORE you use conda for the first time, you need to type the following:   $ /sched/anaconda3/bin/conda init bash
    • You can create any number of conda environments. 
    • It is best practice to create separate environments to isolate tools - especially R tools such as Seurat - to ensure the quality and reproducibility of your toolchain. 
    • To create different places/environments for conda, type: 

$ conda create --prefix /data/lab_account/conda_envs/<tool_name>

Examples:  

$ conda create --prefix /data/twestlab/conda_envs/cellranger

$ conda create --prefix /data/twestlab/conda_envs/seurat
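
A created prefix environment can then be activated and populated. A sketch using the example paths above (the r-seurat package name assumes the conda-forge and bioconda channels are available):

$ conda activate /data/twestlab/conda_envs/seurat
$ conda install -c conda-forge -c bioconda r-seurat
$ conda deactivate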

    • Exporting conda environment on HARDAC

      # 1. It requires conda v4.7.12 or later
      $ conda -V
      conda 4.12.0
      
      # 2. Self-install Miniconda to get the latest conda version, if needed
      # wget miniconda linux installer
      $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
      
      # Verify SHA256 checksums of downloaded script
      $ sha256sum Miniconda3-latest-Linux-x86_64.sh
      
      # Run the installer script
      $ bash Miniconda3-latest-Linux-x86_64.sh
      
      # 3. Determine which conda environment to export on HARDAC
      $ conda env list
      # conda environments:
      #
      jimmy                    /home/mh584/.conda/envs/jimmy
      jimmy_test_env           /home/mh584/.conda/envs/jimmy_test_env
      base                  *  /home/mh584/miniconda3
      reddy                    /home/mh584/miniconda3/envs/reddy
      
      # 4. Export conda environment
      $ conda env export --from-history --prefix /home/mh584/miniconda3/envs/reddy > reddy_export.yml
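      
      # 5. (Optional) Recreate the environment on DASH from the exported file;
      # the target prefix below is a placeholder
      $ conda env create --file reddy_export.yml --prefix /data/<GroupName>/conda_envs/reddy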
      
      
  7. Using Singularity
    • Singularity containers provide an excellent way to manage dependencies and make your research reproducible.
    • There are many off-the-shelf containers available for common bioinformatics tools.
    • We strongly recommend learning and using this powerful tool rather than managing local environments.
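    • As a sketch, an off-the-shelf container can be pulled from a public registry and run (the samtools image and tag below are illustrative):

      # Pull an image from Docker Hub into a local .sif file
      singularity pull docker://biocontainers/samtools:v1.9-4-deb_cv1
      # Run a command inside the container
      singularity exec samtools_v1.9-4-deb_cv1.sif samtools --version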
  8. Using Jupyter Notebook
    • Jupyter Notebooks allow you to program in Python through a web interface while running compute on the cluster's nodes.
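    • A common pattern (details vary by site; the port and node name below are placeholders) is to start the notebook on a compute node and tunnel to it over SSH:

      # On DASH: start Jupyter on a compute node without opening a browser
      srun --account=<GroupAccountName> -p exec --mem=4G jupyter notebook --no-browser --ip=0.0.0.0 --port=8888
      # On your local machine: forward the port (replace <node> with the assigned node name)
      ssh -L 8888:<node>:8888 <NetID>@dash.duhs.duke.edu
      # Then open http://localhost:8888 in a local browser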
  9. Using RStudio Server
    • RStudio Server is an Integrated Development Environment (IDE) for the R language.
    • RStudio allows you to program in R through a web interface while running compute on the cluster's nodes.
  10. Shared datasets
    • Reusable data that is common to many users and does not need to be replicated in each account's own storage.
  11. Getting help: submit a ticket via the Duke Health portal (https://duke.service-now.com/sp?id=fix_it&sys_id=3f1dd0320a0a0b99000a53f7604a2ef9) and assign it to Systems-HPC-DHTS.