Using Bowtie

One way to run bowtie on an Azure HPC cluster is to create a Slurm job file, "bowtie.slurm", and edit the FASTQ and reference genome file names in it as needed. In more advanced setups, the file names can be taken from the user environment or substituted on the fly with "sed":

bowtie.slurm

#!/bin/bash
#SBATCH -o bowtie.slurm.%j.%N.out
#SBATCH -e bowtie.slurm.%j.%N.err
#SBATCH -D /home/ter18/example_scripts
#SBATCH -J bowtie.slurm
#SBATCH -c 8
#SBATCH --get-user-env
#SBATCH --time=12:00:00
#SBATCH --exclusive
#
READ1_FASTQ=/path/to/file.fastq

# leave empty for single-end alignment
READ2_FASTQ=

REFERENCE_GENOME=/data/common/bowtie_genomes/hg19.ebwt/hg19
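# replace only the trailing .fastq extension, not "fastq" elsewhere in the path
OUTPUT=$(echo "$READ1_FASTQ" | sed -e 's/\.fastq$/.bowtie/')
# $READ2_FASTQ is deliberately unquoted so that an empty value is dropped;
# the last argument should match the -c value above
srun bowtie_align.sh $READ1_FASTQ $READ2_FASTQ $REFERENCE_GENOME $OUTPUT 8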
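
The "sed" substitution mentioned above could look roughly like the following. This is a minimal sketch, not part of the original scripts: make_job_file is a hypothetical helper, and it assumes the job file contains a line starting with READ1_FASTQ= as in the listing above.

```shell
#!/bin/bash
# make_job_file (hypothetical helper): copy a job file template and
# point its READ1_FASTQ line at a different input file.
#   $1 - template job file (e.g. bowtie.slurm)
#   $2 - fastq path to substitute
#   $3 - job file to write
make_job_file() {
  local template=$1 fastq=$2 out=$3
  # Use | as the sed delimiter because the path contains slashes.
  sed -e "s|^READ1_FASTQ=.*|READ1_FASTQ=${fastq}|" "$template" > "$out"
}
```

For example, "make_job_file bowtie.slurm /path/to/sample2.fastq sample2.slurm" writes a second job file that can be submitted in the same way as the first. Paths containing "|" or "&" would need escaping before being passed to sed.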

The job file calls a bash script, "bowtie_align.sh", that does the actual work:

bowtie_align.sh

#!/bin/bash
BOWTIE=`which bowtie`
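#
# Parameters (single-end, 4 arguments):
# $1 - READ1 fastq file
# $2 - Bowtie reference genome
# $3 - Output file
# $4 - Number of processors
#
# Parameters (paired-end, 5 arguments):
# $1 - READ1 fastq file
# $2 - READ2 fastq file
# $3 - Bowtie reference genome
# $4 - Output file
# $5 - Number of processors
#
if [ -z "$5" ]
then
  # single-end: bowtie <index> <reads> <output>
  "$BOWTIE" -p "$4" -t --chunkmbs 512 --best "$2" "$1" "$3"
else
  # paired-end: -X 2000 raises the maximum insert size to 2000
  "$BOWTIE" -p "$5" -t --chunkmbs 512 --best -X 2000 "$3" -1 "$1" -2 "$2" "$4"
fi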
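
The script tells single-end from paired-end runs purely by the number of arguments it receives, which works because the job file passes $READ2_FASTQ unquoted. The short sketch below illustrates that shell behaviour; count_args is a hypothetical helper and the OUTPUT value is only an example.

```shell
#!/bin/bash
# count_args (hypothetical helper): report how many arguments were received.
count_args() { echo "$#"; }

READ1_FASTQ=/path/to/file.fastq
READ2_FASTQ=
REFERENCE_GENOME=/data/common/bowtie_genomes/hg19.ebwt/hg19
OUTPUT=/path/to/file.bowtie

# Unquoted: the empty READ2_FASTQ disappears, leaving 4 arguments.
count_args $READ1_FASTQ $READ2_FASTQ $REFERENCE_GENOME $OUTPUT 8
# Quoted: the empty string is kept as an argument, giving 5.
count_args "$READ1_FASTQ" "$READ2_FASTQ" "$REFERENCE_GENOME" "$OUTPUT" 8
```

The first call prints 4 and the second prints 5. One consequence of relying on word splitting is that file names containing spaces would be split into several arguments, so this scheme assumes paths without whitespace.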

Finally, to run the alignment, submit the job file to the Slurm queue:

$ sbatch -N 1 ./bowtie.slurm
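
To take the input file from the user environment instead of editing the job file, the assignment in bowtie.slurm could fall back to its default only when nothing was inherited. This is a sketch of an assumed modification, not the original job file; resolve_read1 is a hypothetical helper used to show the expansion.

```shell
#!/bin/bash
# resolve_read1 (hypothetical helper): keep a READ1_FASTQ value inherited
# from the submitting shell, otherwise fall back to the hard-coded default.
resolve_read1() {
  echo "${READ1_FASTQ:-/path/to/file.fastq}"
}

READ1_FASTQ=$(resolve_read1)
echo "aligning $READ1_FASTQ"
```

With such a change in place, "READ1_FASTQ=/path/to/other.fastq sbatch -N 1 ./bowtie.slurm" would align the other file, assuming the cluster keeps sbatch's default behaviour of propagating the submitting shell's environment to the job (--export=ALL).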