Using Cromwell/WDL

How to run a WDL workflow on the cluster using the cromwell workflow management system.

Step-by-step guide

  1. Download cromwell:  cromwell-<number>.jar
  2. Create cromwell.conf to enable SLURM and singularity
  3. Create options.json to specify your slurm account and output directory
  4. Create a script to run cromwell for your WDL workflow using the two config files
  5. Run cromwell in an interactive job
  6. Run cromwell via sbatch

1. Download cromwell

DASH currently has java installed on all nodes, so the only thing required to run cromwell is to download the cromwell jar file.

To download the latest version of cromwell, visit https://github.com/broadinstitute/cromwell/releases and copy the URL for the latest cromwell-<number>.jar file.

For this guide we will be using https://github.com/broadinstitute/cromwell/releases/download/85/cromwell-85.jar.

On a DASH login node create a directory for your project and change into that directory.
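For example (the directory name cromwell-demo is just a placeholder; use whatever fits your project):

```shell
# create a project directory (name is arbitrary) and change into it
mkdir -p cromwell-demo
cd cromwell-demo
```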


Then run the following command to download cromwell:

Download cromwell
wget https://github.com/broadinstitute/cromwell/releases/download/85/cromwell-85.jar

2. Configure cromwell to use SLURM and singularity

Create a configuration file named cromwell.conf with the following content to run jobs using sbatch and, optionally, singularity.

cromwell.conf
include required(classpath("application"))

backend {
  default = SLURM
  providers {
    SLURM {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        String slurm_account = ""
        Int runtime_minutes = 600
        Int cpu = 1
        Int memory_mb = 8000
        String? docker
        """

        submit = """
            sbatch \
               -A ${slurm_account} \
               -J ${job_name} \
               -D ${cwd} \
               -o ${out} \
               -e ${err} \
               -t ${runtime_minutes} \
               ${"-c " + cpu} \
               --mem ${memory_mb} \
               --wrap "/bin/bash ${script}"
        """

        kill = "scancel ${job_id}"

        check-alive = "squeue -j ${job_id}"

        job-id-regex = "Submitted batch job (\\d+).*"

        submit-docker = """
            sbatch \
               -A ${slurm_account} \
               -J ${job_name} \
               -D ${cwd} \
               -o ${out} \
               -e ${err} \
               -t ${runtime_minutes} \
               ${"-c " + cpu} \
               --mem ${memory_mb} \
               --wrap "singularity exec --containall --bind ${cwd}:${docker_cwd} docker://${docker} ${job_shell} ${docker_script}"
        """

      }
    }
  }
}

3. Configure cromwell to use your slurm account and output directory

By default cromwell saves outputs into a uniquely generated directory, which can make it challenging to locate your output files. To avoid this, there are options to configure a directory where cromwell will store your output files. Additionally, we need to specify a Slurm account or the sbatch commands issued by cromwell will fail.

Create a file named options.json with the following content, replacing results with your preferred output directory and TODO with your slurm account name.

options.json
{
    "final_workflow_outputs_dir": "results",
    "use_relative_output_paths": true,
    "default_runtime_attributes": {
        "slurm_account": "TODO"
    }
}
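If you prefer to create this file from the shell, a heredoc works, and the JSON syntax can be sanity-checked afterwards (this assumes python3 is available on the login node):

```shell
# write options.json; replace TODO with your slurm account before running real workflows
cat > options.json <<'EOF'
{
    "final_workflow_outputs_dir": "results",
    "use_relative_output_paths": true,
    "default_runtime_attributes": {
        "slurm_account": "TODO"
    }
}
EOF

# a typo such as a trailing comma will make this command fail
python3 -m json.tool options.json > /dev/null && echo "options.json is valid JSON"
```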

4. Create a script to run cromwell

Create a script named run-workflow.sh with the following content:

run-workflow.sh
#!/bin/bash

java -Dconfig.file=cromwell.conf -jar cromwell-85.jar run -o options.json "$@"

This script expects the WDL workflow filename to be passed as a command line argument.
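The same script can also be written from the shell and marked executable so it can be invoked directly:

```shell
# write the wrapper script
cat > run-workflow.sh <<'EOF'
#!/bin/bash

java -Dconfig.file=cromwell.conf -jar cromwell-85.jar run -o options.json "$@"
EOF

# make it executable so "./run-workflow.sh hello.wdl" also works
chmod +x run-workflow.sh
```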

5. Run a WDL workflow with cromwell in an interactive job

When debugging a WDL workflow or adjusting the cromwell configuration, it is often helpful to run cromwell in the foreground via an interactive job.

Run the following, replacing TODO with your slurm account.

Start an interactive job
srun --pty -A TODO bash


If you do not already have a WDL workflow you can create one named hello.wdl with the following contents:

hello.wdl
version 1.0
workflow myWorkflow {
    call myTask
}

task myTask {
    input {
        String output_filename = "result.txt"
        Int num_cpu = 2
        Int mem_size_gb = 4
        String docker_image = "rockylinux:9.1.20230215-minimal"
    }
    command <<<
        cat /etc/os-release > ~{output_filename}
    >>>
    output {
        File result = output_filename
    }
    runtime {
        memory: "~{mem_size_gb} GiB"
        cpu: num_cpu
        docker: docker_image
    }
}

The above workflow is meant to serve as a test of the cromwell singularity and SLURM integration. It runs inside a singularity container and copies some data to an output file. Additionally, it requests cpu and memory requirements that will be passed to the sbatch jobs run by cromwell.


Now you can run the workflow like so, changing hello.wdl to a different name if you have your own WDL workflow file:

bash run-workflow.sh hello.wdl

Next you can view the output files stored in the output directory configured in options.json:

cat results/result.txt 

6. Run a WDL workflow with cromwell via sbatch

You can run cromwell with sbatch as follows, changing TODO to your slurm account and optionally changing hello.wdl to your own WDL workflow file.

sbatch -A TODO run-workflow.sh hello.wdl