Best Practice Considerations for HPC in the Cloud

  • Make sure you understand the basics of the pipeline packages and DevOps tools you plan to use, such as WDL and Docker, and how their inputs and outputs work.
  • Understand the exact statistical methods underlying every pre-developed workflow you use.
  • Adapting a workflow from one environment (local) to another (cloud) takes time; allow for that time in your planning.
  • Local file paths do not always behave the same as cloud URLs.
  • Develop and test workflows with small datasets on a local compute environment (e.g., HARDAC) first.
  • Then test the full analysis workflow in the cloud using a single chromosome (or a similarly small subset of the data) to fine-tune it and minimize cloud costs.
  • ONLY THEN run the workflow on the full dataset.
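
The point about local file paths versus cloud URLs can be illustrated with a small sketch. It is only an example under stated assumptions: the `gs` and `s3` schemes, the bucket name `my-bucket`, and the helper names `is_cloud_url` and `join_location` are all hypothetical and not part of any particular workflow. It shows one common pitfall: path libraries designed for local files collapse the `//` in a cloud URL, so locations are better joined as plain strings when they point to cloud storage.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: the schemes your workflow treats as cloud storage.
CLOUD_SCHEMES = {"gs", "s3"}


def is_cloud_url(location: str) -> bool:
    """Return True if the location looks like a cloud storage URL."""
    return urlparse(location).scheme in CLOUD_SCHEMES


def join_location(base: str, name: str) -> str:
    """Join a file name onto a base that may be a local path or a cloud URL."""
    if is_cloud_url(base):
        # Plain string joining keeps the "scheme://" prefix intact.
        return base.rstrip("/") + "/" + name
    # Local paths can safely go through pathlib.
    return str(PurePosixPath(base) / name)


# Hypothetical inputs: a local directory and a cloud bucket.
local_input = join_location("/data/inputs", "chr21.vcf.gz")
cloud_input = join_location("gs://my-bucket/inputs/", "chr21.vcf.gz")

# Treating the URL as a local path silently damages it.
mangled = str(PurePosixPath("gs://my-bucket/inputs"))

print(local_input)
print(cloud_input)
print(mangled)
```

Running a check like this on the small local test dataset is one way to catch path-handling differences before paying for cloud compute.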