Using Jupyter Notebook

Running a Jupyter Notebook on the cluster is slightly more complicated than on your own computer. You want to make sure the server runs on a compute node rather than on the low-performance scheduler node, and compute nodes in the cluster are only reachable through interactive sessions (not through ssh). Below is an example of running a Jupyter Notebook on a DASH Execute node.

Run a Jupyter Notebook Instance

  1. Open a shell prompt on your local computer and log in to the SoM HPC scheduler node, replacing <NETID> with your Duke NetID.

    ssh <NETID>@dash.duhs.duke.edu
  2. Activate your conda or venv environment with the desired version of Jupyter Notebook in it. If you do not have such an environment yet, see the sketch below the command.

    conda activate <my-conda-env-name>
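
     If you need to create an environment that includes Jupyter Notebook, a minimal sketch using conda is shown below; the environment name jupyter-env and the Python version are examples only, so adjust them for your own work.

        # Example only: create and activate a conda environment that includes Jupyter Notebook
        conda create -n jupyter-env python=3.11 notebook
        conda activate jupyter-env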
  3. Submit a batch job using the common Jupyter Notebook Slurm job template. The default parameters for this job are set in the job template, but you can alter the sbatch command depending on your needs (see the override example below the default command). Default parameters are:

    • --time=08:00:00  Terminates the notebook instance after 8 hours
    • --ntasks=1  Defines the notebook as a single task
    • --cpus-per-task=2  Provides 2 CPUs for the notebook
    • --mem=8192  Provides 8192 MB of memory for the notebook

      sbatch /data/shared/jobs/jupyter-notebook.job
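
     For example, to request a longer runtime, more CPUs, and more memory, you can override the defaults on the sbatch command line; the values below are illustrative only.

        sbatch --time=24:00:00 --cpus-per-task=4 --mem=16384 /data/shared/jobs/jupyter-notebook.job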
  4. You will see Slurm output showing you the job ID.

    Submitted batch job XXXXXX
  5. In your home directory there will be a new file called jupyter-notebook.job.XXXXXX, where XXXXXX is the job ID. To display the connection instructions for this instance of the Jupyter Notebook, replace XXXXXX with the job ID reported in the previous step.

    cat jupyter-notebook.job.XXXXXX
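
     If the file has not appeared yet, the job may still be waiting in the queue. As a general Slurm check (not part of the job template's own instructions), you can inspect the job state with:

        squeue -j XXXXXX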
  6. Follow the connection instructions, which look like the following.

     # NOTE: THIS IS AN EXAMPLE, you must follow the instructions in the jupyter-notebook.job.XXXXXX file in your home directory
    
     1. Start an interactive session on the cluster node where Jupyter Notebook is running (Default duration is 8 hours, feel free to modify if needed)
    
       srun --mem 500 --time=8:00:00 -p execute -w <NODENAME> --pty bash
    
    2. SSH tunnel from the node back to the scheduler node using the internal URL. 
       NOTE: After entering your password, the process will continue with no additional output to the terminal.
       (VERY IMPORTANT!!)
    
       ssh -NR <PORT>:localhost:<PORT> <NETID>@somhpc-tunnel.azure.dhe.duke.edu
    
    3. From your local workstation, SSH tunnel to the scheduler node using the public URL. 
       NOTE: After entering your password, the process will continue with no additional output to the terminal.
       (VERY IMPORTANT!!) 
    
       ssh -NL <PORT>:localhost:<PORT> <NETID>@somhpc-tunnel.azure.dhe.duke.edu
    
       and point your web browser to http://localhost:<PORT>
    
     4. Log in to Jupyter Notebook using the following credentials:
    
       password: <PASSWORD>
    
    When done using Jupyter Notebook, terminate the job by:
    
    1. Choose Logout on your Jupyter Notebook session in the browser
    2. Type Ctrl-C to close the SSH tunnel on the interactive session
    3. Type "exit" to end the interactive session
    4. Issue the following command on the scheduler node:
    
          scancel -f <JOBID>
    
    5. On your local workstation, type Ctrl-C to close the SSH tunnel to the scheduler node
    
    # Output from Jupyter Server
    --------------------------------------------------------------------------------
  7. If you wish to set up Jupyter Notebook using a different kernel, please refer to this document for additional configuration; a brief sketch of registering a kernel is shown below.
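
     As a rough sketch, a kernel for another conda environment can typically be registered from inside that environment with ipykernel; the environment and display names below are examples only, and the ipykernel package must be installed in that environment.

        conda activate <other-env-name>
        python -m ipykernel install --user --name other-env --display-name "Python (other-env)"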