Transferring Data
Files can be transferred to and from the cluster using Globus or command line tools.
Large Data Transfers
Using Globus is the most efficient method to move large amounts of data in and out of the cluster. Transferring data to/from the login node is the next most performant option. Downloading large data on a worker node is the slowest option.
Globus
Transferring data to/from the cluster’s /data directory is possible via Globus. Within Globus, the cluster’s /data directory is represented as a Collection named Duke RCC Scratch Space. You can install Globus Connect Personal on your local machine to transfer files between the RCC and your local machine.
See Globus: Large data transfer service for researchers | Office of Information Technology for general documentation about using Globus at Duke. If you have any issues using Globus Connect Personal when transferring to your computer, see the Globus Connect Personal Troubleshooting Guide.
See Creating a Globus S3 Collection for documentation on setting up an AWS S3 collection in Globus.
Command Line Tools
Files can be copied to and from your local machine using the scp and rsync commands.
These commands need to be run on your local machine and not from the login node.
scp
The scp command will copy files to and from the cluster when run on your local machine. If a transfer is interrupted, scp must start over from the beginning, so rsync is recommended for transferring large files.
Copy to Cluster
To copy a file from your local machine to a directory on the cluster run a command like so:
scp <Path> <NetID>@rcc-login:<RemotePath>

<Path> - source path to the file on your computer
<RemotePath> - destination path on the cluster. This can be an absolute path or a path relative to your home directory on the cluster.
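For instance, an upload might look like the following; the file name results.csv, the NetID jdoe, and the destination directory are all hypothetical placeholders:

```shell
# Run on your local machine, not on the login node.
# "results.csv", "jdoe", and the destination path are example placeholders.
scp results.csv jdoe@rcc-login:/data/jdoe/results.csv
```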
Copy from Cluster
To copy a file from the cluster to a directory on your local machine, run a command like so:
scp <NetID>@rcc-login:<RemotePath> <Path>

<RemotePath> - source path on the cluster. This can be an absolute path or a path relative to your home directory on the cluster.
<Path> - destination path on your computer
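As a concrete (hypothetical) example, downloading a file from a /data directory to the local Downloads folder; the NetID jdoe and both paths are placeholders:

```shell
# Run on your local machine; "jdoe" and the paths are example placeholders.
scp jdoe@rcc-login:/data/jdoe/results.csv ~/Downloads/
```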
rsync
The rsync command will copy files to and from the cluster when run on your local machine. It resumes interrupted transfers and skips files that have already been transferred, making it a good option for large files. To show progress and handle directories, run rsync with the -rP flags.
Copy to Cluster
To copy a file from your local machine to a directory on the cluster run a command like so:
rsync -rP <Path> <NetID>@rcc-login:<RemotePath>

<Path> - source path to the file on your computer
<RemotePath> - destination path on the cluster. This can be an absolute path or a path relative to your home directory on the cluster.
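For instance, uploading a whole directory might look like this; dataset/, jdoe, and the destination path are hypothetical placeholders. Note that a trailing slash on the rsync source copies the directory's contents rather than the directory itself:

```shell
# Run on your local machine; "dataset/" and "jdoe" are example placeholders.
# -r recurses into directories; -P shows progress and keeps partial files.
rsync -rP dataset/ jdoe@rcc-login:/data/jdoe/dataset/
```

If the transfer is interrupted, rerunning the same command picks up where it left off.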
Copy from Cluster
To copy a file from the cluster to a directory on your local machine, run a command like so:
rsync -rP <NetID>@rcc-login:<RemotePath> <Path>

<RemotePath> - source path on the cluster. This can be an absolute path or a path relative to your home directory on the cluster.
<Path> - destination path on your computer
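A hypothetical download of the same kind; jdoe and both paths are placeholders:

```shell
# Run on your local machine; "jdoe" and the paths are example placeholders.
rsync -rP jdoe@rcc-login:/data/jdoe/dataset/ dataset/
```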
aws
An environment module is provided for the AWS CLI to transfer files to and from S3. For example, load the module and list a public S3 bucket like so:
module load aws
aws s3 ls --no-sign-request s3://tgs-opendata-poseidon/

See the AWS CLI Documentation for instructions on configuring and using the aws command line program.
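Building on the listing above, a single object could then be downloaded with aws s3 cp; here <Key> is a placeholder standing in for an object path shown in the listing output:

```shell
# <Key> is a placeholder; substitute an object path from the ls output above.
aws s3 cp --no-sign-request s3://tgs-opendata-poseidon/<Key> .
```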
Using Globus for larger transfers may be more performant.