Backing Up Data with rclone on SeaWulf

SeaWulf storage is not designed for long-term archival. Home directories have limited space, scratch files are automatically deleted after 30 days, and project spaces are not backed up. You should regularly back up important data to external storage.

Recommended: Stony Brook Box — Faculty and staff have access to Stony Brook Box, a secure cloud storage service with unlimited storage that supports high-risk and sensitive data. With Stony Brook Box, you can:

  • Access your account from anywhere, at any time, and on any device with an internet connection
  • Share documents and folders within the Stony Brook community and with external collaborators
  • Collaborate effectively by adding comments, assigning tasks, and co-creating documents

This guide shows you how to use rclone to transfer data from SeaWulf to Box or Google Drive.

Loading rclone

Before using rclone, load the module:

module load rclone/1.59.2

Note: rclone v1.54 is also available but does not work with Google Drive. Always use v1.59.2.
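
Because two versions are installed, it can help to confirm which one is active before starting a transfer. The following is a minimal sketch, assuming the first line of "rclone version" has the form "rclone v1.59.2"; the function name check_rclone_version is hypothetical and not part of rclone or SeaWulf.

```shell
# Hypothetical helper: return 0 if the rclone on PATH reports the expected version.
# Usage: check_rclone_version 1.59.2
check_rclone_version() {
    local expected="$1"
    local found
    # Assumption: the first line of `rclone version` looks like "rclone v1.59.2".
    found=$(rclone version 2>/dev/null | head -n 1 | sed 's/^rclone v//')
    if [ "$found" = "$expected" ]; then
        echo "rclone $found is loaded"
    else
        echo "expected rclone $expected but found '${found:-none}'; run: module load rclone/$expected" >&2
        return 1
    fi
}
```

Calling check_rclone_version 1.59.2 after loading the module should print a confirmation, and a non-zero exit status indicates that a different version (or none) is on your PATH.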

Configuring rclone for Box

The first time you use rclone, you'll need to configure it. Start the interactive setup:

rclone config

Follow these steps (your responses appear after each prompt):

No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n

name> my_box

Type of storage to configure.
[... list of options ...]
Storage> 8  (for Box; the number can differ between rclone versions, so check the list, or type box instead)

Box App Client Id
Leave blank normally.
Enter a string value. Press Enter for default ("").
client_id> [press Enter]

Box App Client Secret
Leave blank normally.
Enter a string value. Press Enter for default ("").
client_secret> [press Enter]

Box App config.json location
Leave blank normally.
Enter a string value. Press Enter for default ("").
config_json> [press Enter]

Access Token as a JSON blob.
Enter a string value. Press Enter for default ("").
access_token> [press Enter]

Edit advanced config?
y) Yes
n) No (default)
y/n> [press Enter]

Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> n

At this point, you'll see instructions to authorize rclone on a machine with a web browser. On your local computer:

  1. Download rclone v1.59.2 from rclone.org/downloads
  2. Open a terminal and navigate to where you downloaded rclone
  3. Run: rclone authorize "box"
  4. A browser window will open: log in with your Stony Brook credentials
  5. Copy the token that appears in your terminal
  6. Paste it into your SeaWulf terminal

Complete the configuration:

y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y

e) Edit existing remote
n) New remote
d) Delete remote
q) Quit config
e/n/d/q> q
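
Once the configuration is saved, you may want to confirm that the remote exists and that the token works. The sketch below uses "rclone listremotes" and "rclone lsd"; the helper names remote_exists and verify_remote are hypothetical, and my_box is the remote name chosen above.

```shell
# Hypothetical helper: return 0 if a remote with the given name is configured.
remote_exists() {
    # listremotes prints one remote per line with a trailing colon, e.g. "my_box:".
    rclone listremotes | grep -qxF "$1:"
}

# Hypothetical helper: check the remote exists, then list its top-level folders.
# Usage: verify_remote my_box
verify_remote() {
    if ! remote_exists "$1"; then
        echo "remote '$1' is not configured; run: rclone config" >&2
        return 1
    fi
    # lsd lists directories at the top level; it fails if authorization is invalid.
    rclone lsd "$1:"
}
```

Running verify_remote my_box should list the folders at the top level of your Box account. An error at this stage usually means the authorization step needs to be repeated.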

Configuring rclone for Google Drive

If you prefer Google Drive (available with your Stony Brook email), follow similar steps but choose option 18 for Google Drive when selecting storage type:

rclone config
n/s/q> n
name> my_gdrive
Storage> 18  (for Google Drive)

[Leave client_id, client_secret, and scope blank]

Use auto config?
y/n> n

Then authorize on your local machine:

rclone authorize "drive"

Paste the token back into SeaWulf and complete the setup as with Box.

Backing Up Files

Before backing up, create a folder in Box or Google Drive through the web interface to organize your backups. For example, create a folder called "seawulf_backup".

Copy a single file:

rclone copy ./myfile.txt my_box:seawulf_backup

Copy a directory and its contents:

rclone copy ./mydir/ my_box:seawulf_backup/mydir

Sync a directory (deletes files in destination that aren't in source):

rclone sync ./mydir/ my_box:seawulf_backup/mydir
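
Because sync can delete files in the destination, it is worth previewing the changes first. The following sketch uses rclone's --dry-run flag, which reports what would be copied or deleted without doing it; the wrapper name preview_then_sync and its "yes" argument are hypothetical conventions for this example.

```shell
# Hypothetical wrapper: show what a sync would change, and apply it only on request.
# Usage: preview_then_sync SRC DEST [yes]
preview_then_sync() {
    local src="$1" dest="$2" confirm="${3:-no}"
    # --dry-run reports planned copies and deletions without performing them.
    rclone sync "$src" "$dest" --dry-run || return 1
    if [ "$confirm" = "yes" ]; then
        rclone sync "$src" "$dest"
    else
        echo "dry run only; pass 'yes' as the third argument to apply the sync"
    fi
}
```

For example, preview_then_sync ./mydir/ my_box:seawulf_backup/mydir prints the planned changes, and adding yes as a third argument performs the sync after the preview.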

Performance Tips

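You can adjust transfer settings to improve performance. The --transfers flag sets how many files are transferred in parallel (the rclone default is 4). The --drive-chunk-size flag sets the upload chunk size for Google Drive remotes only and has no effect on Box; a bare number is read as KiB, so 16384 means 16 MiB:

rclone copy ./mydir/ my_gdrive:seawulf_backup/mydir --transfers=8 --drive-chunk-size=16384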

Performance notes:

  • Single file transfers typically achieve 350-450 Mbps
  • Both Box and Google Drive limit simultaneous file transfers
  • Directories with many small files transfer slowly due to API limits

For directories with many small files, create a compressed archive first:

tar -zcvf mydir.tar.gz ./mydir
rclone copy mydir.tar.gz my_box:seawulf_backup
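
If you archive directories regularly, the two commands can be combined into one step. This is a rough sketch, assuming you want a date-stamped archive written to the current directory; the function name archive_and_upload and the file naming scheme are hypothetical.

```shell
# Hypothetical helper: archive a directory into a dated tarball and upload it.
# Usage: archive_and_upload DIR REMOTE_PATH
archive_and_upload() {
    local dir="$1" dest="$2"
    local archive
    # Assumed naming scheme: <dirname>_<YYYYMMDD>.tar.gz in the current directory.
    archive="$(basename "$dir")_$(date +%Y%m%d).tar.gz"
    # -C stores only the directory name, not its full path, inside the archive.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    rclone copy "$archive" "$dest" && echo "uploaded $archive to $dest"
}
```

For example, archive_and_upload ./mydir my_box:seawulf_backup creates a dated archive of mydir and copies it to the backup folder. The local archive is left in place, so remove it afterwards if space is tight.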

Useful rclone Commands

# List files in your remote storage
rclone ls my_box:seawulf_backup

# Check size of remote directory
rclone size my_box:seawulf_backup

# Compare local and remote (read-only; no files are changed)
rclone check ./mydir/ my_box:seawulf_backup/mydir

# Copy only files that don't exist in destination
rclone copy ./mydir/ my_box:seawulf_backup/mydir --ignore-existing
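
The copy and check commands can be chained so that a backup is confirmed right after it is made. The sketch below relies on "rclone check" returning a non-zero exit status when files differ or are missing; the function name backup_and_verify is hypothetical.

```shell
# Hypothetical helper: copy a directory to a remote, then verify the copy.
# Usage: backup_and_verify SRC DEST
backup_and_verify() {
    local src="$1" dest="$2"
    rclone copy "$src" "$dest" || { echo "copy failed for $src" >&2; return 1; }
    # check exits non-zero when files differ or are missing.
    # --one-way ignores extra files that exist only on the remote.
    if rclone check "$src" "$dest" --one-way; then
        echo "backup verified: $dest"
    else
        echo "verification failed for $dest" >&2
        return 1
    fi
}
```

For example, backup_and_verify ./mydir/ my_box:seawulf_backup/mydir copies the directory and reports whether every local file is present and matching on the remote.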

Tip: Sample rclone scripts with additional options are available in /gpfs/projects/samples/rclone

Important: Cloud storage providers may enforce rate limits. If you encounter errors about too many requests, add the --tpslimit flag (for example, --tpslimit 10) to cap the number of API requests rclone makes per second.
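
A throttled copy might look like the following sketch. The default of 8 requests per second is an illustrative assumption, not a documented provider limit, and the function name copy_throttled is hypothetical; adjust the value until the rate-limit errors stop.

```shell
# Hypothetical helper: copy with a cap on API requests per second.
# Usage: copy_throttled SRC DEST [TPS]
copy_throttled() {
    local src="$1" dest="$2" tps="${3:-8}"
    # --tpslimit caps API transactions per second; 8 is an assumed starting point.
    # --transfers 4 keeps rclone's default number of parallel file transfers.
    rclone copy "$src" "$dest" --tpslimit "$tps" --transfers 4
}
```

For example, copy_throttled ./mydir/ my_box:seawulf_backup/mydir 5 limits rclone to five API requests per second.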