How do I back up my SeaWulf files (using rclone)?

Using rclone to back up data

If you have a Stony Brook email address, you also have access to a Google Drive account with virtually unlimited storage (the only restriction is that no single file can exceed 5 TB). This article explains how to back up your data on SeaWulf to Google Drive using rclone.

Setting Up Rclone

To back up your data to Google Drive using rclone, first load the rclone module:

module load rclone/1.59.2

Note: rclone v1.54 is also available on SeaWulf, but it does not work with Google Drive. Please be sure to use v1.59.2!
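
Because both versions are installed, it can help to confirm which one is active before you continue. The following is a minimal sketch, not an official SeaWulf script: the helper name check_rclone_version is hypothetical, and it assumes that the first line of "rclone version" has the form "rclone v1.59.2".

```shell
# Sketch: confirm that the rclone on your PATH is the version this guide expects.
# check_rclone_version is a hypothetical helper name, not part of rclone.
check_rclone_version() {
    local wanted="$1"
    local found
    # Assumption: the first line of `rclone version` looks like "rclone v1.59.2".
    found=$(rclone version 2>/dev/null | head -n 1)
    if [ "$found" = "rclone ${wanted}" ]; then
        echo "OK: ${found}"
    else
        echo "Expected rclone ${wanted} but found '${found:-nothing}'." >&2
        echo "Try: module load rclone/1.59.2" >&2
        return 1
    fi
}

# Run the check only when rclone is available in this shell.
if command -v rclone >/dev/null 2>&1; then
    check_rclone_version v1.59.2
fi
```

If the check fails, load the module again and rerun it before starting the configuration.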

When using rclone for the first time, you will need to go through a one-time configuration process.  The following steps (modified from the rclone remote setup documentation) will guide you through this process.  Note that we are using the configuration name "my_backup" in this guide, but you may choose whatever name you wish.  

In the shell, type the following to bring up the interactive configuration process:

rclone config

Next, go through each step in the setup process, entering the answer shown after each ">" prompt below.

No remotes found - make a new one 
n) New remote 
s) Set configuration password 
q) Quit config
n/s/q> n
name> my_backup

Type of storage to configure.
Choose a number from below, or type in your own value.
 1 / 1Fichier
   \ (fichier)
 2 / Akamai NetStorage
   \ (netstorage)
 3 / Alias for an existing remote
   \ (alias)
 4 / Amazon Drive
   \ (amazon cloud drive)
 5 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, Ceph, China Mobile, Cloudflare, ArvanCloud, Digital Ocean, Dreamhost, Huawei OBS, IBM COS, IDrive e2, Lyve Cloud, Minio, Netease, RackCorp, Scaleway, SeaweedFS, StackPath, Storj, Tencent COS and Wasabi
   \ (s3)
 6 / Backblaze B2
   \ (b2)
 7 / Better checksums for other remotes
   \ (hasher)
 8 / Box
   \ (box)
 9 / Cache a remote
   \ (cache)
10 / Citrix Sharefile
   \ (sharefile)
11 / Combine several remotes into one
   \ (combine)
12 / Compress a remote
   \ (compress)
13 / Dropbox
   \ (dropbox)
14 / Encrypt/Decrypt a remote
   \ (crypt)
15 / Enterprise File Fabric
   \ (filefabric)
16 / FTP
   \ (ftp)
17 / Google Cloud Storage (this is not Google Drive)
   \ (google cloud storage)
18 / Google Drive
   \ (drive)
19 / Google Photos
   \ (google photos)
20 / HTTP
   \ (http)
21 / Hadoop distributed file system
   \ (hdfs)
22 / HiDrive
   \ (hidrive)
23 / Hubic
   \ (hubic)
24 / In memory object storage system.
   \ (memory)
25 / Internet Archive
   \ (internetarchive)
26 / Jottacloud
   \ (jottacloud)
27 / Koofr, Digi Storage and other Koofr-compatible storage providers
   \ (koofr)
28 / Local Disk
   \ (local)
29 / Mail.ru Cloud
   \ (mailru)
30 / Mega
   \ (mega)
31 / Microsoft Azure Blob Storage
   \ (azureblob)
32 / Microsoft OneDrive
   \ (onedrive)
33 / OpenDrive
   \ (opendrive)
34 / OpenStack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ (swift)
35 / Pcloud
   \ (pcloud)
36 / Put.io
   \ (putio)
37 / QingCloud Object Storage
   \ (qingstor)
38 / SSH/SFTP
   \ (sftp)
39 / Sia Decentralized Cloud
   \ (sia)
40 / Storj Decentralized Cloud Storage
   \ (storj)
41 / Sugarsync
   \ (sugarsync)
42 / Transparently chunk/split large files
   \ (chunker)
43 / Union merges the contents of several upstream fs
   \ (union)
44 / Uptobox
   \ (uptobox)
45 / WebDAV
   \ (webdav)
46 / Yandex Disk
   \ (yandex)
47 / Zoho
   \ (zoho)
48 / premiumize.me
   \ (premiumizeme)
49 / seafile
   \ (seafile)
Storage> 18

Google Application Client Id
Setting your own is recommended.
See https://rclone.org/drive/#making-your-own-client-id for how to create your own.
If you leave this blank, it will use an internal key which is low performance.
Enter a string value. Press Enter to leave empty.
client_id> <leave blank and hit enter>
OAuth Client Secret
Leave blank normally.
Enter a string value. Press Enter to leave empty.
client_secret> <leave blank and hit enter>
Scope that rclone should use when requesting access from drive.
Choose a number from below, or type in your own value.
Press Enter to leave empty.
 1 / Full access all files, excluding Application Data Folder.
   \ (drive)
 2 / Read-only access to file metadata and file contents.
   \ (drive.readonly)
   / Access to files created by rclone only.
 3 | These are visible in the drive website.
   | File authorization is revoked when the user deauthorizes the app.
   \ (drive.file)
   / Allows read and write access to the Application Data folder.
 4 | This is not visible in the drive website.
   \ (drive.appfolder)
   / Allows read-only access to file metadata but
 5 | does not allow any access to read or download file content.
   \ (drive.metadata.readonly)
scope> <leave blank and hit enter, unless you wish to specify an access scope>
Service Account Credentials JSON file path.
Leave blank normally.
Needed only if you want use SA instead of interactive login.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a value. Press Enter to leave empty.
service_account_file> <leave blank and hit enter>
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> <leave blank and hit enter>
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> n

For this to work, you will need rclone available on a machine that has
a web browser available.
For more help and alternate methods see: https://rclone.org/remote_setup/
Execute the following on the machine with the web browser (same rclone
version recommended):
        rclone authorize "drive"
Then paste the result.
Enter a value.

At this point, you will need to switch over to a machine that has a web browser. Download rclone v1.59.2 (the same version used on SeaWulf) onto your local machine, then navigate to the folder that contains it and run:

rclone authorize "drive"

A browser window will open prompting you to log into your Google account. Once you have done this, copy the token that appears in your local shell and paste it into your shell on SeaWulf.

From here, continue following the interactive process to complete the configuration:

Configure this as a team drive? 
y) Yes 
n) No (default) 
y/n> <leave blank and hit enter>

[my_backup]
client_id =
client_secret =
token = {"access_token":"xxxx.x.xxxxx_xxxxxxxxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","RefreshToken":"1/xxxxxxxxxxxxxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxx","token_type":"Bearer","expiry":"2017-07-12T16:46:29.381523567-04:00"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
my_backup            drive

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

Now that the configuration process is complete, you are almost ready to back up your data. Before you do, however, you should go back to your browser, navigate to Google Drive, and create a folder to store your backed-up data. For the purposes of this guide, we will use a folder called "seawulf_backup".
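
If you would rather stay in the shell, the same check and folder creation can be sketched with rclone itself. This is an illustration under stated assumptions, not part of the official setup: the helper name ensure_backup_folder is hypothetical, and the remote and folder names are the ones used in this guide.

```shell
# Sketch: check that a remote is configured and make sure the backup folder exists.
# ensure_backup_folder is a hypothetical helper name.
ensure_backup_folder() {
    local remote="$1" folder="$2"
    # `rclone listremotes` prints one configured remote per line, each ending in ":".
    if ! rclone listremotes | grep -qxF "${remote}:"; then
        echo "Remote '${remote}' is not configured; run 'rclone config' first." >&2
        return 1
    fi
    # `rclone mkdir` creates the folder only if it does not already exist.
    rclone mkdir "${remote}:${folder}" || return 1
    # List the top-level folders on the remote so you can see the result.
    rclone lsd "${remote}:"
}

# Run only when rclone is available in this shell.
if command -v rclone >/dev/null 2>&1; then
    ensure_backup_folder my_backup seawulf_backup
fi
```

If the remote name is reported as not configured, rerun "rclone config" and check the name you entered at the "name>" prompt.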


Backing Up Data

Next, navigate to the directory on SeaWulf that contains the files and/or folders that you would like to back up. To copy a single file to your Google Drive, type the following in the shell:

rclone copy ./myfile.txt my_backup:seawulf_backup

To copy a directory and all of its contents to Google Drive, use the following:

rclone copy ./mydir/ my_backup:seawulf_backup/mydir
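
Before copying a large directory, you may want to preview the transfer and then confirm what arrived. The sketch below shows one way to do that with the --dry-run and --progress flags and the "rclone ls" command. The helper name copy_with_preview is hypothetical, and the paths are the example names from this guide.

```shell
# Sketch: preview a copy, run it, then list what arrived on the remote.
# copy_with_preview is a hypothetical helper name.
copy_with_preview() {
    local src="$1" dest="$2"
    # --dry-run reports what would be transferred without copying anything.
    rclone copy --dry-run "$src" "$dest" || return 1
    # The real transfer; --progress shows live transfer statistics.
    rclone copy --progress "$src" "$dest" || return 1
    # List the files (with sizes in bytes) now stored at the destination.
    rclone ls "$dest"
}

# Example with the "mydir" directory from this guide (skipped if it is absent).
if command -v rclone >/dev/null 2>&1 && [ -d ./mydir ]; then
    copy_with_preview ./mydir/ my_backup:seawulf_backup/mydir
fi
```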

Note that there are several optional rclone arguments that you can set. Two important options are:

--transfers=N (default N=4)
--drive-chunk-size=SIZE (default SIZE=8192 KiB, i.e. 8 MiB; must be a power of 2)

Increasing the values for these settings may increase transfer rates.  
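
As an illustration, a copy with both options raised, followed by a verification pass, might look like the sketch below. The helper name tuned_backup is hypothetical, and the values 8 and 64M are illustrative rather than tuned recommendations for SeaWulf. Note that rclone buffers one chunk in memory per transfer, so memory use grows with both settings.

```shell
# Sketch: copy with non-default --transfers and --drive-chunk-size, then verify.
# tuned_backup is a hypothetical helper name; 8 and 64M are illustrative values only.
tuned_backup() {
    local src="$1" dest="$2"
    local transfers="${3:-8}" chunk="${4:-64M}"
    # Each transfer buffers one chunk in memory, so memory use grows with both values.
    rclone copy "$src" "$dest" \
        --transfers="$transfers" \
        --drive-chunk-size="$chunk" || return 1
    # Compare source and destination (sizes and hashes) after the copy.
    rclone check "$src" "$dest"
}

# Example with the "mydir" directory from this guide (skipped if it is absent).
if command -v rclone >/dev/null 2>&1 && [ -d ./mydir ]; then
    tuned_backup ./mydir/ my_backup:seawulf_backup/mydir
fi
```

The optional third and fourth arguments override the defaults, for example: tuned_backup ./mydir/ my_backup:seawulf_backup/mydir 4 32M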

Although the speed at which rclone is able to copy data to Google Drive is dependent on a variety of factors (including settings used, available bandwidth, etc.), our benchmarks suggest that you may see single file transfer speeds around 350-450 megabits per second.  

However, Google limits the number of files that can be transferred simultaneously. Thus, if you wish to back up a directory with a large number of small files, the transfer rate may be much slower. Because of this, it may be useful to create a compressed tarball archive of any directory with a large number of files before using rclone. To do this, type the following in the shell:

tar -zcvf mydir.tar.gz ./mydir

This compressed archive file can then be copied to Google Drive with rclone as before.
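
The archive-then-copy steps can be combined as in the following sketch, which also reads the archive back once before uploading so that a truncated file is caught early. The helper name archive_dir is hypothetical, and the directory, remote, and folder names are the examples used in this guide.

```shell
# Sketch: create a compressed archive of a directory and verify it is readable.
# archive_dir is a hypothetical helper name; it prints the archive path on success.
archive_dir() {
    local dir="$1"
    local archive="${dir%/}.tar.gz"   # ./mydir or ./mydir/ becomes ./mydir.tar.gz
    if [ ! -d "$dir" ]; then
        echo "Not a directory: ${dir}" >&2
        return 1
    fi
    tar -zcf "$archive" "$dir" || return 1
    # Listing the archive reads it end to end, which catches a truncated file.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

# Example with the "mydir" directory from this guide (skipped if it is absent).
if command -v rclone >/dev/null 2>&1 && [ -d ./mydir ]; then
    archive=$(archive_dir ./mydir) && rclone copy "$archive" my_backup:seawulf_backup
fi
```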

Some sample rclone scripts with additional options can also be found in the following SeaWulf directory:

/gpfs/projects/samples/rclone