Take advantage of unlimited bDrive (and Box) storage using rclone

Overview: why rclone?
Getting started with rclone
   Using rclone with an SPA account
   Install rclone
   Configure rclone
   Using rclone
Alternative to using a SPA account for rclone data transfers

Overview: why rclone?

The bDrive storage and collaboration solution offers everyone at UC Berkeley unlimited storage, strong search capabilities, and mobile access. This storage is an important data management resource for research teams, and is often used for offsite backups of valuable data.

Unfortunately, the web browser interface for bDrive doesn’t always work well when dealing with very large files, many files, or deep folder structures. The web client’s connection is slow, and can disconnect in the midst of a lengthy transfer. Researchers handling many thousands of files, or files running to tens or hundreds of gigabytes, need something more robust.

The open-source tool rclone may be the robust tool researchers need when their data outgrows the bDrive web interface. It can synchronize or copy significant quantities of data to and from Google Drive (the vendor storage solution that underlies bDrive). For Box users, rclone also transfers files to and from Box.

rclone performs integrity checks on transfers and supports encryption of transferred files as well as file names if required. Because it is a command line tool, it is easy to use in a script. Scripted use of rclone could, for example, be launched after-hours to copy data generated during the day from multiple directories as an automated backup mechanism (the computer running rclone should be configured not to sleep if transfers will run for an extended period). Versions of rclone are available for Mac OS X, Windows, and multiple types of Unix.
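
As an illustration of the scripted, after-hours use described above, here is a minimal sketch of a backup script. It is not an official Research IT script: the remote name my-spa-bdrive, the destination folder nightly-bkp, and the function name backup_dirs are all placeholders chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch of an after-hours backup script.
# Assumptions (not from the original page): the remote name, the
# destination folder, and the source directories are placeholders.

# Allow the rclone binary to be overridden, e.g. with a full path.
RCLONE_BIN="${RCLONE_BIN:-rclone}"
REMOTE="${REMOTE:-my-spa-bdrive}"
DEST_ROOT="${DEST_ROOT:-nightly-bkp}"

# Copy each directory given as an argument into its own folder on the remote.
# Returns non-zero if any copy fails, but still attempts the remaining ones.
backup_dirs() {
    local status=0 dir name
    for dir in "$@"; do
        name="$(basename "$dir")"
        if ! "$RCLONE_BIN" copy "$dir" "$REMOTE:$DEST_ROOT/$name"; then
            echo "backup failed for $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Example invocation (commented out so the sketch has no side effects):
# backup_dirs /home/mylogin/Documents/bkp /home/mylogin/data/run1
```

A script like this could be launched by whatever scheduler your operating system provides. Because each directory is copied separately, one failed transfer does not prevent the others from being attempted.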

IMPORTANT: Researchers must come to their own decision about whether to use an open-source tool. The rclone software is largely (though not exclusively) maintained by a single developer, who is generally very responsive to issues and problems raised by a sizable community of users. However, the software is not ‘guaranteed’ or ‘supported’ by any vendor. Research IT’s assessment of this software was based on a review at v1.37, in December 2017.

Research IT does not currently know of any well-supported browser-based or other graphical user interface (GUI) overlay to rclone, which is a command line tool. For those who are not comfortable using the command line, Research IT suggests considering cloud storage on Box, for which the campus has also negotiated unlimited storage capacity (for files of 15GB or less) for researchers, faculty, students, and staff. FileZilla is a widely used open-source GUI tool for moving data to Box via the FTPS protocol.

Getting started with rclone

If you need help with the installation and configuration steps described below, please contact the Research Data Management Program and a consultant will be happy to help: researchdata@berkeley.edu

Using rclone with an SPA account

Research IT strongly recommends that rclone be used with a Special Purpose Account (SPA), and not with the bDrive storage owned by (and accessible via) your personal CalNet ID. Separating this third-party tool’s access from the login you may use to store sensitive data or intellectual property (e.g., papers and monographs in progress, or FERPA-protected student information) is an effective way of safeguarding your files. Use of a SPA account also makes it easy to give access to current and future colleagues, successors, tech support personnel, and others. For those who cannot or do not wish to take advantage of these SPA account benefits, an alternative is described at the bottom of this page (Alternative to using a SPA account for rclone data transfers).

It is best to log into the SPA you are using (via your default web browser) before starting the configuration described below. Configuration is simplest if this is the only account to which your default web browser is logged in. We recommend logging into the SPA account using an incognito or private window of your browser to ensure authorization with the appropriate account.

Install rclone

From rclone’s Downloads page, obtain a binary (runnable copy of the software) that matches your computer’s operating system. Follow the installation instructions appropriate to MacOS or Linux on the project’s Install page. If you’re installing on Windows, download and unzip the .zip archive that corresponds to your computer’s processor (386 - 32 Bit or AMD64 - 64 Bit). The .zip archive contains an executable (.exe) binary of rclone, which you can run from a command window (cmd.exe).

Note: If you’re not sure whether your Windows computer is 32-bit or 64-bit, check here for Windows machines through Windows 8; or check here for Windows 10. If you’re not sure whether your Mac OS X computer is 32-bit or 64-bit, you can run the command getconf LONG_BIT in a terminal window to find out; or, even more simply, if you are running Mac OS 10.7 or later you have a 64-bit machine.

Configure rclone

Configuration of rclone enables the software’s access to a cloud storage platform; each cloud platform with which rclone is to be used, such as bDrive, requires its own configuration.

Before running rclone config, use your web browser to log into the bDrive account you are going to use to store files; this makes the configuration process easier. Again, Research IT recommends you use an SPA account, as described above; if you choose not to, be sure to read the section Alternative to using a SPA account for rclone data transfers at the bottom of this page before proceeding.

For those who prefer to skip the introductory material, at this point you can proceed to the enumerated list, below.

For your laptop or desktop/workstation: During the config process (rclone config, documented at the link above) you identify the platform to which you want to connect (Google Drive in this case, as it is the underlying platform behind bDrive). At a later point in the process, the tool’s “auto config” option will open a Google page in your default browser requesting permission to access files and storage owned by the bDrive account you are using to store your research files. Use the account to which you have already logged in, per the recommendation above, to grant rclone this permission. Click “Agree,” and control should automatically return to the terminal window, where you will finish the configuration. (If you are using rclone on a machine that isn’t equipped with a web browser, and are not sure how to work around that issue, see “For Savio or other remote machines,” below, or contact the Research Data Management Program and a consultant will be happy to help: researchdata@berkeley.edu.)

For Savio or other remote machines: rclone is already installed on Savio, the shared campus cluster run by Berkeley Research Computing. Log into the DTN (Data Transfer Node, at dtn.brc.berkeley.edu) via ssh to transfer files from Savio to bDrive (but first see "Special note for invoking rclone on Savio," below). For Savio or other remote machines, configuration is the same as described above, except that:

  • When asked during the configuration process whether to “Use auto config?” you should answer “N” (no).
  • For bDrive:
    • You’ll be presented with a long URL that you should copy from your terminal window, and paste into your local machine’s web browser (e.g., paste from your terminal window into your laptop’s web browser).
    • After choosing the account through which to authenticate to Google Drive, you will be presented with a long string, which is the authentication token.
    • Copy the long string from your browser and paste at the rclone config prompt in your terminal window.
  • For Box:
    • You'll need to install rclone on your local machine (or another machine with a browser).
    • On your local machine run: rclone authorize "box"
    • Copy the URL that is displayed and paste it into your web browser (ideally in the private or incognito window in which you've already logged in to the SPA).
    • Select "authorize", then go back to your terminal and copy the authentication token that is displayed.
    • Paste the auth token at the rclone config prompt in your terminal connected to the remote server.

In summary, here are the high-level steps to take to configure rclone:

  1. Create a CalNet SPA account you will use exclusively for storing your research data, if you haven’t done so already. If you’re already familiar with creating a SPA you can go directly to the SPA Accounts Admin app to create and manage SPAs.
     
  2. Using a private or incognito browser window, log in to the SPA account created in Step 1. This may not be necessary, but it greatly simplifies the process.
     
  3. In a command / terminal window, follow the rclone configuration process for Google Drive or for Box as documented on the project site.

Subsequently, when you test/run rclone you should be able to view, download from, and upload to, the bDrive folders owned by the SPA account described in Step 1, above. See Using rclone, below.
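
One way to confirm that configuration finished is to check that the new remote appears in rclone's list of configured remotes. The sketch below assumes the configuration was named my-spa-bdrive (the name used in the examples on this page); remote_exists is a hypothetical helper, not part of rclone. It relies on the rclone listremotes command, which prints one remote per line, each followed by a colon.

```shell
# Return success if a remote with the given name appears in rclone's config.
# "rclone listremotes" prints each configured remote on its own line,
# followed by a colon (for example, "my-spa-bdrive:").
remote_exists() {
    rclone listremotes | grep -qxF -- "$1:"
}

# Example (commented out; requires rclone and a finished configuration):
# remote_exists my-spa-bdrive && rclone lsd my-spa-bdrive:
```

If the remote is listed but rclone lsd fails, the likely cause is the authorization step rather than the configuration name.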

Special note for invoking rclone on Savio: As of July 2018, the default version of rclone available on Savio is 1.39. That older version will work fine if you are using rclone to copy to/from a SPA-owned instance of Google Drive/bDrive. However, if you choose the alternate method described at the bottom of this page, involving restricting rclone access, you will need to load a newer version (e.g., rclone v1.42) before invoking it. Here's how to do so (the first two commands) and verify that you're running the intended version (the last two commands):

[username@dtn ~]$ export MODULEPATH=/global/home/groups/consultsw/sl-7.x86_64/modfiles:$MODULEPATH
[username@dtn ~]$ module load rclone/1.42
[username@dtn ~]$ which rclone
/global/home/groups/consultsw/sl-7.x86_64/modules/rclone/1.42/rclone
[username@dtn ~]$ rclone --version
rclone v1.42
- os/arch: linux/amd64
- go version: go1.10.1
[username@dtn ~]$

If you are interested in understanding modules on Savio and how to load and invoke them, please visit BRC's Accessing and Installing Software page.

Using rclone

Usage documentation for rclone can be found on the project site.

Some common commands are given below as a quick reference, and assume that you have run rclone config to authorize access to bDrive, and have given my-spa-bdrive as the name of your configuration. You may need to type the path to the rclone binary (executable), depending on your operating system and environment variable settings.

List directories on bDrive:

$ rclone lsd my-spa-bdrive:

List files on bDrive:

$ rclone ls my-spa-bdrive:

Copy all files in my local “bkp” directory (/home/mylogin/Documents/bkp) into a directory called “bdrive-bkp” on bDrive:

$ rclone copy /home/mylogin/Documents/bkp my-spa-bdrive:bdrive-bkp

List files in the directory “bdrive-bkp” on bDrive:

$ rclone ls my-spa-bdrive:bdrive-bkp
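
Before a large transfer, it can help to preview what rclone would do and, afterwards, to confirm that source and destination match. The sketch below wraps rclone's --dry-run flag and its check command in two small helper functions. The function names preview_copy and copy_and_verify are made up for this example, and the paths and remote name are the placeholders used above.

```shell
# Sketch only: helper names are hypothetical; paths and the remote name
# are the placeholders used in the examples above.

# Show what a copy would transfer, without copying anything (--dry-run).
preview_copy() {
    rclone copy --dry-run "$1" "$2"
}

# Copy, then compare source and destination with "rclone check".
# The check runs only if the copy itself succeeded.
copy_and_verify() {
    rclone copy "$1" "$2" && rclone check "$1" "$2"
}

# Example invocations (commented out; they need a configured remote):
# preview_copy    /home/mylogin/Documents/bkp my-spa-bdrive:bdrive-bkp
# copy_and_verify /home/mylogin/Documents/bkp my-spa-bdrive:bdrive-bkp
# Copying in the other direction restores files from bDrive:
# rclone copy my-spa-bdrive:bdrive-bkp /home/mylogin/Documents/restored
```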

Alternative to using a SPA account for rclone data transfers

If you are working alone, with no colleagues and no need to make your backups available to others should you leave UC Berkeley, rclone provides an "access scope" setting that allows you to use your individual bDrive account while maintaining the safety of your other files by isolating them from the rclone application. This feature is available for rclone version 1.40 and higher. If you are using rclone on Savio, see the "Special note for invoking rclone on Savio" above!

To utilize this feature, you will need to grant a limited access scope to rclone during the configuration process, as follows:

  • During the configuration process you will be asked to choose the "Scope that rclone should use when requesting access from drive"
  • Choose the scope "Access to files created by rclone only ... drive.file" instead of "Full access all files ... drive".

This choice instructs bDrive to grant rclone access only to files that have been added to bDrive using rclone. rclone will not be able to "see," move, delete, or transfer files added to bDrive using the Google Drive web browser interface.

If using this configuration, you may find it convenient to create a single folder within which rclone can add files and folders. A command that writes to the folder rclone-bkp (creating it if the folder doesn't exist yet), in the account for which the rclone configuration is named my-bdrive, might look something like this:

$ rclone copy /home/mylogin/Documents/bkp my-bdrive:rclone-bkp

To use a sub-folder somefiles in the rclone-bkp folder:

$ rclone copy /home/mylogin/Documents/bkp/somefiles my-bdrive:rclone-bkp/somefiles
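
To double-check that a configuration really uses the limited scope, you can inspect what rclone stored for it. The sketch below assumes a reasonably recent rclone in which rclone config show accepts a remote name, and assumes the stored configuration contains a line of the form "scope = drive.file". The helper name has_limited_scope is hypothetical.

```shell
# Return success if the named remote's stored configuration lists
# the limited scope "drive.file".
# Assumption: "rclone config show NAME" prints key = value lines for
# the remote, including a "scope" line when a scope was chosen.
has_limited_scope() {
    rclone config show "$1" | grep -Eq '^scope *= *drive\.file *$'
}

# Example (commented out; requires a configured remote named my-bdrive):
# has_limited_scope my-bdrive && echo "my-bdrive uses the drive.file scope"
```

If the check fails, rerun rclone config and choose the drive.file scope as described above before transferring anything.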