The bDrive repository offers everyone at UC Berkeley unlimited storage, strong search capabilities, and mobile access. This storage is an important data management resource for research teams. The standard web client, however, does not always work well with very large files, many files, or deep folder structures. The web client’s connection is slow and can disconnect in the middle of a lengthy transfer. Researchers handling many thousands of files, or files running to tens or hundreds of gigabytes, need something more robust. Recently, working with staff from the molecular imaging labs and the Graduate School of Journalism, Research IT has been testing rclone and the Rclone Browser, in conjunction with CalNet Special Purpose Accounts (SPAs), to address these shortcomings.
Rclone is a command line tool that can synchronize or copy data to and from a number of cloud storage providers, including Google Drive and Amazon S3 [Editor's note: at the time this article was published, rclone did not support Box; as of Summer 2017 Box is supported, see GitHub-hosted documentation for more information]. Rclone performs integrity checks on transfers and supports encryption of transferred files, as well as of file names if required. Because it is a command line tool, it is easy to use in a script. A scripted rclone job could, for example, be launched after office hours to copy data generated during the day from multiple directories, serving as an automated backup mechanism (the computer running rclone should be configured not to sleep if transfers will run for an extended period). Versions of rclone are available for Mac OS X, Windows, and multiple types of Unix. Once rclone is installed, the command "rclone config" guides the user through the process of establishing a connection with bDrive or other cloud storage sites.
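As a rough illustration of the scripted, after-hours backup described above, the following Python sketch copies several local directories to a configured remote. It is a minimal example under stated assumptions, not a tested Research IT procedure: the remote name "bdrive", the source directories, the "backups" destination folder, and the function names are all hypothetical placeholders. Only the "rclone copy" command and its "-v" and "--dry-run" flags come from rclone itself.

```python
import subprocess
from pathlib import Path

# Hypothetical values: the remote name is whatever was chosen during
# "rclone config", and the source directories are placeholders.
REMOTE = "bdrive"
SOURCE_DIRS = ["/data/microscope1", "/data/microscope2"]


def build_copy_command(source, remote, dest_root="backups", dry_run=False):
    """Return the argument list for one 'rclone copy' invocation."""
    # Each source directory gets its own folder under dest_root on the remote.
    destination = f"{remote}:{dest_root}/{Path(source).name}"
    command = ["rclone", "copy", source, destination, "-v"]
    if dry_run:
        # --dry-run reports what would be transferred without copying anything.
        command.append("--dry-run")
    return command


def run_backup(sources, remote, dry_run=False):
    """Copy each source directory in turn; return the sources that failed."""
    failed = []
    for source in sources:
        result = subprocess.run(build_copy_command(source, remote, dry_run=dry_run))
        # rclone exits with a non-zero status when a transfer reports errors.
        if result.returncode != 0:
            failed.append(source)
    return failed
```

Calling run_backup(SOURCE_DIRS, REMOTE, dry_run=True) first is a reasonable way to preview a transfer before running it for real. The script itself does no scheduling; it would need to be launched by the operating system's scheduler or started by hand at the end of the day.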
There is also an application called Rclone Browser, which provides a graphical user interface for the rclone utility. Windows, Mac OS, and Ubuntu platforms are supported. There is currently no mechanism for scheduling transfers, but a transfer can be initiated at the end of the day to run while the user is away. Metrics are provided for each transfer, including total transfer size, errors, and duration of the transfer operation.
When moving large numbers of files or very large files to bDrive, transfers can be terminated if a Google client ID is not used when establishing the connection. A client ID is created in the Google API console, which can be accessed with your CalNet ID or SPA. First create a project, then enable the Drive API for that project, and finally generate OAuth client credentials (see basic instructions for OAuth client credentials setup on the Google Cloud Platform console). Use these credentials when configuring your bDrive connection in rclone. The Research IT team can assist with this process.
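One way to confirm that the client credentials were actually recorded is to look for a client_id entry under the remote's section of the rclone configuration file, whose location is printed by the command "rclone config file". The Python sketch below does this with the standard configparser module. It assumes the configuration file is not encrypted and that the remote is named "bdrive"; the sample configuration text, the credential values, and the function name are hypothetical and shown only for illustration.

```python
import configparser

# Hypothetical example of what an unencrypted rclone configuration
# section for a Google Drive remote might contain.
SAMPLE_CONFIG = """
[bdrive]
type = drive
client_id = 1234-example.apps.googleusercontent.com
client_secret = example-secret
"""


def remote_has_client_id(config_text, remote):
    """Return True if the named remote has a non-empty client_id entry."""
    # Interpolation is disabled so that '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(remote):
        raise KeyError(f"no remote named {remote!r} in the rclone configuration")
    return bool(parser.get(remote, "client_id", fallback="").strip())
```

In practice the text would be read from the file reported by "rclone config file" rather than from a sample string. A result of False suggests the remote was configured without a client ID and may be worth reconfiguring before a large transfer.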
When configuring rclone to perform transfers with an SPA, it is important that the user not be logged into a CalNet account in their default browser. As part of the configuration process, rclone opens a Google web page seeking permission to access the bDrive account. If the user is not logged in, Google will prompt for a login, and the user can then sign in with the CalNet SPA. We recommend signing out of all CalNet accounts in the default browser before starting.