Cloud Working Group mid-semester update: Spring 2017

April 6, 2017

This semester’s Cloud Working Group (CloudWG) has focused on a researcher’s ability to move her data and execute her research computation on multiple platforms, including commercial cloud (e.g., Amazon AWS), national infrastructure (e.g., XSEDE’s Jetstream), and a local workstation or laptop.

In our regular bi-weekly Thursday gatherings at the D-Lab Collaboratory, the CloudWG focuses on “Cloud-based Workflows for Research, Teaching, and Training”. We’ve been working closely with the organizers and attendees of the Machine Learning Working Group (MLWG) and the Computational Text Analysis Working Group (CTAWG), as well as the BIDS Reproducibility and Open Science Working Group.

Our focus has been to help laptop-based research workflows migrate to cloud resources when needed, with the ability to move research data and computation back and forth between cloud and laptop, a capability we call “Mobility of Compute.”

Some notable projects that we’ve collaborated on in Spring 2017 include the CTAWG Congressional Records project, Stanford CoreNLP with RStudio, and the Lexicon project. These collaborations build on and extend prior work with BRC and BIDS that combines the use of Docker with a cloud platform called Jetstream that is free to researchers through the XSEDE program.

To help enable the mobility of compute on Jetstream, we’ve prepared a setup script that provisions a curated set of utilities, including Docker, Singularity, and Globus. These key utilities make it easy to deploy community-curated containers such as the ROpenSci project’s RStudio Docker (rocker) images or the Jupyter project’s Docker images, such as the datascience-notebook.
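As a rough illustration of what deploying one of these community images can look like once Docker is installed, the sketch below builds a `docker run` command line in Python without executing it. It is not the setup script mentioned above. The image names (`rocker/rstudio`, `jupyter/datascience-notebook`) and default ports (8787, 8888) are assumed from the images' Docker Hub conventions, and the `/data` mount point and the `docker_run_command` helper are hypothetical names chosen for this example.

```python
import shlex

# Assumed Docker Hub image names and default service ports (not taken from the post).
IMAGES = {
    "rstudio": ("rocker/rstudio", 8787),
    "jupyter": ("jupyter/datascience-notebook", 8888),
}


def docker_run_command(kind, host_port=None, workdir=None, env=None):
    """Build (but do not execute) a `docker run` command for a known image.

    kind      -- key into IMAGES ("rstudio" or "jupyter")
    host_port -- port to expose on the host; defaults to the container port
    workdir   -- optional host directory to mount at /data (hypothetical path)
    env       -- optional dict of environment variables passed with -e
    """
    image, container_port = IMAGES[kind]
    if host_port is None:
        host_port = container_port
    cmd = ["docker", "run", "--rm", "-p", f"{host_port}:{container_port}"]
    if workdir is not None:
        # Bind-mount research data so it stays on the host, outside the container.
        cmd += ["-v", f"{workdir}:/data"]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for kind in IMAGES:
        print(shlex.join(docker_run_command(kind)))
```

Because the same command works wherever Docker runs, it could in principle be used unchanged on a laptop, a Jetstream instance, or an AWS instance. Recent `rocker/rstudio` images may also expect a login password supplied through the `env` argument; check the image documentation before relying on this.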

Mobility of compute includes the ability to migrate research workflows between different cloud providers. So far this semester we’ve been exploring RStudio AMIs on AWS to discover best practices that we can include in our Docker-based solutions to make it easier to move between Jetstream and AWS.

We’ve also been working with the Google Cloud Platform’s Kubernetes (GKE) container service in collaboration with the Infrastructure team of the Data Science Education Program (DSEP) and Research Services at Haas School of Business. You can check out the work-in-progress recipe “Zero to Jupyter in 15 mins” and the related quick deployment script.

We look forward to sharing another update as the semester progresses and these works-in-progress mature. If you would like to get involved, you can:

If you have specific questions about cloud computing and would like to request a one-on-one consultation through Berkeley Research Computing (BRC) Cloud Computing Support, please email us at: brc@berkeley.edu.