This semester’s Cloud Working Group (CloudWG) has focused on a researcher’s ability to move her data and execute her research computation on multiple platforms, including commercial cloud (e.g., Amazon AWS), national infrastructure (e.g., XSEDE’s Jetstream), and a local workstation or laptop.
In our regular bi-weekly Thursday gatherings at the D-Lab Collaboratory, the CloudWG focuses on “Cloud-based Workflows for Research, Teaching, and Training”. We’ve been working closely with the organizers and attendees of the Machine Learning Working Group (MLWG) and Computational Text Analysis Working Group (CTAWG), as well as the BIDS Reproducibility and Open Science Working Group.
Our focus has been to help laptop-based research workflows migrate to cloud resources when needed, with the ability to move research data and computation back and forth between cloud and laptop, a capability we call “Mobility of Compute.”
Some notable projects that we’ve collaborated on in Spring 2017 include the CTAWG Congressional Records project, Stanford CoreNLP with RStudio, and the Lexicon project. These collaborations build on and extend prior work with BRC and BIDS that combines the use of Docker with a cloud platform called Jetstream that is free to researchers through the XSEDE program.
To help enable the mobility of compute on Jetstream, we’ve prepared a setup script that provisions a curated set of utilities, including Docker, Singularity, and Globus. These key utilities make it easy to deploy community-curated containers such as the ROpenSci project’s RStudio Docker (rocker) images or the Jupyter project’s Docker images, such as the datascience-notebook.
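As a rough illustration of what deploying one of these community-curated containers involves, the sketch below builds (but does not run) `docker run` command lines for the `jupyter/datascience-notebook` and `rocker/rstudio` images. It is not part of our setup script: the helper name `docker_run_command`, the `/data` mount point, and the placeholder password are assumptions made for this example. The ports are the images’ usual defaults (8888 for Jupyter, 8787 for RStudio Server).

```python
import shlex


def docker_run_command(image, host_port, container_port, workdir=None, env=None):
    """Build (but do not execute) a `docker run` argument list.

    Hypothetical helper for illustration only.
    """
    # --rm removes the container on exit; -p publishes the service port.
    args = ["docker", "run", "--rm", "-p", f"{host_port}:{container_port}"]
    # Pass environment variables in a stable (sorted) order.
    for key, value in sorted((env or {}).items()):
        args += ["-e", f"{key}={value}"]
    # Optionally mount a host directory; /data is an assumed mount point.
    if workdir is not None:
        args += ["-v", f"{workdir}:/data"]
    args.append(image)
    return args


# Jupyter's datascience-notebook image serves on port 8888 by default.
jupyter_cmd = docker_run_command("jupyter/datascience-notebook", 8888, 8888)

# The rocker/rstudio image serves RStudio Server on port 8787 and reads
# the login password from the PASSWORD variable (placeholder value here).
rstudio_cmd = docker_run_command(
    "rocker/rstudio", 8787, 8787, env={"PASSWORD": "change-me"}
)

print(shlex.join(jupyter_cmd))
print(shlex.join(rstudio_cmd))
```

Because the same command line works wherever Docker is installed, the printed commands could be pasted unchanged into a shell on a laptop or on a Jetstream instance.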
Mobility of compute includes the ability to migrate research workflows between different cloud providers. So far this semester we’ve been exploring RStudio AMIs on AWS to discover best practices we can include in our Docker-based solutions that will make it easier to move between Jetstream and AWS.
We’ve also been working with the Google Cloud Platform’s Kubernetes (GKE) container service in collaboration with the Infrastructure team of the Data Science Education Program (DSEP) and Research Services at Haas School of Business. You can check out the work-in-progress recipe Zero to Jupyter in 15 mins and the related quick deployment script.
We look forward to sharing another update as the semester progresses and these works-in-progress mature. If you would like to get involved, you can:
- join us on Slack: https://uc-jupyter.slack.com/messages/cloud/
- sign up for email updates: http://bit.ly/cloud-updates
- and/or send an email to: cloud-working-group@berkeley.edu
If you have specific questions about cloud computing and would like to request a one-on-one consultation through Berkeley Research Computing (BRC) Cloud Computing Support, please email us at: brc@berkeley.edu.