Research IT at Moore-Sloan Data Science Environment (MSDSE) Summit

January 12, 2018

Aaron Culich

At the 2017 Moore-Sloan Data Science Environment (MSDSE) Summit, held in New Orleans this past November, UC Berkeley and the University of Washington (UW) reported on collaborations sparked by conversations between the two institutions at the 2016 MSDSE Summit.

The summit brought together members of the data science communities from the NYU Center for Data Science, the University of Washington e-Science Institute, and the Berkeley Institute for Data Science (BIDS) to build community, explore ideas, and encourage collaboration among the three institutions. Their membership in the Data Science Environments Partnership is supported by a five-year, $37.8 million cross-institutional funding commitment generously provided by the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation.

UC Berkeley’s Research IT group was represented at the summit by Chris Hoffman, Research IT Associate Director, and Aaron Culich, Research Computing Architect.

The first activity to follow the 2016 summit began with this question from a BIDS Fellow: “Can we run our own custom version of Binder on our own cloud?” Together, BIDS and Research IT staff began experimenting by deploying Jupyter Notebooks in Docker containers on the XSEDE Jetstream cloud to support multi-institutional data science workshops and research. The work resulted in a conference paper about Portable Learning Environments (PLEs) at PEARC17, a collaboration among Research IT, BIDS, UW, and UC San Francisco (UCSF). Subsequent collaboration with the Berkeley Data Science Education Program (DSEP) infrastructure and Jupyter Project teams yielded Zero to JupyterHub: a complete self-service provisioning guide that allows anyone to deploy JupyterHub on two major commercial cloud platforms, with UW adding a third major platform to support their Neuro Hackweek. The year's work culminated in the 2017 Binder Workshop at UC Davis.

At the 2017 summit, the new Binder infrastructure was demoed. The demo sparked new discussions and opened opportunities for collaboration; of particular interest were integrations with Dataverse and the Open Science Framework (OSF). We also learned that the NYU ReproZip project is exploring support for Singularity and integration with Binder.

The experimentation and innovation outlined here give substance to a sense shared by many attendees: events like this Data Science Summit are about planting many seeds and then coming together to enjoy the fruits of our collaborations.