Challenges and Opportunities of VM Image Curation and Discovery

Virtual Machine (VM) images are being used more and more widely across a range of research computation services. As VM images are created and adopted in growing numbers, there is a need for maintenance regimes, discovery mechanisms, and curation practices to support researchers whose focus is on using these images, not on creating and maintaining them. Berkeley Research Computing (BRC) staff will co-host a Birds-of-a-Feather (BoF) session at the upcoming conference on Practice & Experience in Advanced Research Computing (PEARC). The session will bring together practitioners from a range of institutions to discuss use cases, needs, current tools and activities that can contribute to possible solutions, and next steps to address needs across the community. The BoF is titled “Challenges and Opportunities of VM Image Curation and Discovery,” and was proposed in collaboration with colleagues from Rutgers University and Indiana University. Five additional universities and labs also committed to participate as part of the proposal to PEARC.

As described in the proposal, the promise of VMs, container-based computing, and turnkey environments includes:

  • Flexibility, allowing researchers to “bring your own environment” (BYOE)
  • Repeatability, making it easier to reuse or reproduce a computational environment
  • Ease of entry for new domains applying computational methods to their research, and for researchers who are new to computational methods or tools

Realizing this promise requires a maintenance regime to keep software appropriately updated, and discovery mechanisms accessible to researchers across academic domains; these in turn require some form of curation and models for good support practices. From a researcher's perspective, an early step in the research workflow is discovering reliable, preferably vetted images and configuring them as needed, so as to select an appropriate substrate for computational research. From the perspective of a research support organization, there is a need to guide users to vetted images that require minimal additional installation and configuration, and to contain an otherwise unmanageable proliferation of one-off VM images, along with the unsustainable storage demands those images create.

The BoF session will address three questions:

  1. What are the important use cases, both current and anticipated?
  2. What are the primary needs of both researchers and research support organizations?
  3. What is already available, or under development, that can help?

Experiences at the BoF, as well as other impressions from the PEARC conference, will be the topic of the August 10th Research IT Reading Group, from 12-1 PM in room 200C, Earl Warren Hall (2195 Hearst Ave). Anyone interested in these topics is welcome.