Lustre in a Condo Computing Environment

April 28, 2016

Berkeley Research Computing (BRC) and LBNL Storage Lead John White gave a talk on “Lustre in a Condo Computing Environment” at the Lustre Users Group (LUG) meeting in Portland, OR, on April 6, 2016. Lustre is a parallel distributed file system widely used in cluster computing.

John’s presentation surveyed the challenges of providing parallel storage to a condo-style High Performance Computing (HPC) cluster such as BRC’s Savio cluster. Institutional-scale clusters like Savio occupy a middle ground between clusters managed at the lab level by graduate students and national-allocation-class computing offered by organizations such as NERSC and XSEDE. Managing clusters at institutional scale is characterized by frequent small-scale buy-ins, punctuated by less frequent, larger buy-ins funded by major grants.

The variability of revenue sources and the unpredictable schedule of buy-ins require a nimble infrastructure to maximize value for researchers, including Lustre upgrade paths that break traditional parallel file system rules. Ease of management, supported above all by maximizing uniformity across numerous Lustre instances, is the key to realizing a building-block infrastructure that delivers the lowest cost per gigabyte and per FLOP to Berkeley and LBNL researchers.

John also discussed BRC’s ongoing exploration of a possible “condo storage” offering on Savio, in which researchers could purchase storage capacity to be managed by the BRC HPC team in exchange for a modest “tax” on that capacity; the taxed portion would be shared among all cluster users.