Condo Cluster Service

Overview | Program Details | Hardware | Charter Contributors | Faculty Perspectives

Overview

BRC manages Savio, the new high-performance computational cluster for research computing. Designed as a turnkey computing resource, it features flexible usage and business models and professional system administration. Unlike traditional clusters, Savio is a collaborative system in which the majority of nodes are purchased and shared by the cluster users, known as condo owners.

The model for sustaining computing resources is premised on faculty and principal investigators purchasing compute nodes (individual servers) with their grants or other available funds; these nodes are then added to the cluster. This allows PI-owned nodes to take advantage of the high-speed InfiniBand interconnect and high-performance Lustre parallel filesystem storage associated with Savio. Operating costs for managing and housing PI-owned compute nodes are waived in exchange for letting other users make use of any idle compute cycles on those nodes. PI owners have priority access to computing resources equivalent to those purchased with their funds, but can access more nodes for their research if needed. This provides the PI with much greater flexibility than owning a standalone cluster.

Program Details

Compute node equipment is purchased and maintained on a 5-year lifecycle. PIs who own nodes will be notified during year 4 that the nodes must be upgraded before the end of year 5. If the hardware is not upgraded by the end of year 5, the PI may donate the equipment to Savio or take possession of it (removal of the equipment from Savio and transfer to another location is at the PI's expense); nodes left in the cluster after five years may be removed and disposed of at the discretion of the BRC program manager.

Once a PI has decided to participate, the PI or their designate works with the HPC Services manager and IST teams to procure the desired number of compute nodes and allocate the needed storage. There is a 4-node minimum buy-in for any given compute pool, and all four nodes must be of the same type, whether Standard, HTC, Bigmem, or GPU. GPU nodes are the most expensive; therefore, if a group has already purchased the 4-node minimum of any other node type, it can purchase and add single GPU nodes to its condo. Generally, procurement takes about three months from start to finish. In the interim, a test condo queue with a small allocation will be set up for the PI's users in anticipation of the new equipment. Users may also submit jobs to the general queues on the cluster using their Faculty Computing Allowance; such jobs are subject to the general queue limitations, and guaranteed access to contributed cores is not provided until the purchased nodes are provisioned.
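For illustration only, here is a minimal sketch of submitting such an interim job from a login node, assuming Savio's scheduler is Slurm; the account and partition names (fc_myproject, co_mygroup, savio) are placeholders rather than actual allocation names, which are assigned when the account is created.

    # Minimal sketch: compose and submit a Slurm batch job from Python.
    # Account and partition names below are placeholders; use the ones
    # assigned to your Faculty Computing Allowance or condo allocation.
    import subprocess
    import tempfile

    JOB_SCRIPT = """#!/bin/bash
    #SBATCH --job-name=condo_test
    #SBATCH --account=fc_myproject   # or co_mygroup once condo nodes are provisioned
    #SBATCH --partition=savio        # placeholder partition name
    #SBATCH --nodes=1
    #SBATCH --time=00:10:00

    echo "Running on $(hostname)"
    """

    def submit(script_text: str) -> str:
        """Write the batch script to a temporary file and hand it to sbatch."""
        with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
            f.write(script_text)
            path = f.name
        result = subprocess.run(["sbatch", path], capture_output=True, text=True, check=True)
        return result.stdout.strip()   # e.g. "Submitted batch job 123456"

    if __name__ == "__main__":
        print(submit(JOB_SCRIPT))

Once the purchased nodes are provisioned, the same script can simply point at the condo account to receive priority access on the contributed cores.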

Hardware Requirements for Condo Participation (Updated March 22, 2016)

Basic specifications for the systems listed below:

  • General and Serial HTC Compute: Lenovo system chassis purchased with four NeXtScale nx360m5 compute nodes.
  • GPU Compute: Finetec Computer Supermicro 1U node.

Note: condo contributors are also required to purchase a 2 m FDR InfiniBand cable for each node purchased.

Detailed specifications for each node type:

General Computing Node
  Processors: Dual-socket, 12-core, 2.3 GHz Intel Haswell Xeon E5-2670v3 (24 cores/node)
  Memory: 64 GB (8 x 8 GB) 2133 MHz DDR4 RDIMMs
  Interconnect: 56 Gb/s Mellanox ConnectX-3 FDR-14 InfiniBand
  Hard Drive: 500 GB 7.2K RPM SATA HDD (local swap and log files)
  Warranty: 5 years

Big Memory Computing Node (128 GB RAM)
  Processors: Dual-socket, 12-core, 2.3 GHz Intel Haswell Xeon E5-2670v3 (24 cores/node)
  Memory: 128 GB (8 x 16 GB) 2133 MHz DDR4 RDIMMs
  Interconnect: 56 Gb/s Mellanox ConnectX-3 FDR-14 InfiniBand
  Hard Drive: 500 GB 7.2K RPM SATA HDD (local swap and log files)
  Warranty: 5 years

Serial HTC Computing Node
  Processors: Dual-socket, 6-core, 3.4 GHz Intel Haswell Xeon E5-2643v3 (12 cores/node)
  Memory: 128 GB (8 x 16 GB) 2133 MHz DDR4 RDIMMs
  Interconnect: 56 Gb/s Mellanox ConnectX-3 FDR-14 InfiniBand
  Hard Drive: 500 GB 7.2K RPM SATA HDD (local swap and log files)
  Warranty: 5 years

GPU Computing Node
  Processors: Dual-socket, 4-core, 3.0 GHz Intel Haswell Xeon E5-2623v3 (8 cores/node)
  Memory: 64 GB (4 x 16 GB) 1866 MHz DDR4 RDIMMs
  Interconnect: 56 Gb/s Mellanox ConnectX-3 FDR-14 InfiniBand
  GPU: 2 Nvidia Tesla K80 accelerator boards
  Hard Drive: 500 GB 10K RPM SATA HDD (local swap and log files)
  Warranty: 5 years

Hardware Purchasing: Prospective condo owners should contact us for current pricing, and should do so before purchasing any equipment, to ensure compatibility. BRC will assist with entering a compute node purchase requisition on behalf of UC Berkeley faculty.

Software: Prospective condo owners should review the System Software section of the System Overview page to confirm that their applications are compatible with Savio's operating system, job scheduler, and operating environment.
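As a rough compatibility check, a sketch like the one below can be run on a login node to report the operating system release and scheduler version for comparison against an application's requirements; it assumes a Linux host with /etc/os-release and a Slurm scheduler, both of which should be verified against the System Overview page.

    # Minimal sketch: report the OS release and scheduler version on a login node
    # so they can be compared against an application's software requirements.
    import shutil
    import subprocess

    def os_release() -> str:
        """Read the PRETTY_NAME field from /etc/os-release (standard on Linux)."""
        with open("/etc/os-release") as f:
            for line in f:
                if line.startswith("PRETTY_NAME="):
                    return line.split("=", 1)[1].strip().strip('"')
        return "unknown"

    def scheduler_version() -> str:
        """Report the scheduler version, assuming Slurm's sbatch is on the PATH."""
        if shutil.which("sbatch") is None:
            return "sbatch not found"
        out = subprocess.run(["sbatch", "--version"], capture_output=True, text=True)
        return out.stdout.strip()

    if __name__ == "__main__":
        print("OS:", os_release())
        print("Scheduler:", scheduler_version())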

Storage: All institutional and condo users receive a 10 GB home directory with backups. In addition, each research group is eligible for shared project space to hold research-specific application software shared among the group's users: 30 GB for Faculty Computing Allowance accounts and 200 GB for condo accounts. All users also have access to the Savio high-performance scratch filesystem for non-persistent data. Users or projects needing more space for persistent data can purchase additional performance-tier storage from IST at the current rate. For even larger storage needs, condo partners may also take advantage of the Condo Storage service, which provides low-cost storage for very large datasets (minimum 25 TB).
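As an illustration of how these storage areas might be inspected, the sketch below reports usage for a user's home, group project, and scratch directories; the group and scratch paths are placeholder assumptions and should be replaced with the paths assigned to the account.

    # Minimal sketch: report disk usage for the storage areas described above.
    # The group and scratch paths below are assumed placeholders, not
    # authoritative; substitute the actual paths for your account.
    import os
    import shutil

    USER = os.environ.get("USER", "myuser")
    PATHS = {
        "home (10 GB, backed up)": os.path.expanduser("~"),
        "group project space": "/global/home/groups/mygroup",   # placeholder group path
        "scratch (non-persistent)": f"/global/scratch/{USER}",  # placeholder scratch path
    }

    for label, path in PATHS.items():
        if not os.path.isdir(path):
            print(f"{label}: {path} (not found)")
            continue
        usage = shutil.disk_usage(path)   # filesystem-level totals, not per-user quota
        used_gb = (usage.total - usage.free) / 1e9
        print(f"{label}: {path} -> {used_gb:.1f} GB used of {usage.total / 1e9:.1f} GB")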

Network: A Mellanox FDR-14 36-port unmanaged leaf switch is used for every 24 compute nodes.

Charter Condo Contributors

The following individuals contributed charter nodes to the Savio condo, helping to launch the Savio cluster:

Eliot Quataert, Theoretical Astrophysics Center, Berkeley Astronomy Department
Eugene Chiang, Berkeley Astronomy Department
Chris McKee, Berkeley Astronomy Department
Richard Klein, Berkeley Astronomy Department
Uros Seljak, Physics Department
Jon Arons, Berkeley Astronomy Department
Ron Cohen, Department of Chemistry, Department of Earth and Planetary Science
John Chiang, Department of Geography and Berkeley Atmospheric Sciences Center
Fotini Katopodes Chow, Department of Civil and Environmental Engineering
Jasmina Vujic, Department of Nuclear Engineering
Jasjeet Sekhon, Department of Political Science and Statistics
Rachel Slaybaugh, Nuclear Engineering
Massimiliano Fratoni, Nuclear Engineering
Hiroshi Nikaido, Molecular and Cell Biology
Donna Hendrix, Computation Genomics Research Lab
Justin McCrary, Director D-Lab
Alan Hubbard, Professor and Head of Biostatistics Division, School of Public Health
Mark van der Laan, Professor of Biostatistics and Statistics, School of Public Health
Michael Manga, Department of Earth and Planetary Sciences
Jeff Neaton, Physics
Eric Neuscamman, College of Chemistry
M. Alam Reza, Mechanical Engineering
Elaine Tseng, UCSF School of Medicine
Julius Guccione, UCSF Department of Surgery
Ryan Lovett, Statistical Computing Facility
David Limmer, College of Chemistry
Doris Bachtrog, Integrative Biology
Kranthi Mandadapu, College of Chemistry
Kristin Persson, Department of Materials Science and Engineering

Faculty Perspectives

UC Berkeley Professor of Astrophysics Eliot Quataert speaks at the BRC Program Launch (22 May 2014) on the need for local high-performance computing (HPC) clusters, distinct from national resources such as those provided by NSF, DOE (NERSC), and NASA.

UC Berkeley Professor of Integrative Biology Rasmus Nielsen speaks at the BRC Program Launch (22 May 2014) about the transformative effect of using HPC in genomics research.