New BRC Condo Storage Service provides big data storage for Savio partners

October 31, 2016

Berkeley Research Computing (BRC) is launching a new Condo Storage service for Savio condo partners who need very large, persistent storage capability at a reasonable price.

Many researchers need much more storage than BRC provides to Savio projects by default, and need storage that is persistent over a longer term than the cluster’s scratch filesystem provides. While IST storage solutions fit projects with modest storage needs, they become prohibitively expensive for researchers who have tens or hundreds of terabytes of data.

Jason Huff, Director of the Computational Genomics Resource Laboratory (CGRL), describes the needs of CGRL researchers: "Genomics is predicted to soon be one of the most data storage-intensive projects for humanity. A storage system integrated into the BRC supercomputing environment will ... guarantee that CGRL researchers can compute efficiently on large amounts of genomics data."

Another example comes from researchers in domains such as Nuclear Engineering who are migrating from other clusters to Savio and need to move hundreds of terabytes of existing data to the new environment.

To serve researchers who need 25 terabytes or more of storage, BRC has created a new service within the Savio infrastructure: a Condo Storage offering. This follows the pattern of the condo computing model, in which researchers purchase and contribute hardware to a shared resource managed by BRC. BRC provides Condo Storage partners with a multi-petabyte scale storage infrastructure (a Dell Compellent system) and expert staff to manage the system. Condo Storage partners purchase "shelves" of storage and add these to the shared infrastructure. In exchange for BRC covering the recurring cost of managing the additional storage, a small portion of partner storage is allocated to the BRC program, which uses it to provide user and project storage to all Savio users.

CGRL is among the first partners to take advantage of the new service. Huff explained: "Condo storage will immediately make computing more efficient for CGRL users because less time will be spent moving massive amounts of genomic data in and out of the supercomputing environment. Condo storage will be complementary to longer-term backup/archival storage and publicly available databases."

The net price to researchers is $59/usable-TB/yr, which is much lower than currently practical alternatives. The minimum purchase is 25 TB of storage, which costs $7,000 (including a 5-year warranty). The system features dual-parity RAID 6, for security and stability of data. Purchased shelves will be managed for five years, after which partners must refresh the storage.

The table below compares the cost of various storage alternatives:

Comparison for 50 TB over 5 yrs:
Model/service               Details of cost                   Total cost   Cost/TB/yr
UCB IST Performance tier    50 TB x $1680/TB/yr x 5 yrs       $420,000     $1680
UCB IST Utility tier        50 TB x $600/TB/yr x 5 yrs        $150,000     $600
CASS (UCLA) [1]             50 TB x $119.12/TB/yr x 5 yrs     $29,780      $119
AWS Glacier [2]             50 TB x $84/TB/yr x 5 yrs         $21,000      $84
BRC Condo Storage           $14K + CA sales tax: $744 [3]     $14,744      $59
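The per-terabyte figures in the comparison can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the BRC service: the function name and the dictionary are illustrative, and the totals are simply copied from the table above for 50 TB over 5 years.

```python
def cost_per_tb_year(total_cost, capacity_tb, years):
    """Spread a total cost evenly over capacity (TB) and duration (years)."""
    return total_cost / capacity_tb / years


# Total 5-year cost in dollars for 50 TB, as listed in the comparison table.
CAPACITY_TB = 50
YEARS = 5
totals = {
    "UCB IST Performance tier": 420_000,
    "UCB IST Utility tier": 150_000,
    "CASS (UCLA)": 29_780,
    "AWS Glacier": 21_000,
    "BRC Condo Storage": 14_744,  # hardware purchase plus sales tax
}

for name, total in totals.items():
    rate = cost_per_tb_year(total, CAPACITY_TB, YEARS)
    print(f"{name}: ${total:,} total, about ${rate:,.2f}/TB/yr")
```

Rounding the Condo Storage result to the nearest dollar gives the $59/TB/yr quoted in the article.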

If your group is interested in becoming a Condo Storage partner, please contact research-it@berkeley.edu.