Savio cluster storage quadrupled to support Big Data research

July 10, 2015

John White

The campus’ Savio computing cluster received a major storage upgrade on June 12, 2015, when its Global Scratch file system was quadrupled in size, to a massive 885 terabytes (TB) of storage. The upgraded storage is also 250% faster, providing a peak bandwidth of over 20 GB per second, to better meet the demands of data-intensive computations.

A large, fast, parallel scratch file system is one of the key benefits for researchers using the Savio cluster. The June storage upgrade, along with Savio’s recently installed Science DMZ connection, will help campus researchers ingest and analyze the increasingly large datasets that are becoming characteristic of many areas of research. The newly expanded scratch file system also offers researchers additional space for writing intermediate and final outputs.

Savio’s scratch file system expansion was installed by John White and his colleagues on the Lawrence Berkeley Laboratory High Performance Computing Services team, which administers Savio in partnership with UC Berkeley’s Research IT department. During the cutover from the old storage platform to the new one, Savio’s administrators migrated approximately 160 TB of user data, making the transition relatively seamless for cluster users.

About Savio’s parallel file system

Along with Savio’s modestly sized home and group directories, which are allocated to individual users and research groups, the cluster also provides an extremely large shared global scratch space.

This scratch space is implemented using Lustre, an open source, distributed parallel file system. While presenting a POSIX-compliant file system to client compute nodes, Lustre relies on a set of special-purpose servers behind the scenes to meet the intense I/O demands of highly parallel, in-memory applications.

Lustre’s scalable architecture consists of three primary components: clients, which are compute nodes that communicate with the file system via Lustre client software; Metadata Servers, which manage file system metadata; and Object Storage Servers, which store file data objects. Both the Metadata Servers and Object Storage Servers are backed by large, striped and replicated RAID disk arrays, known respectively as Metadata Targets and Object Storage Targets (OSTs).

A high-level conceptual overview of the Lustre file system architecture. Source: National Institute for Computational Sciences (NICS) at the University of Tennessee, Knoxville

When a client needs to read or write a file, it contacts a Metadata Server, which checks access rights and file properties, and sends back a list of relevant OSTs: those to which data will be written, or across which the data has already been striped. From that point on, irrespective of the size of the file, the client interacts exclusively with the OSTs, reading or writing data in parallel streams until the operation is completed. When a Lustre network is constructed using high-speed InfiniBand networking, as is the case on Savio, client communications with OSTs can use Remote Direct Memory Access, in which in-memory data is sent directly over the network, bypassing various operating system layers and thus achieving extremely high throughput with low latency.
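The read/write flow described above can be illustrated with a small model. The following is a minimal sketch, not Lustre client code: it assumes the simple round-robin (RAID-0 style) striping layout that Lustre commonly uses, and the names Layout, locate and plan_io, the 1 MiB stripe size, and the OST numbers are all hypothetical choices made for illustration rather than Savio's actual configuration.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class Layout:
    """Hypothetical stand-in for the layout a Metadata Server hands back."""
    stripe_size: int   # bytes written to one OST before moving to the next
    ost_ids: tuple     # OSTs holding the file's objects, in stripe order


def locate(layout, offset):
    """Map a byte offset in the file to (ost_id, offset inside that OST's object)."""
    chunk = offset // layout.stripe_size          # which stripe-sized chunk of the file
    stripe_count = len(layout.ost_ids)
    ost_id = layout.ost_ids[chunk % stripe_count]  # round-robin across the OSTs
    object_offset = (chunk // stripe_count) * layout.stripe_size + offset % layout.stripe_size
    return ost_id, object_offset


def plan_io(layout, file_size):
    """Total bytes each OST would serve for a whole-file read or write."""
    per_ost = {ost: 0 for ost in layout.ost_ids}
    offset = 0
    while offset < file_size:
        length = min(layout.stripe_size, file_size - offset)
        ost_id, _ = locate(layout, offset)
        per_ost[ost_id] += length
        offset += length
    return per_ost


# Illustrative layout only: four OSTs, 1 MiB stripes.
layout = Layout(stripe_size=MIB, ost_ids=(3, 7, 11, 12))
print(locate(layout, 5 * MIB + 100))
print(plan_io(layout, 10 * MIB))
```

Because each OST holds an independent share of the file, a client (or many clients) can transfer those shares concurrently, which is where the aggregate bandwidth of a striped file system comes from.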

About Savio’s underlying storage platform

Savio’s upgraded global scratch file system is built on a new DataDirect Networks (DDN) Storage Fusion Architecture (SFA) 12KE storage platform. In Savio’s case, the DDN storage platform consists of six embedded virtual machines that are hosted directly on the DDN storage controllers, rather than on separate I/O nodes. This tight integration reduces complexity, latency, and administrative overhead, and eliminates the need for external I/O servers, cabling and network switches. The new platform also includes vendor support for the Lustre parallel file system to ensure smooth upgrades and optimized storage performance.

During its first 15 months of operation, Savio’s scratch file system was hosted on a combination of older DDN storage and I/O servers that had formerly been deployed by the UCOP ShaRCS cluster project. The availability of that hardware helped Savio to launch quickly and at a lower initial cost than would have been possible otherwise. Unfortunately, the ShaRCS-era hardware quickly started showing its age, and its limited capacity and performance became inadequate as usage of the system increased.