Research IT interns develop improvements to Savio functionality in response to user input

August 15, 2017

This past year, Research IT’s interns have been working alongside staff to improve the experience of Savio cluster users. In Spring 2017, website redesign intern Sanjana Surkund conducted user interviews with researchers from a variety of backgrounds to learn what their concerns were and how Research IT’s website and Savio’s usability could be improved to address them. Combining the stories from these interviews with the user input filed through tickets, Sanjana and UX/Visualization intern Cassie Zhang crafted three user stories summarizing the needs of Savio users that are not yet being addressed:

  1. As an admin/owner of an Instructional Computing Allowance (ICA), Professor A manages a large class of around 100 students and struggles to keep track of a fluctuating user base, allocate allowances to his/her students, and monitor how the account is doing overall. In particular, s/he is looking for an easy way to update his/her user list in Savio.
     
  2. As an admin/owner of a Faculty Computing Allowance (FCA), Professor B manages a small group of grad students and postdocs and is primarily concerned with managing allowances for all users and preventing any single researcher from using up the entire allowance.
     
  3. As a user, Student/Postdoc C frequently uses Savio, but never knows how many Service Units (SUs) s/he’s using or allowed to use. S/he is worried about going over this unenforced limit and getting reprimanded by his/her faculty advisor or instructor.

In response to these needs, Cassie and new summer interns Sahil Hasan and Harrison Kuo have been working on two solutions. Cassie has been developing a Savio dashboard (see image at the head of this article) that provides a visually appealing and intuitive way to keep track of each Savio account and its allowance usage. On the backend, Sahil and Harrison have been creating an accounting system in SLURM, Savio’s cluster management and job scheduling system, that will not only let users track their allowances but also allow allowance owners to subdivide their FCA or ICA into quantities of Service Units allocated to individual users in their research group or course.

Giving Structure to Allowances

This new functionality, Cassie says, “makes everything a lot more structured, as opposed to before when everyone could use any amount [up to the total allowance].” The Savio dashboard will provide a summary view of allowance usage across all users to whom part of the allowance has been assigned. The dashboard will notify users about their personal usage and give them access to their personal statistics, which display SU usage for each job in their job history. Allowance owners, such as professors managing their FCA, will have access to statistics about their account as a whole, as well as details about the user accounts to which they have distributed portions of the allowance. They can also choose to be notified when their account or any of their users has used up a specified percentage of its allowance. Furthermore, allowance owners will be able to use a quota manager to change each user’s allowance and to add or remove users from their account without requiring a Savio system administrator’s intervention.
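To make the owner-side features a little more concrete, here is a minimal Python sketch of how a subdivided allowance might be modeled: a total SU budget, per-user shares set through a quota-manager action, and a percentage-threshold check that could drive notifications. The `Allowance` class, its fields, and its methods are illustrative assumptions, not the actual design of the Savio dashboard or its quota manager.

```python
from dataclasses import dataclass, field


@dataclass
class Allowance:
    """Hypothetical model of an FCA/ICA subdivided among users (not Savio's real schema)."""
    total_sus: float
    user_limits: dict[str, float] = field(default_factory=dict)  # SUs assigned per user
    user_usage: dict[str, float] = field(default_factory=dict)   # SUs used per user

    def set_user_limit(self, user: str, sus: float) -> None:
        """Quota-manager action: add a user or change their share, never exceeding the total."""
        allotted_elsewhere = sum(v for u, v in self.user_limits.items() if u != user)
        if allotted_elsewhere + sus > self.total_sus:
            raise ValueError("subdivisions would exceed the total allowance")
        self.user_limits[user] = sus
        self.user_usage.setdefault(user, 0.0)

    def remove_user(self, user: str) -> None:
        """Quota-manager action: drop a user (this sketch also discards their usage record)."""
        self.user_limits.pop(user, None)
        self.user_usage.pop(user, None)

    def users_over_threshold(self, percent: float) -> list[str]:
        """Users whose usage has reached `percent` of their assigned share."""
        return sorted(
            u for u, limit in self.user_limits.items()
            if limit > 0 and self.user_usage[u] / limit * 100 >= percent
        )

    def account_usage_percent(self) -> float:
        """Share of the whole allowance used so far, for account-level notifications."""
        return sum(self.user_usage.values()) / self.total_sus * 100


fca = Allowance(total_sus=300_000)
fca.set_user_limit("alice", 100_000)
fca.set_user_limit("bob", 50_000)
fca.user_usage["alice"] = 85_000
print(fca.users_over_threshold(80))  # ['alice']
```

Rejecting any change that would push the sum of subdivisions past the total is one simple way to keep an owner from promising more SUs than the allowance actually contains.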

On the backend, Sahil and Harrison are working toward a SLURM banking system that can create and track subdivisions of the Service Units granted to a Principal Investigator or instructor. This will be accomplished through two SLURM plugins that Sahil and Harrison are developing with the help of Research IT staff. The new system will duplicate some of the job data that SLURM tracks, and that data will then be accessed through a REST API. Duplicating data complicates the implementation of the banking system, Sahil explains, “because editing what SLURM records at the same time as SLURM’s attempting to record its own data leaves a lot of room for error [and creates race conditions].” This issue will be managed with code that controls concurrent access to the two databases. The REST API provides a layer of abstraction between the online Savio dashboard and the SLURM plugins, accepting requests from the dashboard to read or modify information in the plugins’ database. When a job is submitted via SLURM, the plugins will verify whether the user has enough SUs to complete the requested job; if so, they will provisionally subtract the estimated number of SUs required for the job from the user’s allowance. After a job runs successfully, the plugins replace the provisionally held quantity with the actual number of SUs spent from the user’s allowance.
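The hold-then-settle flow described above can be sketched in a few lines of Python. This is an in-memory stand-in, not the SLURM plugins themselves: the `Bank` and `UserBalance` names, the SU estimate (cores × wall-clock hours × a charge rate), and the use of a single lock to serialize updates are all assumptions made for illustration.

```python
import threading
from dataclasses import dataclass


@dataclass
class UserBalance:
    limit: float        # SUs allotted to this user by the allowance owner
    spent: float = 0.0  # SUs charged for completed jobs
    held: float = 0.0   # SUs provisionally reserved for submitted jobs

    @property
    def available(self) -> float:
        return self.limit - self.spent - self.held


class Bank:
    """Hypothetical hold-then-settle ledger; names and SU formula are illustrative."""

    def __init__(self, balances: dict[str, UserBalance]):
        self._balances = balances
        self._holds: dict[int, tuple[str, float]] = {}  # job_id -> (user, held SUs)
        self._lock = threading.Lock()  # serializes updates so concurrent jobs can't double-spend

    def submit(self, job_id: int, user: str, cores: int,
               wall_hours: float, rate: float = 1.0) -> bool:
        """At submission: reject the job, or hold its estimated cost."""
        estimate = cores * wall_hours * rate  # assumed SU formula: core-hours x charge rate
        with self._lock:
            bal = self._balances[user]
            if estimate > bal.available:
                return False  # not enough SUs left in this user's share
            bal.held += estimate
            self._holds[job_id] = (user, estimate)
            return True

    def complete(self, job_id: int, actual_sus: float) -> None:
        """After the job runs: swap the provisional hold for the actual charge."""
        with self._lock:
            user, estimate = self._holds.pop(job_id)
            bal = self._balances[user]
            bal.held -= estimate     # release the provisional hold
            bal.spent += actual_sus  # charge what the job actually used

    def available(self, user: str) -> float:
        with self._lock:
            return self._balances[user].available


bank = Bank({"carol": UserBalance(limit=1000)})
print(bank.submit(1, "carol", cores=20, wall_hours=24))  # True (480 SUs held)
print(bank.submit(2, "carol", cores=20, wall_hours=48))  # False (960 > 520 available)
bank.complete(1, actual_sus=300)
print(bank.available("carol"))  # 700.0
```

Holding the estimate at submission, rather than charging only at completion, keeps several simultaneously queued jobs from each seeing the same unspent balance and collectively overdrawing a user's share.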

Thus, the new banking system will give owners the ability to divide up their allowances, and also prevent users from using more SUs than they have been allotted.

Taking the Next Steps

Continuing her work from the spring, Cassie is currently refining the design of the user-facing dashboard. Her next step will be to implement the design on a development machine using HTML/CSS and JavaScript before integrating it into the production environment. Meanwhile, Sahil and Harrison are moving from the design phase, in which they have been documenting the specs for the SLURM banking system and holding weekly development meetings with BRC staff, into an implementation phase. The interns hope to finish developing the banking system and dashboard during the Fall 2017 semester, at which point the system will undergo stringent testing before being rolled out to Savio users.

“I’m very glad that [this project] shows me a way to directly apply [what] I learned in CS and given me hands-on experience with that,” Cassie says, commenting on her work with the BRC team. Cassie also notes that the Design DeCal she took last semester proved extremely helpful while working on a real project. Harrison, who has expressed interest in High Performance Computing (HPC) since the beginning of his internship, is excited that he has been able to learn about Berkeley Research Computing’s HPC architecture and use that knowledge to support Savio users when he’s not working on the SLURM banking system. “Overall,” Harrison says, “I’ve learned a lot about how REST APIs and databases actually talk and learned how SLURM actually processes jobs, which is very interesting.” Sahil adds that he values the exposure to new concepts he gets just by sitting in on meetings, whether or not they pertain directly to his project. “It’s very nice being part of [other] discussions [even though they] realistically have nothing to do with [the project],” he explains.

As a team, Cassie, Sahil, and Harrison will be presenting alongside Research IT staff on how they’ve extended SLURM’s functionality at the 2017 SLURM User Group Meeting this September, which will be held at the National Energy Research Scientific Computing Center (NERSC) in Berkeley. If you’re interested in following this project, keep an eye on this space for updates!