Status and Service Updates
Savio HPC Services Resumed: Mon, 8/12
We are excited to report that Savio HPC services have resumed. As planned, the Berkeley IT team repaired the automated transfer switch in the data center over the weekend. The data center's power has been restored, and the Savio supercluster is back online. Jobs have started running, and HPC services, including Open OnDemand and Globus, are also back in service.
Savio Outage (Data Center Repair): Starting 5 PM, Fri, 8/9
The Berkeley IT team is planning to repair the automated transfer switch in the Earl Warren Hall Data Center. The repair is needed so that power fails over to the generators automatically during future power outages. Because the work requires a full power shutdown of the data center, we have scheduled a Savio downtime to accommodate it. The downtime will start at 5:00 PM on Friday, Aug 9th, and we anticipate returning HPC services by 5:00 PM on Monday, Aug 12th. A scheduler reservation is already in place to ensure that no jobs run after 5:00 PM on Friday, Aug 9th. If you plan to submit jobs before then, please request a walltime short enough for them to complete before the downtime begins. Otherwise, your jobs will wait in the queue until the cluster is back online.
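To pick a walltime that fits before a reservation, you can work backward from the reservation's start time. The following is a minimal, hypothetical sketch (not a Savio-provided tool): `max_walltime_before` and its 15-minute safety `margin` are illustrative assumptions. It assumes the job starts immediately (queue wait is ignored) and that both times are naive local (Pacific) datetimes. It returns a Slurm `--time` value in `D-HH:MM:SS` form, truncated to whole minutes because Slurm time limits have one-minute resolution.

```python
from datetime import datetime, timedelta
from typing import Optional


def max_walltime_before(downtime_start: datetime, now: datetime,
                        margin: timedelta = timedelta(minutes=15)) -> Optional[str]:
    """Return the longest Slurm --time value (D-HH:MM:SS) that lets a job
    starting at `now` end at least `margin` before `downtime_start`.

    Hypothetical helper. Assumes the job starts immediately and that both
    datetimes are naive values in the same local time zone.
    Returns None when no job can finish in time.
    """
    remaining = downtime_start - now - margin
    # Slurm time limits have one-minute resolution, so truncate to whole minutes.
    total_minutes = int(remaining.total_seconds() // 60)
    # Never return a zero limit: Slurm treats a time limit of 0 as "no limit".
    if total_minutes <= 0:
        return None
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:00"


# Example: reservation starts 5:00 PM Friday, Aug 9 (year assumed to be 2024).
start = datetime(2024, 8, 9, 17, 0)
print(max_walltime_before(start, datetime(2024, 8, 8, 9, 30)))  # 1-07:15:00
```

The resulting value could then be passed to Slurm as, for example, `sbatch --time=1-07:15:00 job.sh`. Jobs whose requested walltime would overlap the reservation are held in the queue rather than started.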
Savio HPC Open OnDemand Service back online: Mon, 7/15
The Open OnDemand HPC service at https://ood.brc.berkeley.edu/ is back online. We appreciate your patience while we worked through some issues. The service includes some changes: we have upgraded Open OnDemand to the latest version, 3.1.7, and adopted CILogon for user authentication to eliminate the repeated login problems you may have experienced. On the login page, please select the appropriate institution; for most users, this is the University of California, Berkeley. The command-line tool email_lookup.sh can help you determine which institution to select.
Savio HPC Services are back online except for OOD: Wed, 7/10
The Savio HPC system, now running the new Rocky Linux 8 operating system, has been returned to service and is available to users, with the exception of the Open OnDemand (OOD) service. We need more time to configure OOD, so please do not attempt to use it at this time. Please note that the Savio documentation has not yet been fully updated to reflect the changes introduced by Rocky Linux 8 (e.g., changes to the software stack and software module farms, and to how user code is compiled). Until those updates are complete, we suggest that Savio users consult the LBNL Science IT documentation for the Lawrencium HPC system, which is similar, though not identical, to Savio, as a temporary guide to these changes: https://scienceit-docs.lbl.gov/hpc/rocky8-migration/ , https://scienceit-docs.lbl.gov/hpc/software/software-module-farm/ , and https://scienceit-docs.lbl.gov/hpc/software/module-management/
Savio Downtime: Fri, 7/5 - Wed, 7/10
As you know, we have been working on upgrading the Savio operating system to Rocky Linux 8. To complete the OS upgrade, we coordinated with the campus data center group to combine our work with the long-awaited power work needed at the data center. The joint downtime will start at 5 PM on Friday, 7/5, and we anticipate returning services by the end of Wednesday, 7/10. A scheduler reservation is in place to ensure that no jobs run after 5 PM on Friday, 7/5. If you plan to submit jobs, please request an appropriate walltime so that they complete before the downtime. Otherwise, your jobs will wait in the queue until Savio is back online.
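To check whether a walltime you are about to request would end before a downtime, you can parse the Slurm `--time` value and add it to the current time. The following is a minimal, hypothetical sketch: the helper names `parse_slurm_time` and `finishes_before` are illustrative, not part of Slurm or Savio. It handles only the numeric formats listed in the `sbatch` documentation (`M`, `M:S`, `H:M:S`, `D-H`, `D-H:M`, `D-H:M:S`), not keywords such as `UNLIMITED`. It also assumes the job starts immediately.

```python
from datetime import datetime, timedelta


def parse_slurm_time(spec: str) -> timedelta:
    """Parse a numeric Slurm --time value into a timedelta (hypothetical helper)."""
    if "-" in spec:
        # With a day prefix, the remaining fields are hours[:minutes[:seconds]].
        day_part, clock = spec.split("-", 1)
        days = int(day_part)
        fields = [int(x) for x in clock.split(":")]
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        days = 0
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:        # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:      # minutes:seconds
            hours = 0
            minutes, seconds = fields
        else:                       # hours:minutes:seconds
            hours, minutes, seconds = fields
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def finishes_before(spec: str, downtime_start: datetime, now: datetime) -> bool:
    """True if a job starting at `now` and using its full walltime ends by
    `downtime_start`."""
    limit = parse_slurm_time(spec)
    # Slurm treats a time limit of 0 as "no limit", so it can never fit.
    if limit == timedelta(0):
        return False
    return now + limit <= downtime_start


# Example: downtime starts 5 PM Friday, 7/5 (year assumed to be 2024).
start = datetime(2024, 7, 5, 17, 0)
now = datetime(2024, 7, 5, 14, 0)
print(finishes_before("2:00:00", start, now))     # True
print(finishes_before("3-00:00:00", start, now))  # False
```

Because real jobs may wait in the queue before starting, a `True` result here is necessary but not sufficient for the job to run before the reservation.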
Savio Scratch File System is Back Up and Running: Mon, 7/1
The Savio /global/scratch parallel file system is back up and running and usable again.
Savio Scratch File System is Down: Mon, 6/24
The /global/scratch parallel file system started experiencing access problems on Friday, 6/21, and an investigation is underway. We apologize for any inconvenience this may cause and will keep you posted on the status of the investigation.
News Articles
April 2024 Newsletter
Welcome to the special Spring 2024 issue of the Research IT newsletter. We hope you learn about some of the many ways we support research across the UC Berkeley campus and beyond. In this issue, we are highlighting our consulting program as it is one of the primary ways we...
Cybersecurity for Researchers
Tuesday, October 22, 2024
1pm via Zoom
Your research is important and we are here to help keep it safe and secure. This brown bag session will focus on secure campus tools and services that Research IT and Berkeley IT offer to researchers, tips...
Empowering Researchers, Fueling Innovation: UC Berkeley’s Research IT Consulting Team Bridges the Gap Between Cutting-Edge Technology and Researchers
The UC Berkeley Research IT Consulting team bridges the gap between technology and people, serving as the first point of contact for UC Berkeley researchers seeking to leverage computing and data resources. This diverse group of experts includes Research IT and Lawrence Berkeley...