Savio back online after power outage
Savio is now back online after the power outage. The Slurm job queue may have been flushed during the outage: running jobs may have failed, and jobs waiting in the queue may have been lost by the scheduler. Please check the status of your jobs and resubmit any that were lost.
Brief power outage impacting Savio
There was a brief power outage in the data center around 2:25 PM on Tuesday, June 13th. Savio nodes are currently coming back online, and users will not be able to log into Savio until the login nodes are restored.
DNS issue resolved
As of early this morning, the DNS issues with hpc.brc.berkeley.edu have been fixed. You should be able to connect to it normally now. If you experience any issues, please email email@example.com.
DNS issue - hpc.brc.berkeley.edu not available
We are currently experiencing a DNS issue with hpc.brc.berkeley.edu. We hope to restore access soon. Scheduled jobs will continue to run in the meantime, and dtn.brc.berkeley.edu is still available.
Update 10:40 PM: We are in contact with the network team and anticipate that access should be restored by tomorrow (Thursday) morning.
Savio, Cortex, and Vector back online after scheduled maintenance
Scheduled maintenance is now complete on the Savio, Cortex, and Vector clusters. If you encounter any issues, please contact us at firstname.lastname@example.org.
Savio, Vector, Cortex currently undergoing planned maintenance
The Savio, Vector, and Cortex clusters are currently down for planned maintenance from 9 AM on Tuesday, May 16th until 5 PM on Wednesday, May 17th. During the maintenance period, we are upgrading and expanding storage, performing an OS/VNFS update, and updating Slurm. If you have any questions or concerns, please email email@example.com.
Savio, Vector, Cortex planned downtime May 16-17
The BRC Supercluster, including the Savio, Vector, and Cortex clusters and all associated condos, will be unavailable for scheduled maintenance for two days, on Tuesday, May 16th and Wednesday, May 17th. We plan to perform storage upgrades during this downtime. Access to the login/front-end nodes, the compute nodes of all three clusters, the scheduler queues, and data on the filesystems will be blocked from 9:00 AM on Tuesday, May 16th until 5:00 PM on Wednesday, May 17th.
If you are submitting jobs to any of the cluster partitions, please choose wall-clock limits so that your jobs finish before 9:00 AM on Tuesday, May 16th; otherwise, your jobs will remain in the queue until the downtime ends.
Email us at firstname.lastname@example.org if you have any questions or concerns about this schedule.
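As a sketch, a wall-clock limit can be requested with the `--time` option in a Slurm batch script; the partition, account, and program names below are placeholders, not actual Savio values:

```shell
#!/bin/bash
# Example Slurm job script with an explicit wall-clock limit.
# Partition and account names are placeholders; substitute your own.
#SBATCH --job-name=pre_maintenance_job
#SBATCH --partition=savio        # placeholder partition name
#SBATCH --account=my_account     # placeholder account name
#SBATCH --time=12:00:00          # wall-clock limit: 12 hours
#SBATCH --nodes=1

# Job body; replace with your actual workload.
srun ./my_program
```

Submitted with `sbatch`, a job carrying a 12-hour limit can only start if the scheduler can fit those 12 hours in before 9:00 AM on Tuesday, May 16th; jobs that cannot finish in time will simply wait in the queue until after the downtime.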