BRC Savio will be shut down on Friday, Aug 2, after 5 pm to accommodate electrical work in the data center. Savio will be brought back online first thing on Monday morning, Aug 5.
BRC Savio Cluster shutdown planned for the weekend of Aug 3
BRC cluster downtime planned for 8/6-8/7
BRC staff have arranged with the vendor to upgrade our Lustre file storage system on August 6th and 7th. This upgrade could not be completed during our most recent scheduled downtime.
If you have questions or concerns, please contact us at firstname.lastname@example.org.
Scheduled downtime 7/24-7/25
Our next maintenance downtime for the BRC HPC Supercluster is scheduled for July 24th and 25th. It will be a two-day downtime, from 8:00 am on Tuesday until 5:00 pm on Wednesday.
We need to perform some long-pending maintenance tasks and improvements to the scratch filesystem, which will help us manage it better.
All access to cluster login nodes, the data transfer node, scheduler queues, and data on all cluster filesystems will be blocked. This downtime affects all three clusters in the supercluster infrastructure: Savio, Cortex, and Vector. After the downtime, access will be restored as before.
Scheduler reservations are in place so that no jobs will be running after 8:00 am on July 24th. If you submit jobs to any cluster queue before the downtime, please request a wallclock time that lets them finish before 8:00 am on the 24th; otherwise, your jobs will wait in the queue until after the downtime.
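As a rough illustration of the wallclock arithmetic above, the following sketch computes the longest time limit that still ends before the downtime begins. The function name `max_walltime`, the 10-minute safety margin, and the year 2018 are assumptions made for the example, and the `D-HH:MM:SS` output format assumes a scheduler that accepts time limits in that form (Slurm's `--time` option does).

```python
from datetime import datetime, timedelta


def max_walltime(now, downtime_start, margin=timedelta(minutes=10)):
    """Return the longest wallclock request, as 'D-HH:MM:SS', that ends
    before downtime_start, or None if no time is left.

    The margin is a hypothetical safety buffer, not a site requirement.
    """
    remaining = downtime_start - now - margin
    if remaining <= timedelta(0):
        return None
    total = int(remaining.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Downtime start taken from the announcement; the year is assumed.
    downtime = datetime(2018, 7, 24, 8, 0)
    print(max_walltime(datetime.now(), downtime))
```

A job submitted with a limit at or below the returned value can start before the reservation; a longer request will be held in the queue until the downtime ends.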
(Resolved) Job submission errors on Savio
[Update 9:30 AM: This issue should be resolved. Please contact us at email@example.com if you continue experiencing problems.]
Since 1:30 AM on 7/17/18, users have been reporting issues with job submission on Savio. Staff are investigating the problem and hope to restore service soon.
[Resolved] Ongoing scratch and DTN issues
As of 11:30 PM on 6/4/18, the scratch and DTN issues should be resolved. Please contact firstname.lastname@example.org if you encounter further issues.
BRC cluster users continue to report issues with scratch storage and DTN access. Support staff are working on the issue and will post an update when a fix is in place or when we have an ETA for one.
Scratch storage issue on BRC clusters
Starting Sunday afternoon (6/3/18), users have been reporting issues with scratch storage on BRC clusters. Cluster sysadmins will look into it as soon as possible.
Scratch storage issue on BRC clusters
Around 1 PM today, BRC clusters began experiencing issues with scratch storage, where any attempt to access the filesystem might cause it to freeze. BRC staff are currently working to restore service.
(Resolved) Login problems on Savio
Update: The login problems, which were caused by a storage issue, have now been resolved. Please email email@example.com if the issue recurs for you.
Since around midnight on 5/10/18, users have been reporting problems with logging into Savio, including the DTN. The systems team is currently looking into the issue.
Jupyterhub on Savio currently unavailable
Since 4/27/18, Jupyterhub on Savio has experienced a number of outages. The systems team is investigating and will restore service as soon as possible.
Emergency downtime for BRC clusters
We are currently undergoing an emergency downtime from 9-12 on 4/17 to address recent scratch storage issues.
Users should receive a notification when the system is back online. If you have any concerns in the meantime, please email firstname.lastname@example.org.
Savio scratch file creation issues
Update (3/15/18, 4:30 PM): With help from users with high file counts, we are continuing to work towards stabilizing scratch, but users may experience sporadic issues through tomorrow. Deleting unused files is still helpful, if possible.
Since 10 AM on 3/15/18, we have been experiencing issues with Savio scratch, where users may be unable to create new files. BRC staff are working on resolving the problem, and deleting unused files will help us restore access more quickly. We will continue to update users, but if you have specific concerns, please email email@example.com.
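For users who want to find candidates for cleanup, one possible approach is to list files that have not been modified for some time. The sketch below only reports files and never deletes anything. The function name `stale_files`, the 30-day threshold, and the example path are hypothetical and should be adapted to your own scratch directory.

```python
import os
import stat
import time


def stale_files(root, days=30, now=None):
    """Yield (path, size_in_bytes) for regular files under root whose
    modification time is more than `days` days old. Nothing is deleted."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path, st.st_size


if __name__ == "__main__":
    # Hypothetical location; substitute your own scratch directory.
    for path, size in stale_files("/path/to/your/scratch", days=30):
        print(f"{size:>12}  {path}")
```

Review the list before removing anything, since modification time alone does not tell you whether a file is still needed.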
Scratch filesystem returning to normal
Thanks to the quick assistance of a number of top scratch storage users, scratch should be available for use again. If you continue to experience errors, please contact us at firstname.lastname@example.org.
Read-only scratch filesystem
As of 7:50 PM on 1/6/18, scratch storage on the BRC clusters is read-only due to a space issue. The systems team is actively working on a resolution, but until then no new files can be created on scratch storage.
Scratch filesystem instability
We are currently experiencing some ongoing instability with the BRC scratch filesystem and are working with the vendor to resolve it.
If you experience errors when writing to scratch, please wait a few minutes and try again and/or restart your job. You can also contact us at email@example.com with any issues or concerns.
(Resolved) Scratch storage issues for some users
Update: As of 10 PM the scratch issues appear to be resolved. Please email firstname.lastname@example.org if you experience further issues.
Around 5:45 PM on 2/3/18, some users began experiencing issues with scratch storage on the BRC clusters, with errors like "no space left on device" or "Bad address". BRC staff are currently looking into the issue and will post an update when it is resolved.
[Resolved] Globus unavailable after upgrade
Update: Globus should now be available. Please deactivate any credentials for your savio brc endpoint, then try again.
BRC staff are working with Globus engineers to address issues with Globus following the SL7 update. We'll post an announcement once it's available again. Thank you for your patience; if you encounter other issues with the system, please let us know at email@example.com.
Scheduled cluster downtime 1/23
The BRC clusters (Savio, Vector, and Cortex) will be down on Tuesday, January 23rd for the Scientific Linux 7 OS upgrade. Please email firstname.lastname@example.org if you have questions or concerns.
(Resolved) Jupyterhub currently down
Update: the Jupyterhub node is back online as of 2 PM on 12/30/17.
As of 1:20 PM on 12/28/17, the Jupyterhub node is down. There may be a delay in getting it back online due to winter curtailment, but updates will be posted as they become available.
(Resolved) BRC clusters down for scheduled maintenance
As of 12:00 PM, all BRC clusters should be back online. Please email us at email@example.com if you encounter any issues.
The BRC clusters are undergoing a brief scheduled downtime to allow us to make some network configuration changes. We expect everything to be back online by noon today.
(Resolved) Jupyterhub on Savio currently unavailable
Update: Jupyterhub access has now been restored, as of 10:15 AM on 12/12/17.
At 7:30 PM on 12/11/17, Jupyterhub went down and has not come back up after a restart. Systems staff are currently working on resolving the issue.