Research IT

Research IT provides research data and computing technologies, consulting, and community for the UC Berkeley campus. Our goal is to advance research through IT innovation.

Status and Service Updates

Tues, 2/10: Savio returns to general availability

We are excited to announce that the /global/scratch file system is once again available to all users. The temporary Slurm reservation used during the phased onboarding has been removed, so no action is needed on your part: you no longer need to include a reservation in your job scripts and can submit jobs normally. We will resume processing new user account requests in batches, slowly and steadily, to ensure the system remains stable as usage ramps up.
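For reference, a job script from the phased onboarding period may have carried an extra #SBATCH --reservation line; that line can simply be deleted. A minimal sketch of a normal submission is shown below, with the account, partition, and paths as placeholders that you should replace with your own values.

    #!/bin/bash
    #SBATCH --job-name=my_analysis       # descriptive job name
    #SBATCH --account=fc_example         # placeholder: your Savio allocation account
    #SBATCH --partition=savio3           # placeholder: the partition you normally use
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00
    # Note: no --reservation directive is needed any longer.

    cd /global/scratch/example_user/my_project   # placeholder path on scratch
    ./run_analysis.sh                            # placeholder for your actual workload

Submit with sbatch as usual (e.g., sbatch my_job.sh).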

Mon, 2/9: Savio recovery update

We are pleased to report that the /global/scratch file system is no longer in a critical state and that system health metrics are trending strongly in the right direction. We have successfully onboarded the majority of users who requested early access, and the system has remained stable throughout. However, elements of the storage system are still above 95% utilization and remain fragile, leaving little operational headroom. To request immediate access to the system, please see our recent emails. Please note that at this time we are unable to create new user accounts.

Fri, 2/6: Savio Scratch Filesystem Update

We are pleased to report continued progress in the file system migration and the ongoing phased user onboarding. Both inode utilization and OST occupancy are trending in the right direction, and we are encouraged by the stability we are observing during this process. We continue to invite users with minimal I/O requirements and urgent access needs back to the cluster in a controlled, phased manner, and we will expand access incrementally as the file system metrics remain stable and approach our operational targets (inode utilization below 85% and all OSTs below 90% occupancy).

Thurs, 2/5: Savio Scratch Filesystem Update

Migration of files between the Lustre back-end storage targets (OSTs) is currently in progress. Our teams are actively moving data to better balance utilization across targets and thereby improve overall stability and performance. The migration uses standard, vendor-recommended procedures designed to preserve data integrity, and we are continuously monitoring and validating the results. Current status: inode utilization has been reduced to 90%, and OST occupancy has also come down, although six OSTs remain above 90% utilization. Before we can safely return the cluster to full availability, our operational targets are inode utilization below 85% and all OSTs below 90%.
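For users who would like to follow these metrics themselves, the per-target occupancy and inode figures reported above correspond to what Lustre's standard lfs utility shows; a minimal sketch, assuming scratch is mounted at /global/scratch, is:

    # Per-target (MDT/OST) capacity occupancy on the scratch file system
    lfs df -h /global/scratch

    # Per-target inode utilization
    lfs df -i /global/scratch

These commands are read-only and safe to run from a login node once scratch is mounted and accessible.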

Tues, 2/3: Savio Access Request During Recovery Period

Our inode reduction efforts have made good progress, and we are continuing to balance back-end storage usage on the Lustre scratch file system. The cluster is now transitioning into recovery mode. In the first stage of this recovery period, we plan to grant limited access based on research needs and urgency. Approved access will allow users to run jobs and read/write files on scratch. If you need access during this period, please fill out the Google Form we sent via email.

Mon, 2/2: Update on Scratch File System

We have made measurable progress in reducing inode usage, and the file system is no longer at risk of entering a critical state. At this time, the system remains in recovery mode, and mitigation work is still in progress to restore sufficient storage capacity. We expect to have additional details and updated metrics to share with you by noon tomorrow. We sincerely appreciate your patience and understanding as we continue this work. If you have any questions or require assistance, please contact us.

Mon, 1/26: Savio scratch filesystem update

Our Lustre file system vendor (DDN) is still working with us to free up and balance the storage targets on the file system. Thank you for your patience.

Fri, 1/23: Scratch Unavailable on Savio

The scratch file system is completely unavailable on the cluster (login/data transfer/compute nodes) at the moment. We are investigating the issue and have contacted our vendor for troubleshooting. We will update you on the status when we have more information. We apologize for the inconvenience. Please contact us if you have any further questions or concerns.

News Articles