This is the User Guide for Vector and Rosalind (Savio), computing clusters administered by the Computational Genomics Resource Laboratory (CGRL) and Berkeley Research Computing at the University of California, Berkeley.
Logging in

Vector and Rosalind (Savio) use One Time Passwords (OTPs) for login authentication. For details, please see Logging into BRC.
Use SSH to log into:
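As a sketch, a login session looks like the following. The username and hostname here are placeholders; confirm the actual login node hostname with CGRL or the BRC documentation.

```shell
# Log in with SSH; you will be prompted for your one-time password (OTP).
# "myuser" and the hostname are illustrative -- substitute your own.
ssh myuser@hpc.brc.berkeley.edu
```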
Transferring Data

To transfer data to/from or between Vector and Rosalind (Savio), use the dedicated Data Transfer Node (DTN):
If you're using Globus to transfer files, the Globus endpoint is:
For details about how to transfer files to and from the cluster, please see Transferring Data.
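As a minimal sketch, copying a file to the cluster through the DTN with rsync might look like this. The DTN hostname and destination path are assumptions for illustration; use the actual values from Transferring Data.

```shell
# Copy a local file to cluster storage via the DTN.
# Hostname and destination path are illustrative -- substitute your own.
rsync -avP reads.fastq.gz myuser@dtn.brc.berkeley.edu:/clusterfs/vector/scratch/myuser/
```

The `-avP` flags preserve file attributes, show progress, and allow interrupted transfers to resume, which is useful for large genomics datasets.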
Storage and Backup
The following storage systems are available to CGRL users. For running jobs, compute nodes can directly access only the storage listed for their own cluster below. The DTN can be used to transfer data between locations that are accessible from only one cluster or the other, as described in the previous section.
| Cluster | Storage | Path | Quota | Backed Up | Allocation | Purpose |
|---|---|---|---|---|---|---|
| Vector | | | 10 GB | Yes | Per User | Home directory ($HOME) for permanent data |
| Vector | | | none | No | Per User | Short-term, large-scale storage for computing |
| Vector | Group | /clusterfs/vector/instrumentData/ | 300 GB | No | Per Group | Group-shared storage for computing |
| Rosalind (Savio) | | | none | No | Per User | Short-term, large-scale Lustre storage for very high-performance computing |
| Rosalind (Savio) | | | none | No | Per User | Long-term, large-scale user storage |
| Rosalind (Savio) | Condo Group | /clusterfs/rosalind/groups/ | none | No | Per Group | Long-term, large-scale group-shared storage |
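Since the home directory has a 10 GB quota, it is worth checking your usage before staging data. A minimal sketch (the Lustre mount point in the comment is an assumption; use your cluster's actual scratch path):

```shell
# Show how much space your home directory is using
# against the 10 GB quota.
du -sh "$HOME"

# On the Lustre scratch filesystem you can also query usage directly,
# e.g. (mount point is illustrative):
# lfs quota -u "$USER" /global/scratch
```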
Hardware Configuration

Vector and Rosalind are heterogeneous, with a mix of several different types of nodes. Please be aware of these various hardware configurations, along with their associated scheduler configurations, when specifying options for running your jobs.
| Cluster | Nodes | Node List | CPU | Cores/Node | Memory/Node | Scheduler Allocation |
|---|---|---|---|---|---|---|
| Vector | 11 | n00[00-03].vector0 | Intel Xeon X5650, 2.66 GHz | 12 | 96 GB | By Core |
| | | n0004.vector0 | AMD Opteron 6176, 2.3 GHz | 48 | 256 GB | By Core |
| | | n00[05-08].vector0 | Intel Xeon E5-2670, 2.60 GHz | 16 | 128 GB | By Core |
| | | n00[09-10].vector0 | Intel Xeon X5650, 2.66 GHz | 12 | 48 GB | By Core |
| Rosalind (Savio1) | | floating condo within the savio partition | Intel Xeon E5-2670 v2, 2.50 GHz | 20 | 64 GB | By Node |
| Rosalind (Savio2 HTC) | 8 | floating condo within the savio2_htc partition | Intel Xeon E5-2643 v3, 3.40 GHz | 12 | 128 GB | By Core |
Scheduler Configuration

The clusters use the SLURM scheduler to manage jobs. When submitting your jobs via sbatch or srun commands, use the following SLURM options:
NOTE: To check which QoS you are allowed to use, simply run `sacctmgr -p show associations user=$USER`.
| Partition | Account | Nodes | Node List | Node Feature | QoS | QoS Limit |
|---|---|---|---|---|---|---|
| vector | | | | | | 48 cores max per job; 96 cores max per user |
| savio | | | | | rosalind_savio_normal | 8 nodes max per group |
| savio2_htc | | | | | rosalind_htc2_normal | 8 nodes max per group |
- The settings for a job in Vector (Note: you don't need to set the "account"):
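A minimal job-script header for Vector might look like the following sketch. The partition name and resource values are illustrative assumptions; verify what you are allowed to use with `sacctmgr` as noted above. Note that no `--account` option is needed.

```shell
#!/bin/bash
# Illustrative Vector job header -- partition name and resource
# values are assumptions; Vector allocates by core, so request
# tasks/cores rather than whole nodes. No --account is needed.
#SBATCH --job-name=myjob
#SBATCH --partition=vector
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

echo "Running on $(hostname)"
```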
- The settings for a job in Rosalind (Savio1):
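A sketch of a Rosalind (Savio1) job header, using the savio partition and rosalind_savio_normal QoS from the scheduler table. The account name here is an assumption; check yours with `sacctmgr -p show associations user=$USER`.

```shell
#!/bin/bash
# Illustrative Rosalind (Savio1) job header -- the account name is
# an assumption. Savio1 allocates by node, so request whole nodes.
#SBATCH --job-name=myjob
#SBATCH --account=co_rosalind
#SBATCH --partition=savio
#SBATCH --qos=rosalind_savio_normal
#SBATCH --nodes=1
#SBATCH --time=01:00:00

echo "Running on $(hostname)"
```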
- The settings for a job in Rosalind (Savio2 HTC):
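A sketch of a Rosalind (Savio2 HTC) job header, using the savio2_htc partition and rosalind_htc2_normal QoS from the scheduler table. Again, the account name is an assumption to be verified with `sacctmgr`.

```shell
#!/bin/bash
# Illustrative Rosalind (Savio2 HTC) job header -- the account name
# is an assumption. Savio2 HTC allocates by core, so request
# tasks/cores rather than whole nodes.
#SBATCH --job-name=myjob
#SBATCH --account=co_rosalind
#SBATCH --partition=savio2_htc
#SBATCH --qos=rosalind_htc2_normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=01:00:00

echo "Running on $(hostname)"
```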
Low Priority Jobs
As a condo contributor, you are entitled to use the extra resources available on the Savio cluster (across all partitions). This is done through a low-priority QoS, "savio_lowprio", to which your account is automatically subscribed when it is created; you do not need to request it explicitly. With this QoS you are no longer limited by your condo size, meaning you have access to the broader compute resource, limited only by the size of each partition. However, this QoS has lower priority than the general QoSs, such as "savio_normal" and "savio_debug", and all of the condo QoSs, and it is subject to preemption when the other QoSs become busy. This has two implications:
- When the system is busy, any job submitted with this QoS will pend and yield to jobs with higher priority.
- When the system is busy and higher-priority jobs are pending, the scheduler will preempt jobs running under this lower-priority QoS. At submission time, you can choose whether a preempted job should simply be killed or be automatically requeued after it is killed. Please note that, since preemption can happen at any time, it is very beneficial for your job to be able to checkpoint and restart itself if you choose to requeue it. Otherwise, you may need to verify data integrity manually before running the job again.
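For example, a submission using the low-priority QoS with automatic requeue on preemption might look like this. The account name is an assumption; `savio_lowprio` and `--requeue` are standard SLURM usage as described above.

```shell
# Submit with the low-priority QoS; --requeue asks SLURM to requeue
# the job if it is preempted rather than simply killing it.
# The account name is illustrative -- substitute your own.
sbatch --qos=savio_lowprio --requeue --account=co_rosalind --partition=savio myjob.sh
```

Because a requeued job restarts from the beginning unless it checkpoints itself, pair `--requeue` with application-level checkpoint/restart where possible.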
Job Script Examples
For many examples of job script files that you can adapt and use for running your own jobs, please see Running Your Jobs.
Software Configuration

For details about how to find and access the software provided on the cluster, as well as how to install your own, please see Accessing and Installing Software.
Getting Help

For questions about computational biology, new accounts, or installing new biology software, please contact the Computational Genomics Resource Laboratory (CGRL) by emailing firstname.lastname@example.org and/or email@example.com.