CGRL (Vector/Rosalind) User Guide
Account Requests | Logging in | Transferring Data | Storage and Backup | Hardware Configuration | Scheduler Configuration | Low Priority Jobs | Job Script Examples | Software Configuration | Getting Help
This is the user guide for Computational Genomics Resource Laboratory (CGRL) users. The CGRL provides access to two computing clusters co-located within the larger Savio system administered by Berkeley Research Computing at the University of California, Berkeley. Vector is a heterogeneous cluster that is accessed through the Savio login nodes but is independent of the rest of Savio and used exclusively by the CGRL. Rosalind is a condominium of identical nodes within Savio; through the condo model of access, CGRL users can utilize a number of Savio nodes equal to the number contributed by Rosalind.
Account Requests
You can request new accounts through the Computational Genomics Resource Laboratory (CGRL) by filling out this form and emailing it to cgrl@berkeley.edu.
Logging in
Vector and Rosalind (Savio) use One Time Passwords (OTPs) for login authentication. For details, please see Logging into BRC.
Use SSH to log into: hpc.brc.berkeley.edu
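For example, from a terminal on your own machine (the username below is a placeholder; you will be prompted for your one-time password as described in Logging into BRC):

```bash
# Connect to the shared BRC login node used by both Vector and Rosalind (Savio)
ssh your_username@hpc.brc.berkeley.edu
```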
Transferring Data
To transfer data to/from or between Vector and Rosalind (Savio), use the dedicated Data Transfer Node (DTN): dtn.brc.berkeley.edu
If you're using Globus to transfer files, the Globus endpoint is: ucb#brc
For details about how to transfer files to and from the cluster, please see Transferring Data.
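As a minimal sketch, files can be staged through the DTN with standard tools such as scp or rsync; the username, file names, and destination directories below are placeholders (the storage locations themselves are described in the next section):

```bash
# Copy a local file to your Vector scratch directory via the DTN
scp reads.fastq.gz your_username@dtn.brc.berkeley.edu:/clusterfs/vector/scratch/your_username/

# rsync can resume interrupted transfers of larger directories
rsync -avP my_results/ your_username@dtn.brc.berkeley.edu:/global/scratch/your_username/my_results/
```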
Storage and Backup
The following storage systems are available to CGRL users. For running jobs, compute nodes within a cluster can only directly access the storage as listed below. The DTN can be used to transfer data between the locations accessible to only one cluster or the other, as detailed in the previous section.
Name | Cluster | Location | Quota | Backup | Allocation | Description |
---|---|---|---|---|---|---|
Home | Both | /global/home/users/$USER | 10 GB | Yes | Per User | Home directory ($HOME) for permanent data |
Scratch | Vector | /clusterfs/vector/scratch/$USER | none | No | Per User | Short-term, large-scale storage for computing |
Group | Vector | /clusterfs/vector/instrumentData/ | 300 GB | No | Per Group | Group-shared storage for computing |
Scratch | Rosalind (Savio) | /global/scratch/$USER | none | No | Per User | Short-term, large-scale Lustre storage for very high-performance computing |
Condo User | Rosalind (Savio) | /clusterfs/rosalind/users/$USER | none | No | Per User | Long-term, large-scale user storage |
Condo Group | Rosalind (Savio) | /clusterfs/rosalind/groups/ | none | No | Per Group | Long-term, large-scale group-shared storage |
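For example, because the Vector and Rosalind (Savio) scratch file systems are both reachable from the DTN, data can be moved between them by logging into the DTN and copying directly. This is a sketch that assumes both paths are mounted there as described above, with a placeholder username and directory:

```bash
# On dtn.brc.berkeley.edu: copy a project directory from Vector scratch
# to Rosalind (Savio) scratch
rsync -avP /clusterfs/vector/scratch/your_username/project1/ \
      /global/scratch/your_username/project1/
```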
Hardware Configuration
Vector and Rosalind are heterogeneous, with a mix of several different types of nodes. Please be aware of these various hardware configurations, along with their associated scheduler configurations, when specifying options for running your jobs.
Cluster | Nodes | Node List | CPU | Cores/Node | Memory/Node | Scheduler Allocation |
---|---|---|---|---|---|---|
Vector | 11 | n00[00-03].vector0 | Intel Xeon X5650, 2.66 GHz | 12 | 96 GB | By Core |
Vector | | n0004.vector0 | AMD Opteron 6176, 2.3 GHz | 48 | 256 GB | By Core |
Vector | | n00[05-08].vector0 | Intel Xeon E5-2670, 2.60 GHz | 16 | 128 GB | By Core |
Vector | | n00[09]-n00[10].vector0 | Intel Xeon X5650, 2.66 GHz | 12 | 48 GB | By Core |
Rosalind (Savio1) | 8 | floating condo within: n0[000-095].savio1, n0[100-167].savio1 | Intel Xeon E5-2670 v2, 2.50 GHz | 20 | 64 GB | By Node |
Rosalind (Savio2 HTC) | 8 | floating condo within: n0[000-011].savio2, n0[215-222].savio2 | Intel Xeon E5-2643 v3, 3.40 GHz | 12 | 128 GB | By Core |
Scheduler Configuration
The clusters use the SLURM scheduler to manage jobs. When submitting your jobs via the `sbatch` or `srun` commands, use the following SLURM options:
NOTE: To check which accounts and QoS you are allowed to use, simply run "sacctmgr -p show associations user=$USER"
Partition | Account | Nodes | Node List | Node Feature | QoS | QoS Limit |
---|---|---|---|---|---|---|
vector | see below | 11 | n00[00-03].vector0 | vector,vector_c12,vector_m96 | vector_batch | 48 cores max per job; 96 cores max per user |
vector | | | n0004.vector0 | vector,vector_c48,vector_m256 | | |
vector | | | n00[05-08].vector0 | vector,vector_c16,vector_m128 | | |
vector | | | n00[09]-n00[10].vector0 | vector,vector_c12,vector_m48 | | |
savio | co_rosalind | 8 | n0[000-095].savio1, n0[100-167].savio1 | savio | rosalind_savio_normal | 8 nodes max per group |
savio2_htc | co_rosalind | 8 | n0[000-011].savio2, n0[215-222].savio2 | savio2_htc | rosalind_htc2_normal | 8 nodes max per group |
- The settings for a job on Vector: `--partition=vector --qos=vector_batch`. The account for Vector jobs is unique to each lab group (e.g. "vector_doelab"); find your correct account with "sacctmgr -p show associations user=$USER".
- The settings for a job on Rosalind (Savio1): `--partition=savio --account=co_rosalind --qos=rosalind_savio_normal`
- The settings for a job on Rosalind (Savio2 HTC): `--partition=savio2_htc --account=co_rosalind --qos=rosalind_htc2_normal` (a complete job script using these settings is sketched below)
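Putting these options together, here is a minimal sketch of a job script for Rosalind (Savio1); the job name, resource requests, and the command at the end are illustrative placeholders, and a Vector job would instead use `--partition=vector --qos=vector_batch` with your lab group's account:

```bash
#!/bin/bash
#SBATCH --job-name=example_job         # illustrative job name
#SBATCH --partition=savio              # Rosalind (Savio1) partition
#SBATCH --account=co_rosalind          # Rosalind condo account
#SBATCH --qos=rosalind_savio_normal    # Rosalind (Savio1) QoS
#SBATCH --nodes=1                      # Savio1 is allocated by node
#SBATCH --time=01:00:00                # walltime request (hh:mm:ss)

# Placeholder workload; replace with your own commands
echo "Running job $SLURM_JOB_ID on $(hostname)"
```

Submit the script with `sbatch`, e.g. `sbatch example_job.sh`.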
Low Priority Jobs
As a condo contributor, you are entitled to use the extra resources available on the Savio cluster (across all partitions). This is done through the low-priority QoS "savio_lowprio"; your account is automatically subscribed to this QoS when it is created, so you do not need to request it explicitly. With this QoS you are no longer limited by your condo size: you have access to the broader compute resource, limited only by the size of each partition. However, this QoS does not receive as high a priority as the general QoSs (such as "savio_normal" and "savio_debug") or the condo QoSs, and it is subject to preemption when the other QoSs become busy. This has two implications:
- When the system is busy, any job submitted with this QoS will pend and yield to jobs with higher priority.
- When the system is busy and higher-priority jobs are pending, the scheduler will preempt jobs running under this lower-priority QoS. At submission time you can choose whether a preempted job should simply be killed or automatically requeued after it is killed. Since preemption can happen at any time, it is very beneficial for your job to be able to checkpoint and restart itself if you choose to requeue it; otherwise, you may need to verify data integrity manually before running the job again (a sample low-priority submission is sketched below).
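As a sketch, the low-priority QoS can be requested in the usual `#SBATCH` header; the `--requeue` flag asks SLURM to requeue the job automatically if it is preempted, and the job name and workload below are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=lowprio_example     # illustrative job name
#SBATCH --partition=savio              # any Savio partition may be targeted
#SBATCH --account=co_rosalind
#SBATCH --qos=savio_lowprio            # low-priority QoS, subject to preemption
#SBATCH --requeue                      # requeue automatically if preempted
#SBATCH --nodes=1
#SBATCH --time=04:00:00

# Placeholder workload; a real job should checkpoint its progress so a
# requeued run can resume rather than start over.
echo "Running low-priority job $SLURM_JOB_ID on $(hostname)"
```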
Job Script Examples
For many examples of job script files that you can adapt and use for running your own jobs, please see Running Your Jobs.
Software Configuration
For details about how to find and access the software provided on the cluster, as well as how to install your own, please see Accessing and Installing Software. CGRL users have access to the CGRL module farm of bioinformatics software (/clusterfs/vector/home/groups/software/sl-7.x86_64/modfiles), as well as the other module farms on Savio.
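As a minimal sketch, assuming the standard Environment Modules `module use` mechanism is available on the cluster, the CGRL module farm can be added to your module search path and browsed like this (the package loaded at the end is a hypothetical example):

```bash
# Make the CGRL bioinformatics module farm visible to the module command
module use /clusterfs/vector/home/groups/software/sl-7.x86_64/modfiles

# Browse and load packages (the module name below is illustrative)
module avail
module load samtools
```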
Getting Help
For inquiries or service requests regarding the cluster systems, please see BRC's Getting Help page or send email to brc-hpc-help@berkeley.edu.
For questions about new accounts or installing new biology software, please contact the Computational Genomics Resource Laboratory (CGRL) by emailing cgrl@berkeley.edu.