Savio User Guide

Login | Data Transfer | Storage | Hardware | Scheduler | Software

Login Procedure

The SAVIO cluster uses One Time Passwords (OTP) for login authentication. You will need to download and configure the Google Authenticator application on a tablet or smartphone (Android or iOS) to generate these one-time passwords. Please see Logging into Savio for instructions on installing and configuring Google Authenticator.

  • Login Protocol: ssh
  • Login Server: hpc.brc.berkeley.edu
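
For example, a login from a terminal session looks like the following, where myusername is a placeholder for your own SAVIO username (you will be prompted for your one-time password):

  ssh myusername@hpc.brc.berkeley.edu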

Data Transfer

To transfer data to or from the SAVIO cluster, connect to the cluster's Data Transfer Node. For instructions, please see Transferring Data.

  • Data Transfer Node: dtn.brc.berkeley.edu
  • Globus Endpoint Name: ucb#brc
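
For example, files can be copied to the cluster with scp or rsync via the Data Transfer Node; myusername and the destination path below are placeholders to adapt to your own account:

  scp myfile.dat myusername@dtn.brc.berkeley.edu:/global/scratch/myusername/
  rsync -avh mydir/ myusername@dtn.brc.berkeley.edu:/global/scratch/myusername/mydir/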

Storage and Backup

SAVIO cluster users have access to the following storage systems. Please be sure to use each filesystem for its intended purpose; in particular, do not use your HOME directory for heavy I/O activity during job runs.

Name     Location              Quota      Backup  Allocation  Description
HOME     /global/home/users/   10 GB      Yes     Per User    HOME directory for permanent data
GROUP    /global/home/groups/  30/200 GB  No      Per Group   GROUP directory for shared data (30 GB for FCA, 200 GB for Condo)
SCRATCH  /global/scratch/      none       No      Per User    SCRATCH directory with Lustre FS
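
For example, job working directories and large temporary files belong on SCRATCH rather than HOME. The sketch below assumes your per-user scratch directory is /global/scratch/$USER; adjust the path if yours differs:

  cd /global/scratch/$USER       # assumed per-user scratch location; check your actual path
  mkdir -p myrun && cd myrun     # hypothetical working directory for a job's heavy I/O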

Hardware Configuration

The following table lists the hardware configuration of each generation of nodes. A high-performance Lustre file system is also available to all users as scratch space.

Partition      Nodes  Node List                               CPU Model              Cores/Node  Memory/Node  Infiniband  Speciality     Scheduler Allocation
savio          164    n0[000-095].savio1, n0[100-167].savio1  Intel Xeon E5-2670 v2  20          64 GB        FDR         -              By Node
savio_bigmem   4      n0[096-099].savio1                      Intel Xeon E5-2670 v2  20          512 GB       FDR         BIGMEM         By Node
savio2         136    n0[027-162].savio2                      Intel Xeon E5-2670 v3  24          64 GB        FDR         -              By Node
savio2         4      n0[183-186].savio2                      Intel Xeon E5-2680 v3  24          64 GB        FDR         -              By Node
savio2         16     n0[187-202].savio2                      Intel Xeon E5-2680 v4  28          64 GB        FDR         -              By Node
savio2_bigmem  20     n0[163-182].savio2                      Intel Xeon E5-2670 v3  24          128 GB       FDR         -              By Node
savio2_gpu     17     n0[012-026].savio2, n0[223-224].savio2  Intel Xeon E5-2623 v3  8           64 GB        FDR         4x Nvidia K80  By Core
savio2_htc     20     n0[000-011].savio2, n0[215-222].savio2  Intel Xeon E5-2643 v3  12          128 GB       FDR         HTC            By Core
savio2_knl     28     n0[254-281].savio2                      Intel Xeon Phi 7210    64          188 GB       FDR         Intel Phi      By Node
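
For instance, the savio2_gpu and savio2_htc partitions are allocated by core rather than by node. The sketch below shows an interactive session on a GPU node; the account and QoS names are placeholders, and the exact GPU request syntax should be confirmed in Running Your Jobs:

  srun --pty --partition=savio2_gpu --gres=gpu:1 \
       --account=fc_myproject --qos=savio_normal \
       --time=00:30:00 bash -i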

Scheduler Configuration

The SAVIO cluster uses SLURM as its job scheduler. Please see Running Your Jobs for instructions on using the scheduler, and take note of the crucial additional details below.

Three different models are supported when running jobs on the Savio cluster through the scheduler:

  • Condo model - Faculty and principal investigators can join the Condo program by purchasing compute nodes that are contributed to the cluster. Users from the condo group are granted ongoing, no-cost use of compute resources for running their jobs, up to the amount contributed. Condo users can also access larger amounts of compute resources at no cost via Savio's preemptable, low-priority quality of service option.
  • Faculty Computing Allowance (FCA) model - This program provides qualified faculty and principal investigators with up to 300K Service Units on the SAVIO cluster at no cost, during each annual application period. (Allowances are prorated based on month of application during the allowance year.) Please see Faculty Computing Allowance for more details.
  • Memorandum of Understanding (MOU) model - If you use your entire Faculty Computing Allowance for a year and need additional compute time on Savio, please contact us at brc-hpc-help@berkeley.edu. We can help you set up an MOU, through which you can pay for additional blocks of compute time at a rate that is highly competitive with those of other compute providers.

Depending on the specific group(s) to which a user belongs, they may have access to one or more of the models described above. It is therefore highly recommended that all users become familiar with the following scheduler configuration on the SAVIO cluster so that they can use its resources efficiently (an example job script follows this list):

  • A partition name, such as “savio” or “savio_bigmem” (“--partition=savio” or “--partition=savio_bigmem”), is needed in all cases.
  • The designated account that was assigned to the user is needed in all cases (e.g., --account=fc_{PUT_YOUR_ACCOUNT_HERE}).
  • The appropriate Quality of Service (QoS) selection is also required; the specific QoS that applies depends on the user's project(s) and the model under which the job should run:
    • To run jobs with the condo model, the proper condo QoS (e.g., --qos={PUT_YOUR_CONDO_NAME_HERE}_normal) should be used.
    • To run jobs with the FCA model, the proper FCA QoS (e.g., --qos=savio_normal) should be used.
    • To run jobs with the MOU model, the proper MOU QoS (e.g., --qos=savio_normal or --qos=savio_debug) should be used.
  • A standard fair-share policy with a decay half-life of 14 days (2 weeks) is enforced.
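
Putting these together, a minimal batch job script might look like the sketch below; the account name, module names, and executable are placeholders to adapt to your own project (see Running Your Jobs for complete examples):

  #!/bin/bash
  #SBATCH --job-name=my_job         # hypothetical job name
  #SBATCH --partition=savio         # partition to run in
  #SBATCH --account=fc_myproject    # placeholder: the account assigned to you
  #SBATCH --qos=savio_normal        # QoS matching your account and model
  #SBATCH --nodes=1
  #SBATCH --time=00:30:00           # requested wallclock time

  module load gcc openmpi           # placeholder modules; load whatever your job needs
  mpirun ./my_program               # placeholder executable
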
Configuration Details

Partition: savio
  Nodes: 164
  Node Features: savio
  Node Usage: Exclusive
  SU/Core Hour Ratio: 0.75
  Accounts: ac_*, fc_*, pc_*, co_*
  QoS and QoS Limits:
    savio_debug     4 nodes max per job; 4 nodes in total; 00:30:00 wallclock limit
    savio_normal    24 nodes max per job; 72:00:00 wallclock limit
    Condo QoS       see Savio Condo QoS Configurations below
    savio_lowprio   24 nodes max per job; 72:00:00 wallclock limit

Partition: savio_bigmem
  Nodes: 4
  Node Features: savio_bigmem or savio_m512
  Node Usage: Exclusive
  SU/Core Hour Ratio: 1.67
  Accounts: ac_*, fc_*, pc_*, co_*
  QoS and QoS Limits:
    savio_debug     4 nodes max per job; 4 nodes in total; 00:30:00 wallclock limit
    savio_normal    24 nodes max per job; 72:00:00 wallclock limit
    Condo QoS       see Savio Bigmem Condo QoS Configurations below
    savio_lowprio   24 nodes max per job; 72:00:00 wallclock limit

Partition: savio2
  Nodes: 156
  Node Features: savio2 or savio2_c24 or savio2_c28
  Node Usage: Exclusive
  SU/Core Hour Ratio: 1.00
  Accounts: ac_*, fc_*, pc_*, co_*
  QoS and QoS Limits:
    savio_debug     4 nodes max per job; 4 nodes in total; 00:30:00 wallclock limit
    savio_normal    24 nodes max per job; 72:00:00 wallclock limit
    Condo QoS       see Savio2 Condo QoS Configurations below
    savio_lowprio   24 nodes max per job; 72:00:00 wallclock limit

Partition: savio2_bigmem
  Nodes: 20
  Node Features: savio2_bigmem or savio2_m128
  Node Usage: Exclusive
  SU/Core Hour Ratio: 1.20
  Accounts: ac_*, fc_*, pc_*, co_*
  QoS and QoS Limits:
    savio_debug     4 nodes max per job; 4 nodes in total; 00:30:00 wallclock limit
    savio_normal    24 nodes max per job; 72:00:00 wallclock limit
    Condo QoS       see Savio2 Bigmem Condo QoS Configurations below
    savio_lowprio   24 nodes max per job; 72:00:00 wallclock limit

Partition: savio2_gpu
  Nodes: 17
  Node Features: savio2_gpu
  Node Usage: Shared
  SU/Core Hour Ratio: 2.67
  Accounts: ac_*, fc_*, pc_*, co_*
  QoS and QoS Limits:
    savio_debug     4 nodes max per job; 4 nodes in total; 00:30:00 wallclock limit
    savio_normal    24 nodes max per job; 72:00:00 wallclock limit
    Condo QoS       see Savio2 GPU Condo QoS Configurations below
    savio_lowprio   24 nodes max per job; 72:00:00 wallclock limit

Partition: savio2_htc
  Nodes: 20
  Node Features: savio2_htc
  Node Usage: Shared
  SU/Core Hour Ratio: 1.20
  Accounts: ac_*, fc_*, pc_*, co_*
  QoS and QoS Limits:
    savio_debug     4 nodes max per job; 4 nodes in total; 00:30:00 wallclock limit
    savio_normal    24 nodes max per job; 72:00:00 wallclock limit
    Condo QoS       see Savio2 HTC Condo QoS Configurations below
    savio_lowprio   24 nodes max per job; 72:00:00 wallclock limit

Partition: savio2_knl
  Nodes: 28
  Node Features: savio2_knl
  Node Usage: Exclusive
  SU/Core Hour Ratio: 1.10
  Accounts: ac_*, fc_*, pc_*, co_*
  QoS and QoS Limits:
    savio_debug     4 nodes max per job; 4 nodes in total; 00:30:00 wallclock limit
    savio_normal    24 nodes max per job; 72:00:00 wallclock limit
    Condo QoS       see Savio2 KNL Condo QoS Configurations below
    savio_lowprio   24 nodes max per job; 72:00:00 wallclock limit
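
As shown above, the savio2_gpu and savio2_htc partitions are shared and scheduled per core, so jobs there should request only the cores they need. A sketch for savio2_htc follows; the account, QoS, and executable are placeholders:

  #!/bin/bash
  #SBATCH --partition=savio2_htc
  #SBATCH --account=fc_myproject    # placeholder account
  #SBATCH --qos=savio_normal
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=2         # request only the cores you need on this shared partition
  #SBATCH --time=01:00:00

  ./my_threaded_program             # placeholder executable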

NOTE: To check which accounts and QoS values you are allowed to use, run "sacctmgr -p show associations user=$USER".

Savio Condo QoS Configurations

Account      QoS                    QoS Limit
co_acrb      acrb_savio_normal      8 nodes max per group
co_aiolos    aiolos_savio_normal    12 nodes max per group; 24:00:00 wallclock limit
co_astro     astro_savio_debug      4 nodes max per group; 4 nodes max per job; 00:30:00 wallclock limit
co_astro     astro_savio_normal     32 nodes max per group; 16 nodes max per job
co_dlab      dlab_savio_normal      4 nodes max per group
co_nuclear   nuclear_savio_normal   24 nodes max per group
co_praxis    praxis_savio_normal    4 nodes max per group
co_rosalind  rosalind_savio_normal  8 nodes max per group; 4 nodes max per job per user

Savio Bigmem Condo QoS Configurations

Savio2 HTC Condo QoS Configurations

Account      QoS                   QoS Limit
co_rosalind  rosalind_htc2_normal  8 nodes max per group

Savio2 GPU Condo QoS Configurations

Account  QoS               QoS Limit
co_acrb  acrb_gpu2_normal  44 GPUs max per group
co_stat  stat_gpu2_normal  8 GPUs max per group

Savio2 Condo QoS Configurations

Account      QoS                     QoS Limit
co_biostat   biostat_savio2_normal   20 nodes max per group
co_chemqmc   chemqmc_savio2_normal   16 nodes max per group
co_dweisz    dweisz_savio2_normal    8 nodes max per group
co_econ      econ_savio2_normal      2 nodes max per group
co_hiawatha  hiawatha_savio2_normal  40 nodes max per group
co_lihep     lihep_savio2_normal     4 nodes max per group
co_mrirlab   mrirlab_savio2_normal   4 nodes max per group
co_planets   planets_savio2_normal   4 nodes max per group
co_stat      stat_savio2_normal      2 nodes max per group
co_bachtrog  bachtrog_savio2_normal  4 nodes max per group
co_noneq     noneq_savio2_normal     8 nodes max per group
co_kranthi   kranthi_savio2_normal   4 nodes max per group

Savio2 Bigmem Condo QoS Configurations

Account      QoS                      QoS Limit
co_laika     laika_bigmem2_normal     4 nodes max per group
co_dweisz    dweisz_bigmem2_normal    4 nodes max per group
co_aiolos    aiolos_bigmem2_normal    4 nodes max per group
co_bachtrog  bachtrog_bigmem2_normal  4 nodes max per group
co_msedcc    msedcc_bigmem2_normal    4 nodes max per group

Savio2 KNL Condo QoS Configurations

Account  QoS               QoS Limit
co_lsdi  lsdi_knl2_normal  28 nodes max per group

Low Priority Jobs

All condo contributors (accounts whose names start with "co_") are entitled to use the extra resources available on the SAVIO cluster (across all partitions). This is done through the low-priority QoS "savio_lowprio"; your account is automatically subscribed to this QoS when it is created, so you do not need to request it explicitly. When using this QoS you are no longer limited by your condo size: you have access to the broader compute resource, limited only by the size of the partitions. However, this QoS does not receive as high a priority as the general QoSs ("savio_normal" and "savio_debug") or the condo QoSs, and it is subject to preemption when the other QoSs become busy. This has two implications:

  1. When the system is busy, any job submitted with this QoS will remain pending and yield to jobs with higher priority.
  2. When the system is busy and higher-priority jobs are pending, the scheduler will preempt jobs running under this lower-priority QoS. At submission time, you can choose whether a preempted job should simply be killed or automatically requeued after it is killed. Please note that, since preemption can happen at any time, it is very beneficial for your job to be capable of checkpointing and restarting on its own if you choose to requeue it. Otherwise, you may need to verify data integrity manually before running the job again.
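
As a sketch, a condo user could submit a preemptable low-priority job that is automatically requeued if preempted; the condo account name and executable are placeholders, and --requeue is the standard SLURM option for allowing requeueing:

  #!/bin/bash
  #SBATCH --partition=savio2
  #SBATCH --account=co_mygroup      # placeholder condo account
  #SBATCH --qos=savio_lowprio       # preemptable low-priority QoS
  #SBATCH --requeue                 # requeue this job automatically if it is preempted
  #SBATCH --nodes=1
  #SBATCH --time=24:00:00

  ./my_restartable_program          # placeholder; ideally checkpoints and restarts on its own
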
Job Script Examples

For many examples of job script files that you can adapt and use for running your own jobs, please see Running Your Jobs.

Software Configuration

The SAVIO cluster uses Environment Modules to manage the cluster-wide software installation.
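
For example, the standard Environment Modules commands can be used to discover and load software; the gcc module name below is illustrative, as the module names available on the cluster may differ:

  module avail          # list the software packages available on the cluster
  module load gcc       # load a package into your environment (illustrative name)
  module list           # show the modules currently loaded
  module unload gcc     # remove a package from your environment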