Q. How can I access the Faculty Computing Allowance application (Requirements Survey) and Additional User Request forms? I'm seeing a "You need permission" error.
A. Since late March 2016, you need to authenticate via CalNet to access the online forms for applying for a Faculty Computing Allowance and for requesting additional user accounts on the Savio cluster.
When accessing either form, you may encounter the error message, "You need permission. This form can only be viewed by users in the owner's organization", under either of these circumstances:
1. If you haven't already successfully logged in via CalNet. (If you don't have a CalNet ID, please work with a UCB faculty member or other researcher who can access the form on your behalf.)
2. If you've logged in via CalNet, but you're also simultaneously connected, in your browser, to a non-UCB Google account; for instance, to access a personal Gmail account. (If so, the easiest way to access the online forms might be to use a private/incognito window in your primary browser, or else use a second browser on your computer, one in which you aren't already logged into a Google account. As an alternative, you can first log out of all of your Google accounts in your primary browser, before attempting to access these forms.)
A. SLURM provides a command you can run to check on the partitions, accounts and Quality of Service (QoS) options that you're permitted to use. Please run the "sacctmgr show associations user=$USER" command to find this information for your job submission. You can also add the "-p" option to this command to get a parsable output, i.e., "sacctmgr -p show associations user=$USER".
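As a rough sketch of how the parsable ("-p") output might be post-processed, the snippet below pulls out just the account, partition and QoS columns. It assumes the output is pipe-delimited with a header row that names the columns "Account", "Partition" and "QOS"; the function name "list_assoc" and the sample rows are made up for illustration, so please compare against the real output on your cluster.

```shell
# list_assoc (hypothetical helper): read pipe-delimited text on stdin and
# print the Account, Partition and QOS columns, located by header name.
list_assoc() {
  awk -F'|' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    { print $(col["Account"]), $(col["Partition"]), $(col["QOS"]) }
  '
}

# Made-up sample standing in for "sacctmgr -p show associations user=$USER".
sample='Cluster|Account|User|Partition|QOS|
brc|fc_sampleprojectname|sampleusername|savio2|savio_debug,savio_normal|
brc|co_samplecondoname|sampleusername|savio|condo_qos|'

printf '%s\n' "$sample" | list_assoc
```

On the cluster, the same function could be fed from the real command, e.g. "sacctmgr -p show associations user=$USER | list_assoc".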
A. Savio provides a "check_usage.sh" command line tool you can use to check cluster usage by user or account.
Running "check_usage.sh -E" will report total usage by the current user, as well as a breakdown of their usage within each of their related project accounts, since the most recent reset/introduction date (normally June 1st of each year). To check usage for another user on the system, add a "-u sampleusername" option (substituting an actual user name for 'sampleusername' in this example).
You can check usage for a project's account, rather than for an individual user's account, with the '-a sampleprojectname' option to this command (substituting an actual account name for 'sampleprojectname' in this example).
Also, when checking usage for either users or accounts, you can display usage during a specified time period by adding start date (-s) and/or end date (-e) options, as in "-s YYYY-MM-DD" and "-e YYYY-MM-DD" (substituting actual Year-Month-Day values for 'YYYY-MM-DD' in these examples). Run "check_usage.sh -h" for more information and additional options.
When checking usage for accounts that have overall usage limits (such as Faculty Computing Allowances), the value of the Service Units (SUs) field is color-coded to help you see at a glance how much computational time is still available: green means your project has used less than 50% of its available SUs; yellow means your project has used more than 50% but less than 100% of its available SUs; and red means your project has used 100% or more of its available SUs (and has likely been disabled). Note that if you specify a start date and/or end date with the "-s" and/or "-e" options, you will not get the color-coded output.
Here are a couple of output samples from running this command line tool with user and project options, respectively, along with some tips on interpreting that output:
$ check_usage.sh -E -u sampleusername
Usage for USER sampleusername [2016-06-01T00:00:00, 2016-08-17T18:18:37]: 38 jobs, 1311.40 CPUHrs, 1208.16 SUs
Usage for USER sampleusername in ACCOUNT co_samplecondoname [2016-06-01T00:00:00, 2016-08-17T18:18:37]: 23 jobs, 857.72 CPUHrs, 827.59 SUs
Usage for USER sampleusername in ACCOUNT fc_sampleprojectname [2016-06-01T00:00:00, 2016-08-17T18:18:37]: 15 jobs, 453.68 CPUHrs, 380.57 SUs
Total usage from June 1, 2016 through the early evening of August 17, 2016 by the 'sampleusername' cluster user consists of 38 jobs run, using approximately 1,311 CPU hours, and resulting in usage of approximately 1,208 Service Units. (The total number of Service Units is less than the total number of CPU hours in this example, because some jobs were run on older or otherwise less expensive hardware pools (partitions) which cost less than one Service Unit per CPU hour.)
Of that total usage, 23 jobs were run under the Condo project account 'co_samplecondoname', using approximately 858 CPU hours and 828 Service Units, and 15 jobs were run under the Faculty Computing Allowance project account 'fc_sampleprojectname', using approximately 454 CPU hours and 381 Service Units.
$ check_usage.sh -a fc_sampleprojectname
Usage for ACCOUNT fc_sampleprojectname [2016-06-01T00:00:00, 2016-08-17T18:19:15]: 156 jobs, 85263.80 CPUHrs, 92852.12 (300000) SUs
Usage from June 1, 2016 through the early evening of August 17, 2016 by all cluster users of the Faculty Computing Allowance account 'fc_sampleprojectname' consists of 156 jobs run, using a total of approximately 85,264 CPU hours, and resulting in usage of approximately 92,852 Service Units. (The total number of Service Units is greater than the total number of CPU hours in this example, because some jobs were run on hardware pools (partitions) which cost more than one Service Unit per CPU hour.) The total Faculty Computing Allowance allocation for this project's account is 300,000 Service Units, so there are approximately 207,148 Service Units still available for running jobs during the remainder of the current Allowance year (June 1 to May 31): 300,000 total Service Units granted, less 92,852 used to date. The total of 92,852 Service Units used to date is colored green, because this project's account has used less than 50% of its total Service Units available.
To also view individual usages by each cluster user of the Faculty Computing Allowance project account 'fc_sampleprojectname', you can add a '-E' option to the above command; e.g.,
check_usage.sh -E -a fc_sampleprojectname
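The remaining-allowance arithmetic and the color thresholds described above can be sketched in a few lines of shell. This is only an illustration, not how "check_usage.sh" itself is implemented: the function name "su_status" is hypothetical, and treating exactly 50% as yellow is an assumption, since the description above does not say which color applies at that boundary.

```shell
# su_status USED TOTAL (hypothetical helper): print the Service Units
# remaining, the percentage used, and the matching color category.
su_status() {
  awk -v used="$1" -v total="$2" 'BEGIN {
    pct = 100 * used / total
    # Assumption: exactly 50% is reported as yellow.
    color = (pct < 50) ? "green" : ((pct < 100) ? "yellow" : "red")
    printf "%.2f SUs remaining, %.1f%% used, %s\n", total - used, pct, color
  }'
}

# Values taken from the sample account output above.
su_status 92852.12 300000
```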
Finally, if your Faculty Computing Allowance has become completely exhausted, the output from running the "check_usage.sh" command line tool will by default show only information for the period of time after your job scheduler account was disabled; for example:
Usage for ACCOUNT fc_sampleprojectname [2017-04-05T11:00:00, 2017-04-24T17:19:12]: 3 jobs, 0.00 CPUHrs, 0.00 (0) SUs
Usage for USER sampleusername in ACCOUNT fc_sampleprojectname [2017-04-05T11:00:00, 2017-04-24T17:19:12]: 0 jobs, 0.00 CPUHrs, 0.00 (0) SUs
To display the more meaningful information about the earlier usage that resulted in the Faculty Computing Allowance becoming exhausted, use the start date (-s) option and specify the most recent June 1st (the first day of the current Allowance year) as that start date. For example, to view usage for an Allowance that became exhausted at any time during the 2016-17 Allowance year, use a start date of June 1, 2016:
check_usage.sh -E -s 2016-06-01 -a fc_sampleprojectname
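If you would rather not work out the start date by hand, a small helper along the following lines could compute the most recent June 1st for a given date. This is a sketch under the assumption that the Allowance year always begins on June 1; the function name "allowance_start" is hypothetical.

```shell
# allowance_start [YYYY-MM-DD] (hypothetical helper): print the most recent
# June 1st on or before the given date (defaults to today's date).
allowance_start() {
  d=${1:-$(date +%Y-%m-%d)}
  year=${d%%-*}
  rest=${d#*-}
  month=${rest%%-*}
  month=${month#0}   # drop a leading zero so "08" and "09" compare cleanly
  if [ "$month" -lt 6 ]; then
    year=$((year - 1))
  fi
  printf '%s-06-01\n' "$year"
}

allowance_start 2017-04-24
```

The result could then be passed to the tool, e.g. check_usage.sh -E -s "$(allowance_start)" -a fc_sampleprojectname.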
A. SLURM manages its SLURM-specific environment variables differently from PBS and other job schedulers. Before you use a SLURM environment variable, please check its scope of availability by entering "man sbatch" or "man srun".
To use the job ID as part of the output and/or error file name, use a filename pattern rather than an environment variable. Please see "man sbatch" for details. As a quick reference, the proper syntax is "--output=%j.out".
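For instance, a minimal job script using this filename pattern might look like the sketch below. The partition, account and QoS values are placeholders borrowed from other examples in this FAQ, so substitute your own. Inside the running job the job ID is also available as the SLURM_JOB_ID environment variable; the "unknown" fallback is there only so the sketch can also be run outside of SLURM.

```shell
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=savio
#SBATCH --account=ac_abc
#SBATCH --qos=savio_debug
#SBATCH --time=00:10:00
# %j in a filename pattern is replaced by the job ID.
#SBATCH --output=%j.out
#SBATCH --error=%j.err

# Within the job itself, the job ID can be read from the environment.
msg="Running as job ${SLURM_JOB_ID:-unknown}"
echo "$msg"
```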
A. In most cases you should be able to get an estimate of when your job is going to start by running the command "squeue --start -j $your_job_id" (substituting your actual job ID for '$your_job_id' in this example). However, under either of the following two circumstances, you may get "N/A" as the reported START_TIME:
- You did not specify a run time limit for the job with the "--time" option. This prevents the job from being backfilled.
- The job that's currently running and blocking your job from starting didn't specify a run time limit with the "--time" option.
Thus, as a best practice, both to improve scheduler efficiency and to obtain a more accurate estimate of the start time, we highly recommend always using "--time" to specify a run time limit for your jobs. Note also that the start time is only an estimate based on the jobs currently queued in the scheduler. If jobs with higher priorities are submitted later, this estimate will be updated accordingly. So treat this estimate as a reference rather than a guaranteed start time.
A. There are many possible reasons why a job might not start when you expect it to. To troubleshoot:
- The first action that you should take to troubleshoot it is to get an estimate of the start time with "squeue --start -j $your_job_id" (substituting your actual job ID for '$your_job_id' in this example). If you are satisfied with the estimated start time you can stop here.
- If you would like to troubleshoot further you can then run "sinfo -p $partition" (substituting the actual name of the partition on which you're trying to run your job, such as 'savio2' or 'savio2_gpu', for '$partition' in this example) to see whether the resources you are requesting are currently allocated to other jobs. In general, "idle" nodes are free and available to run new jobs, while nodes in most other states are currently unavailable. If you used node features in your job submission file, make sure you only check the resources that match the node feature that you requested. (All the node features are documented on your cluster's webpage.) If the partition on which you want to run your job is currently heavily impacted (has few idle nodes), and you are not satisfied with your job's estimated start time (see above), you might consider running it on another, less-impacted partition to which you also have access, if that partition's features are compatible with your job.
- If you see enough resources available but your job is still not showing a reasonable start time, please run "sprio" to see if there are jobs with higher priorities that are currently blocking your job. (For instance, "sprio -o "%Y %i %u" | sort -n -r" will show pending jobs across all of Savio's partitions, ordered from highest to lowest priority, together with their job IDs and usernames.)
- If there are no higher priority jobs blocking the resources that you requested, you can check whether there may be any reservation on the resources that you requested with "scontrol show reservations". Reservations are used, for instance, to defer the start of jobs whose requested wall clock times might overlap with scheduled maintenance periods on the cluster. In those instances, if your job can be run within a shorter time period, you can adjust its wall clock time to avoid such overlap.
- For Faculty Computing Allowance users, you may also want to check whether your account's allowance has been completely used up, via "check_usage.sh". (See the Tip on using this command, above.)
If, after going through all of these steps, you are still puzzled by why your job is not starting, please feel free to contact us.
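The checks above can be strung together in a small wrapper so that they run in order. The following is only a sketch: the function name "diagnose_job" is hypothetical, the job ID and partition in the usage line are placeholders, and any command that isn't installed on the current machine is simply reported as skipped.

```shell
# diagnose_job JOBID PARTITION (hypothetical helper): run the
# troubleshooting commands described above, one after another.
diagnose_job() {
  jobid=$1
  partition=$2
  set -- \
    "squeue --start -j $jobid" \
    "sinfo -p $partition" \
    "sprio" \
    "scontrol show reservations" \
    "check_usage.sh -E"
  for cmd in "$@"; do
    echo "== $cmd"
    tool=${cmd%% *}   # first word of the command line
    if command -v "$tool" >/dev/null 2>&1; then
      $cmd || echo "(command exited with an error)"
    else
      echo "(skipped: $tool not found on this machine)"
    fi
  done
}

# Placeholder job ID and partition; substitute your own.
diagnose_job 12345 savio2
```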
A. If you suspect your job is not running properly, or you simply want to understand how much memory or CPU the job is actually using on the compute nodes, RIT provides a script, "wwall", to check that. "wwall -j $your_job_id" provides a snapshot of the status of the nodes on which the job is running. "wwall -j $your_job_id -t" provides a text-based user interface (TUI) for monitoring node status as the job progresses. To exit the TUI, enter "q" to quit the interface and return to the command line.
Q. How can I run High-Throughput Computing (HTC) type of jobs? (How can I run multiple jobs on the same node?)
A. If you have a set of common tasks that you would like to perform on the cluster, and these tasks are short in duration and large in number, they fall into the category of High-Throughput Computing (HTC). Typical applications, such as parameter/configuration scanning and divide-and-conquer approaches, can all be categorized this way. Solving an HTC problem isn't easy on a traditional HPC cluster with time and resource limits. However, within those constraints, there are still some options available.
Here we demonstrate one approach, using the "ht_helper.sh" (HT Helper) script that RIT provides. The idea of the "ht_helper.sh" script is to launch a FIFO mini scheduler within a real scheduler allocation (SLURM or PBS), then use the mini scheduler to cycle through all the tasks within that allocation. These tasks can be either serial or parallel.
The following are the usage instructions for this "ht_helper.sh" script:
[joe@ln000 ]$ ht_helper.sh -h
Usage: /global/home/groups/allhands/bin/ht_helper.sh [-dhkv] [-f hostfile] [-m modules] [-n # of processors per task] [-o ompi options] [-p # of parallel tasks] [-s sleep] [-t taskfile]
    -d    dump task stdout/stderr to files
    -f    provide a hostfile with list of processors, one per line
    -h    this help page
    -k    keep intermediate files (for debugging purpose)
    -m    provide environment modules to be loaded for tasks (comma separated)
    -n    provide number of processors per task
    -o    provide ompi options, e.g., "-mca btl openib,sm,self"
    -p    provide number of parallel tasks
    -s    interval between checks (default to 60s)
    -t    provide a taskfile with list of tasks, one per line (required)
          task could be a binary executable, or a script with the proper shebang, or a script string starts with something similar to "sh -c", e.g.,
              /bin/sh -c "echo task 0; hostname; sleep 5"
              /foo/bar/mytask0
          NOTE: if you have multiple steps for one task, you are not able to simply do "echo task 0; hostname; sleep 5", instead you will need to use the above "sh -c" trick, or use another script as a substitute
    -v    verbose mode
To use the helper script, you will need to prepare one taskfile and one job script file. The taskfile contains all the tasks that you need to run. If each task needs a self-identifier, the environment variable "$HT_TASK_ID" can be used in the taskfile or in any of the subsequent scripts. The taskfile takes three types of input, as shown in the usage page.
If you are running MPI tasks, please make sure not to include the mpirun command in the taskfile; you only need to provide the actual executable and its input options. If mpirun command line options are required, please provide them via the "-o" option. For users running parallel tasks, please make sure to turn off CPU affinity settings, if any, to avoid conflicts and serious oversubscription of CPUs.
The next important parameter is the "-n" option, which sets how many processors/CPUs to allocate for each task; if it is not provided, the default value is "1", for serial tasks. If you are running short-duration tasks (less than a few minutes), you may also want to reduce the mini scheduler's check interval from the default of 60 seconds to a smaller value with the "-s" option. If you are running within a SLURM or PBS allocation, please do not specify a hostfile with the "-f" option, as it may conflict with the default allocation.
To get familiar with this helper script, you may want to turn on the "-d" (dump output from each task to an individual file), "-k" (keep intermediate files), and "-v" (verbose mode) options so that you can better understand how it works. After you are familiar with the process, you can choose which options to use; we recommend "-d" and "-v".
The job script file will look similar to a job script for a parallel job, except that it runs the "ht_helper.sh" command on the taskfile that you just prepared.
Here's an example of it in production, demonstrating running an 8-task job within a 4-CPU allocation.
Taskfile:
hostname
date
/path/to/myscript.sh
whoami
uname -a
pwd
sh -c "echo task $HT_TASK_ID; hostname; sleep 5"
echo $((1+1))
SLURM job script:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=savio
#SBATCH --account=ac_abc
#SBATCH --qos=savio_debug
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
module load gcc openmpi # or module load intel openmpi
ht_helper.sh -t taskfile -n1 -s1 -dvk
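For a parameter scan, the taskfile itself can be generated rather than written by hand. The sketch below writes one task per parameter value to a temporary file; the "--alpha" flag and the parameter values are hypothetical stand-ins for whatever arguments your own script accepts.

```shell
# Write one task per line, one line per parameter value.
taskfile=$(mktemp)
for alpha in 0.1 0.2 0.5 1.0; do
  # "--alpha" is a made-up option of the placeholder script.
  printf '/path/to/myscript.sh --alpha %s\n' "$alpha"
done > "$taskfile"

# Count the generated tasks and show the file.
ntasks=$(grep -c '' "$taskfile")
echo "Wrote $ntasks tasks to $taskfile"
cat "$taskfile"
```

Within the job script, the generated file would then be passed to the helper in place of a hand-written one, e.g. ht_helper.sh -t "$taskfile" -n1 -s1 -dv.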
In addition, the Savio cluster now offers High Throughput Computing nodes, which may be suitable for some of these types of HTC tasks.
A. The Hadoop framework and an auxiliary script are provided to help users to run Hadoop jobs on the HPC clusters in Hadoop On Demand (HOD) fashion. The auxiliary script "hadoop_helper.sh" is located in /global/home/groups/allhands/bin/hadoop_helper.sh and can be used interactively or from a job script. Please note that this script only provides functions to help to build a Hadoop environment, so it should never be run directly. The proper way to use it is to source it from your current environment by running "source /global/home/groups/allhands/bin/hadoop_helper.sh" (only bash is supported right now). After that please run "hadoop-usage" to see how to run Hadoop jobs. You will need to run "hadoop-start" to initialize an HOD environment and run "hadoop-stop" to destroy the HOD environment after your Hadoop job completes.
The example below shows how to use it interactively.
[joe@ln000 ~]$ srun -p savio -A ac_abc --qos=savio_debug -N 4 -t 10:0 --pty bash
[joe@ln000 ~]$ module load java hadoop
[joe@n0000 ~]$ source /global/home/groups/allhands/bin/hadoop_helper.sh
[joe@n0000 ~]$ hadoop-start
starting jobtracker, ...
[joe@n0000 bash.738294]$ hadoop jar $HADOOP_DIR/hadoop-examples-1.2.1.jar pi 4 10000
Number of Maps = 4
...
Estimated value of Pi is 3.14140000000000000000
[joe@n0000 bash.738294]$ hadoop-stop
stopping jobtracker ...
The example below shows how to use it in a job script:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=savio
#SBATCH --account=ac_abc
#SBATCH --qos=savio_debug
#SBATCH --nodes=4
#SBATCH --time=00:10:00
module load java hadoop
source /global/home/groups/allhands/bin/hadoop_helper.sh
# Start Hadoop On Demand
hadoop-start
# Example 1
hadoop jar $HADOOP_DIR/hadoop-examples-1.2.1.jar pi 4 10000
# Example 2
mkdir in
cp /foo/bar in/
hadoop jar $HADOOP_DIR/hadoop-examples-1.2.1.jar wordcount in out
# Stop Hadoop On Demand
hadoop-stop
A. The Spark framework and an auxiliary script are provided to help users to run Spark jobs on the HPC clusters in Spark On Demand (SOD) fashion. The auxiliary script "spark_helper.sh" is located in /global/home/groups/allhands/bin/spark_helper.sh and can be used interactively or from a job script. Please note that this script only provides functions to help to build a Spark environment, so it should never be run directly. The proper way to use it is to source it from your current environment by running "source /global/home/groups/allhands/bin/spark_helper.sh" (only bash is supported right now). After that please run "spark-usage" to see how to run Spark jobs. You will need to run "spark-start" to initialize an SOD environment and run "spark-stop" to destroy the SOD environment after your Spark job completes.
The example below shows how to use it interactively:
[joe@ln000 ~]$ srun -p savio -A ac_abc --qos=savio_debug -N 4 -t 10:0 --pty bash
[joe@ln000 ~]$ module load java spark
[joe@n0000 ~]$ source /global/home/groups/allhands/bin/spark_helper.sh
[joe@n0000 ~]$ spark-start
starting org.apache.spark.deploy.master.Master, ...
[joe@n0000 bash.738307]$ spark-submit --master $SPARK_URL $SPARK_DIR/examples/src/main/python/pi.py
Spark assembly has been built with Hive ...
Pi is roughly 3.147280
...
[joe@n0000 bash.738307]$ pyspark $SPARK_DIR/examples/src/main/python/pi.py
WARNING: Running python applications through ./bin/pyspark is deprecated as of Spark 1.0.
...
Pi is roughly 3.143360
...
[joe@n0000 bash.738307]$ spark-stop
...
The example below shows how to use it in a job script:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=savio
#SBATCH --account=ac_abc
#SBATCH --qos=savio_debug
#SBATCH --nodes=4
#SBATCH --time=00:10:00
module load java spark
source /global/home/groups/allhands/bin/spark_helper.sh
# Start Spark On Demand
spark-start
# Example 1
spark-submit --master $SPARK_URL $SPARK_DIR/examples/src/main/python/pi.py
# Example 2
spark-submit --master $SPARK_URL $SPARK_DIR/examples/src/main/python/wordcount.py /foo/bar
# PySpark Example
pyspark $SPARK_DIR/examples/src/main/python/pi.py
# Stop Spark On Demand
spark-stop
A. You likely are seeing this error because you have an Intel compiler module loaded in your environment, but you are trying to build your application with a GCC compiler. Please unload any Intel compiler module(s) from your current environment and rebuild with GCC. (See Accessing and Installing Software for instructions on unloading modules.)
A. C++11 (formerly known as C++0x) features have been partially supported by Intel's C++ compilers, beginning with version 11.x, and are fully supported in the 2015.x series. For more details please refer to Intel's C++11 features support page. Note: to support the full set of C++11 features, GCC 4.8 or above is also needed. Please follow this guidance when compiling your C++ code with C++11 features on the cluster:
- Start by loading the environment module for the default version of the Intel compilers via "module load intel". Then compile the code with "icpc -std=c++11 some_file" (replacing "some_file" with the actual name of your C++11 source code file).
- If the command above finishes successfully you can stop here. Otherwise please check Intel's C++11 features support page to learn whether the C++11 features your code uses are supported by the default version of the Intel compilers. If not, please switch to the cluster's environment module that provides a higher version of the Intel compilers. To do so, enter "module switch intel intel/xxxx.yy.zz" (replacing "xxxx.yy.zz" with that higher version number; enter "module avail" to find that number, if needed).
- If your code uses the C++11 Standard Template Library (STL), you’ll also need to load the GCC/4.8.5 software module as a driver; its header files provide support for the C++11 STL. To do so, enter "module load gcc/4.8.5" before compiling your code.
A. Unfortunately, that is not possible. All the compute nodes download the same operating system image from the master node and load the image into RAM disk, so changes to the operating system on the compute node would not be persistent. If you believe that you may need root access for software installations, or any other purpose related to your research workflow, please contact us and we'll be glad to explore various alternative approaches with you.
A. No. We NFS mount storage across all compute nodes so that data is available independent of which compute nodes are used; however, medium to large clusters can place a very high load on NFS storage servers and many, including Linux-based NFS servers, cannot handle this load and will lock up. A non-responding NFS mount can hang the entire cluster, so we can't risk allowing outside mounts.
A. For those with Faculty Computing Allowance accounts, usage of computational time on Savio is tracked (in effect, "charged" for, although no costs are incurred) via abstract measurement units called "Service Units." (Please see Service Units on Savio for a description of how this usage is calculated.) When all of the Service Units provided under an Allowance have been exhausted, no more jobs can be run under that account. Usage tracking does not impact Condo users, who have no Service Unit-based limits on the use of their associated compute pools.
A. You can use the following sentence in order to acknowledge computational and storage services associated with the Savio Cluster:
"This research used the Savio computational cluster resource provided by the Berkeley Research Computing program at the University of California, Berkeley (supported by the UC Berkeley Chancellor, Vice Chancellor for Research, and Chief Information Officer)."
Acknowledgements of this type are an important factor in helping to justify ongoing funding for, and expansion of, the cluster. As well, we encourage you to tell us how BRC impacts your research (Google Form), at any time!
Condo Cluster Computing Program FAQs
A. A major incentive for researchers to participate is that they only have to purchase their compute nodes, and support of the compute nodes is provided for free in exchange for their unused compute cycles. In addition to receiving professional systems administration support, researchers will be able to leverage the use of the HPC infrastructure (firewalled subnet, login nodes, commercial compiler, parallel filesystem, etc.) when they use their compute nodes. This infrastructure is provided for free and saves researchers from having to purchase and create any of these components on their own.
A. The monthly cluster support, colocation and network fees are waived for researchers who buy into the Condo. Essentially, the institution waives those costs in exchange for excess compute cycles. Each user of the system receives a 10 GB storage allocation, which includes backups. Condo groups are also eligible to receive additional group storage of 200 GB. In addition, use of the large, shared parallel scratch filesystem is provided at no cost. Condo users needing more storage for persistent data can purchase additional allocations at current rates. In addition, users needing very large amounts of persistent storage can also take advantage of the Condo Storage Service.
A. Prospective condo owners are invited to contact us. Our team will work with you to understand your application and to determine if the Condo cluster would be a suitable platform. We will provide an estimate of the costs of the compute nodes and associated InfiniBand network equipment and then work with your Procurement buyer to specify the correct items to order. Participants are expected to contribute the compute nodes and InfiniBand cable.
A. We will set up a floating reservation equivalent to the number of nodes that you contribute to the Condo to provide priority access to you and your users. You can determine the run time limits for your reservation. If you are not using your reservation, then other users will be allowed to run jobs on unused nodes. If you submit a job to run when all nodes are busy, your job will be given priority over all other waiting jobs to run, but your job will have to wait until nodes become free in order to run. We do not do pre-emptive scheduling where running jobs are killed in order to give immediate access to priority jobs.
A. The basic premise of Condo participation is to facilitate the sharing of unused resources. Dedicating or reserving compute resources works counter to sharing, so this is not possible in the Condo model. As an alternative, PIs can purchase nodes and set them up as a Private Pool in the Condo environment, which will allow a researcher to tailor the access and job queues to meet their specific needs. Private Pool compute nodes will share the HPC infrastructure along with the Condo cluster; however, researchers will have to cover the support costs for BRC staff to manage their compute nodes. Rates for Private Pool compute nodes will be determined at a later date.
A. There are two ways to do this. First, Condo users can access more nodes via Savio's preemptable, low-priority quality of service option. Second, faculty can obtain a Faculty Computing Allowance, and their users can then submit jobs to the General queues to run on the compute nodes provided by the institution. (Use of these nodes is subject to the current job queue policies for general institutional access.)