Run and Review Your Jobs

Overview

To submit, run, cancel, and check the status of jobs on the Savio cluster, you'll use the Simple Linux Utility for Resource Management (SLURM), an open-source resource manager and job scheduling system. SLURM manages jobs, job steps, nodes, partitions (groups of nodes), and other entities on the cluster.

There are several basic SLURM commands you'll likely use often:

  • sbatch - Submit a job to the batch queue system, e.g., sbatch myjob.sh, where myjob.sh is a SLURM job script
  • srun - Submit an interactive job to the batch queue system
  • scancel - Cancel a job, e.g., scancel 123, where 123 is a job ID
  • squeue - Check the current jobs in the batch queue system, e.g., squeue -u $USER to view your own jobs
  • sq - Check why your job is not running, e.g., module load sq; sq
  • sacctmgr - Check which resources (accounts, partitions, and QoS) you have access to, e.g., sacctmgr -p show associations user=$USER
  • sinfo - View the status of the cluster's compute nodes, including how many nodes of each type are currently available for running jobs
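
For illustration, a minimal job script to pass to sbatch might look like the following sketch. The values account_name and partition_name are placeholders, not real Savio names; replace them with an account and partition that sacctmgr reports for your user. The job name, the 30-second time limit, and the JOB_LABEL variable are illustrative choices, not requirements.

```shell
#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account (placeholder; replace with an account you have access to):
#SBATCH --account=account_name
#
# Partition (placeholder; replace with a partition your account can use):
#SBATCH --partition=partition_name
#
# Wall clock limit (HH:MM:SS):
#SBATCH --time=00:00:30
#
## Command(s) to run:
# SLURM sets SLURM_JOB_ID inside a running job. The "local" fallback is an
# illustrative choice that lets the script also run outside SLURM for a quick check.
JOB_LABEL="job-${SLURM_JOB_ID:-local}"
echo "Starting ${JOB_LABEL} on ${HOSTNAME}"
```

Saved as myjob.sh, the script can be submitted with sbatch myjob.sh, which replies with the new job ID. You can then follow the job with squeue -u $USER or cancel it with scancel followed by that ID. By default, the job's output is written to a file named slurm-<jobid>.out in the directory you submitted from.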

Please see the following for detailed information on: