Slurm: Main Commands

Slurm provides many utility commands. The most commonly used are:

| Command | Description |
| --- | --- |
| `srun` | Run parallel jobs |
| `sbatch` | Submit a batch script to Slurm |
| `salloc` | Obtain a Slurm job allocation (for interactive workflows) |
| `sinfo` | View information about Slurm nodes and partitions |
| `squeue` | View information about jobs in the Slurm scheduling queue |
| `sacct` | Display accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database |
| `scancel` | Signal jobs or job steps that are under the control of Slurm |

srun

Run a parallel job on a cluster managed by Slurm. It can be used in three ways:

  1. As a standalone job submission, where srun requests its own resource allocation.
  2. In sbatch batch scripts, as job steps that use the allocated resource pool.
  3. Within a salloc session, using the resource pool of that allocation.

man srun # for more information

| Option | Description |
| --- | --- |
| `--help` | Display help information and exit |
| `--account` | Charge resources used by this job to the specified account |
| `--ntasks` | Request the number of tasks for the job |
| `--nodes` | Request the number of nodes to be allocated for this job |
| `--ntasks-per-node` | Request that ntasks be invoked on each node. Meant to be used with `--nodes` |
| `--cpus-per-task` | Request that ncpus be allocated per process. Useful if the job is multithreaded and requires more than one CPU per task for optimal performance |
| `--mem` | Specify the real memory required per node. Default units are megabytes; a different unit can be specified with the suffix K, M, G, or T |
| `--mem-per-cpu` | Minimum memory required per allocated CPU |
| `--output` | Redirect stdout to a file |
| `--error` | Redirect stderr to a file |
| `--label` | Prepend task numbers to lines of stdout/stderr |
| `--partition` | Request a specific partition for the resource allocation. If not specified, the Slurm controller selects the default partition designated by the system administrator |
| `--pty` | Execute task zero in pseudo terminal mode, or use the pseudo terminal specified by `<File Descriptor>` |
| `--gres` | Specify a comma-delimited list of generic consumable resources, for example `--gres=gpu:1`, `--gres=gpu:v100:2`, `--gres=help`, or `--gres=none` |
| `--chdir` | Set the working directory of the launched tasks before they begin execution |
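
As a minimal sketch of how these options combine, the snippet below launches four tasks with two CPUs each and labels every output line with its task number. The resource values and the `hostname` payload are placeholders rather than site recommendations, and the `command -v` guard is only there so the snippet is harmless on a machine without Slurm.

```shell
# Placeholder resource request: 4 tasks, 2 CPUs per task, 2 GB per allocated CPU.
SRUN_OPTS="--ntasks=4 --cpus-per-task=2 --mem-per-cpu=2G --label"

if command -v srun >/dev/null 2>&1; then
    # Word splitting of $SRUN_OPTS is intentional: each option becomes its own argument.
    srun $SRUN_OPTS hostname
else
    echo "srun not found; would run: srun $SRUN_OPTS hostname"
fi
```

For an interactive shell on a compute node, a common pattern is `srun --ntasks=1 --pty bash`, with `--partition`, `--account`, or `--gres` added as your site requires.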

sbatch

man sbatch # for more information

Commonly used directives:

| Option | Description |
| --- | --- |
| `#SBATCH --account` | Charge resources used by this job to the specified account |
| `#SBATCH --nodes` | Request a minimum (and optionally a maximum) number of nodes for this job |
| `#SBATCH --ntasks` | Request the number of tasks for the job |
| `#SBATCH --ntasks-per-node` | Request that ntasks be invoked on each node. Used with `--nodes` |
| `#SBATCH --cpus-per-task` | Advise the Slurm controller that ensuing job steps will require ncpus processors per task. Without this option, the controller tries to allocate one processor per task |
| `#SBATCH --mem` | Specify the real memory required per node. Default units are megabytes; a different unit can be specified with the suffix K, M, G, or T |
| `#SBATCH --gres` | Specify a comma-delimited list of generic consumable resources |
| `#SBATCH --output` | Connect the batch script's standard output directly to the specified file |
| `#SBATCH --error` | Connect the batch script's standard error directly to the specified file |
| `#SBATCH --mail-user` | User to receive email notifications of state changes, as defined by `--mail-type` |
| `#SBATCH --mail-type` | Notify the user by email when certain event types occur. Valid type values include NONE, BEGIN, END, FAIL, REQUEUE, and ALL. Multiple values may be given as a comma-separated list. The user to be notified is set with `--mail-user` |
| `#SBATCH --job-name` | Specify a name for the job allocation. The default is the name of the batch script, or just sbatch if the script is read from standard input |
| `#SBATCH --constraint` | Restrict the job to nodes with the given features, for example `--constraint="nvidia"` to select any NVIDIA GPU, `--constraint="amd"` to select any AMD GPU, or `--constraint="a100\|h100"` to select either of the two GPU types |
| `#SBATCH --chdir` | Set the working directory of the batch script before it is executed |
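
The directives above can be put together roughly as follows. This is a sketch, not a recommended configuration: the job name, resource values, and file names are placeholders, and directives such as `--account` or `--partition` are left out because their values are site-specific. `%j` in the output file names is Slurm's filename pattern for the job ID.

```shell
# Write a small batch script; every value in the directives is a placeholder.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err

# Each srun call inside the script is a job step that uses the allocation above.
srun --label hostname
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.sh
else
    echo "sbatch not found; example_job.sh was written but not submitted"
fi
```

All `#SBATCH` lines must come before the first executable command in the script; sbatch stops processing directives once it reaches one.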

salloc

The options for salloc are similar to those of srun and sbatch. Consult the salloc manual page for additional options and their environment variables:

man salloc # for detailed information
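
A minimal sketch of an allocation request, assuming placeholder resource values: when salloc is given a command, it runs that command inside the allocation and releases the allocation once the command finishes. Without a command, it typically starts a shell in which you can launch srun job steps yourself.

```shell
# Placeholder request: one node, four tasks, 4 GB of memory on that node.
SALLOC_OPTS="--nodes=1 --ntasks=4 --mem=4G"

if command -v salloc >/dev/null 2>&1; then
    # The srun job step runs on the resources that salloc obtained.
    salloc $SALLOC_OPTS srun --label hostname
else
    echo "salloc not found; would run: salloc $SALLOC_OPTS srun --label hostname"
fi
```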

sinfo

View information about Slurm nodes and partitions.

man sinfo # for more information

sinfo --Format=Partition,GRES,CPUs,Features:26,NodeList

| Format | Description |
| --- | --- |
| `Available` | State/availability of a partition |
| `CPUs` | Number of CPUs per node |
| `CPUsState` | Number of CPUs by state in the format "allocated/idle/other/total" |
| `Features:26` | Features available on the node. A `:` followed by a number sets the maximum number of characters printed for the column; sinfo prints at most 20 characters per column by default |
| `Gres` | Generic resources associated with the nodes |
| `GresUsed` | Generic resources currently in use on the nodes |
| `MaxCPUsPerNode` | The maximum number of CPUs per node available to jobs in this partition |
| `Memory` | Size of memory per node in megabytes |
| `NodeAI` | Number of nodes by state in the format "allocated/idle" |
| `Nodes` | Number of nodes |
| `NodeList` | List of node names |
| `Partition` or `PartitionName` | Partition name |
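
As a further sketch, the fields listed above can be combined into a utilization overview per partition. The choice of fields and the column widths are illustrative only.

```shell
# Field names come from the list above; ":N" sets the column width in characters.
SINFO_FORMAT="Partition,Available,Nodes,NodeAI,CPUsState,Gres:20,GresUsed:30"

if command -v sinfo >/dev/null 2>&1; then
    sinfo --Format="$SINFO_FORMAT"
else
    echo "sinfo not found; would run: sinfo --Format=$SINFO_FORMAT"
fi
```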

squeue

View information about jobs in the Slurm scheduling queue.

man squeue # for more information

| Option | Description |
| --- | --- |
| `--me` | Print queued jobs for the current user |
| `--user` | Print queued jobs of a specific user, or of a comma-separated list of users |
| `--jobs` | Specify a comma-separated list of job IDs to display |
| `--help` | Print a help message describing all squeue options |
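
The three filters might be used as sketched below. The user names and job IDs are hypothetical, and `--jobs` is the long option name given in the squeue manual page.

```shell
# Hypothetical user names and job IDs; replace them with real ones.
QUEUE_USERS="alice,bob"
QUEUE_JOBS="1001,1002"

if command -v squeue >/dev/null 2>&1; then
    squeue --me                      # jobs of the current user
    squeue --user="$QUEUE_USERS"     # jobs of the listed users
    squeue --jobs="$QUEUE_JOBS"      # only the listed job IDs
else
    echo "squeue not found; nothing to query"
fi
```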

sacct

Displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database.

man sacct # for more information

The most commonly used formatting options are:

| Option | Description |
| --- | --- |
| `--format` | Comma-separated list of fields (use `--helpformat` for a list of available fields). A `%NUMBER` suffix on a field sets how many characters are printed: `format=name%30` prints 30 characters of the field name, right-justified, and `%-30` prints 30 characters left-justified |
| `--helpformat` | Print a list of fields that can be specified with the `--format` option |

Some commonly used fields for --format are:

| Format | Description |
| --- | --- |
| `JobID` | The identification number of the job or job step |
| `JobName` | The name of the job or job step |
| `State` | The job status or state, such as COMPLETED, TIMEOUT, or FAILED |
| `AllocCPUS` | Number of CPUs allocated to the job |
| `Elapsed` | Elapsed time for the job |
| `Start` | Initiation time for the job |
| `End` | Termination time for the job |
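
A minimal sketch of a query that uses these fields for a single job. The job ID is hypothetical, and `%30` widens the JobName column to 30 characters.

```shell
# Hypothetical job ID; replace it with one of your own.
JOB_ID=1001
SACCT_FIELDS="JobID,JobName%30,State,AllocCPUS,Elapsed,Start,End"

if command -v sacct >/dev/null 2>&1; then
    sacct --jobs="$JOB_ID" --format="$SACCT_FIELDS"
else
    echo "sacct not found; would run: sacct --jobs=$JOB_ID --format=$SACCT_FIELDS"
fi
```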

scancel

Signal jobs or job steps that are under the control of Slurm. It is most often used to send a termination signal that cancels a job.

| Option / Usage | Description |
| --- | --- |
| `--interactive` | Interactive mode. Confirm each job_id.step_id before performing the cancel operation |
| `--jobname` | Restrict the scancel operation to jobs with the given job name |
| `--me` | Cancel all your jobs |
| `scancel <a_job_id>` | Cancel a job and all its steps |
| `scancel <a_job_id>.<step_id_a> <a_job_id>.<step_id_b>` | Cancel only steps a and b of the given job, leaving its other steps running |
| `scancel <JobID_ArrayID>` | Cancel only one array task of a job array |
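
Because cancelling is destructive, a cautious sketch combines the filters above with `--interactive`, so that each match is confirmed before it is cancelled. The job name here is hypothetical.

```shell
# Hypothetical job name; only your own jobs with this name are considered.
CANCEL_NAME="example"

if command -v scancel >/dev/null 2>&1; then
    # --interactive asks for confirmation before each cancel operation.
    scancel --interactive --me --jobname="$CANCEL_NAME"
else
    echo "scancel not found; would run: scancel --interactive --me --jobname=$CANCEL_NAME"
fi
```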