5  SLURM: the job scheduler

Learning Objectives:
Query SLURM to determine resource availability
Submit work and a resource request to be managed by SLURM
Monitor the progress of work on SLURM
Evaluate the results of work submitted to SLURM
Start an interactive session on a compute node

5.1 High performance computing review

Now that we’ve covered connecting to a remote server, basic command-line interface usage, and the fundamentals of writing a script, we’re ready to explore how to actually use the Xanadu computer cluster. You should be familiar with this already from ISG5301: Xanadu is not a single computer. It is a large number of computers networked together, most of which are far more powerful than a standard consumer-grade laptop or desktop. These individual computers are often referred to as nodes. Each node is connected to a network file system (NFS) that stores user data. Because all the nodes are connected to this file system, you will have access to your data no matter which node you are working on at any given moment.

There are three main categories of nodes.

  1. Compute nodes: These are the workhorses of the cluster. They typically have many CPUs (24-96) and memory ranging from 256G to 2TB (even a good consumer laptop won’t usually have more than 8 CPUs and 16G of memory).
  2. Login nodes: These are much smaller computers, often with only 2 CPUs and 8G of memory. They may not even be physical machines, but virtual ones. They are meant to serve as portals to more powerful resources. When users connect to Xanadu, they are assigned to one of several login nodes.
  3. The head node: A head node is typically used only by system administrators to manage the system. It probably runs the workload manager.

Thus far, when we have connected to Xanadu we have connected to a login node. Because login nodes have few resources, which are shared among many users, you should not use them for analysis. The kind of light work we have done (navigating the file system, inspecting files) is suitable for the login node, but if you ever run the kind of command that has you thinking, “I’ll just check social media for a moment while this runs,” then you have far exceeded what you should be doing on a login node.

5.2 SLURM

In order to request suitable resources for data analysis, we need to appeal to software that we will variously refer to as the workload manager or job scheduler. On Xanadu, we use the software SLURM. It is very commonly used on HPC clusters, but there are others, such as PBS and LSF. From a user perspective, these systems have many features in common, and in fact, Xanadu is set up to interpret PBS commands if necessary, though this is not recommended.

In this chapter we will cover how to use SLURM to ask what resources are available, request resources to do work, monitor the status of running jobs, and evaluate jobs when they have completed.

5.2.1 The general approach

When you have to do some computational work that requires cluster resources (and you will very soon), the process looks something like this:

  1. Decide what resources are needed to do the work.
  2. Check to see whether the resources are available (whether they exist at all, or are currently busy).
  3. Submit the work with a resource request to the job scheduler (or simply request resources in the case of interactive work).
  4. Monitor job progress until completion or failure (or if interactive work, do the work).
  5. Evaluate the work.
  6. (Possibly) Go back to (1) for the next step of your analysis.

We’ll cover each of these below.

5.2.2 What resources are needed?

We will cover this question on a job-by-job basis when we start analyzing data as it can be somewhat complicated.

5.2.3 What resources exist/are available?

The two primary SLURM commands used here are sinfo and squeue. We can also look at the file /etc/slurm/slurm.conf to see a list of node features that we can request if necessary.

5.2.3.1 sinfo

sinfo prints information about available compute nodes and their status. At the moment of writing, running sinfo with no options prints the following output:

PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
general*      up   infinite      1  drain xanadu-05
general*      up   infinite     23    mix xanadu-[01,03-04,08,10,25,39,50-52,57-61,64-66,69-70,72-74]
general*      up   infinite      4  alloc xanadu-[02,62-63,67]
general*      up   infinite      5   idle xanadu-[46-47,49,53-54]
vcell         up   infinite      4  drain xanadu-[78-81]
vcell         up   infinite      3   idle xanadu-[76-77,82]
vcellpu       up   infinite      1   idle xanadu-32
himem         up   infinite      2    mix xanadu-[40,44]
himem         up   infinite      2   idle xanadu-[06,43]
himem2        up   infinite      2    mix xanadu-[07,75]
xeon          up   infinite      1  drain xanadu-05
xeon          up   infinite     18    mix xanadu-[03-04,08,39,50-52,57-61,64-66,69-70,72]
xeon          up   infinite      4  alloc xanadu-[02,62-63,67]
xeon          up   infinite      5   idle xanadu-[46-47,49,53-54]
amd           up   infinite      2    mix xanadu-[10,25]
mcbstudent    up   infinite      2    mix xanadu-[68,71]
gpu           up   infinite      1  drain xanadu-05
gpu           up   infinite      5    mix xanadu-[01,03-04,07-08]
gpu           up   infinite      1  alloc xanadu-02
gpu           up   infinite      3   idle xanadu-[06,84-85]
crbm          up   infinite      2   idle xanadu-[55-56]

Nodes on clusters are sometimes, though not always, divided into partitions. If they are, you must decide which partition you want to submit to. Partitions need not be mutually exclusive, and they may have different behaviors and limitations. Key partitions on Xanadu are general and himem. In the basic sinfo output, you see the nodes in each partition divided into various state categories, along with a listing of the nodes in each category: alloc = allocated, i.e. all resources on these nodes have been assigned; mix = some resources on these nodes have been assigned and some remain available; idle = these nodes are currently unused; drain = these nodes are not accepting new jobs (probably they will be rebooted when current jobs finish). Xanadu actually has many more nodes than this; at the time of writing, many had been removed for maintenance.
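
If the full listing is more than you need, you can restrict the basic output to the partitions you care about, or ask for a one-line-per-partition summary. A minimal sketch (the exact columns may vary slightly with SLURM version):

# show only the partitions of interest
sinfo -p general,himem

# or summarize: one line per partition with an aggregate NODES(A/I/O/T) column
sinfo -s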

You will frequently want to see more detail than this. The following command will print information for each node, formatted according to the obscure syntax found in the sinfo man page.

sinfo --format="%10P %6t %15O %15C %15F %10m %10e %15n %30E %10u"

The first few lines of output:

PARTITION  STATE  CPU_LOAD        CPUS(A/I/O/T)   NODES(A/I/O/T)  MEMORY     FREE_MEM   HOSTNAMES       REASON                         USER      
general*   drain  0.01            0/0/36/36       0/0/1/1         225612     16760      xanadu-05       Kill task failed               root      
general*   mix    18.37           28/8/0/36       1/0/0/1         257669     9569       xanadu-01       none                           Unknown   
general*   mix    11.49           35/1/0/36       1/0/0/1         257845     45859      xanadu-03       none                           root      
general*   mix    8.34            34/2/0/36       1/0/0/1         257845     1307       xanadu-04       none                           Unknown   
general*   mix    2.41            9/27/0/36       1/0/0/1         257669     50949      xanadu-08       none                           Unknown   
general*   mix    10.76           15/33/0/48      1/0/0/1         386972     52520      xanadu-10       none                           Unknown   
general*   mix    12.61           44/4/0/48       1/0/0/1         257949     7961       xanadu-25       none                           Unknown   
general*   mix    6.03            8/8/0/16        1/0/0/1         128825     26234      xanadu-39       none                           Unknown   
general*   mix    0.06            8/32/0/40       1/0/0/1         257914     119938     xanadu-50       none                           Unknown   
general*   mix    0.08            8/32/0/40       1/0/0/1         192032     13437      xanadu-51       none                           Unknown   

Here you get lots of useful information. The column CPUS(A/I/O/T) tells you how many CPUs are allocated/idle/other/total on each node. You can also see the total amount of memory and the free memory (in megabytes) on each node. This can give you a pretty fine-grained sense of what resources currently exist in terms of CPU and memory, and how much of the cluster is currently idle.
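
If you are trying to find somewhere to run a job right now, it can also help to filter this view down to nodes in a particular state. A hedged example using the general partition (-N and -l give one long-format line per node):

# list only idle nodes in the general partition, one line per node
sinfo -p general -t idle -N -l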

5.2.3.2 squeue

The squeue command lets users see what jobs have been submitted to the job queue, what their status is, and why. This includes pending jobs that are waiting for resources to become available, and currently running jobs. Depending on SLURM configuration, this command may show only jobs you have submitted, or it may show every job. At the time of writing, Xanadu was configured to show every job.

Some example output from squeue:

  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
7949897   general nf-MAIN_ ebrannan PD       0:00      1 (QOSMaxMemoryPerUser)
7949952   general nf-MAIN_ vvuruput PD       0:00      1 (Resources)
7949954   general nf-MAIN_ vvuruput PD       0:00      1 (Priority)
7953478   general slow59.s mhossein PD       0:00      1 (Priority)
7953479   general slow60.s mhossein PD       0:00      1 (Priority)
7946600   general    busco shillima  R 2-20:56:08      1 xanadu-52
7945409   general     bash    shird  R 3-01:21:06      1 xanadu-74
7917445     himem R_divers pmartine  R 6-01:31:02      1 xanadu-40
7953261 mcbstuden     bash meds5420  R    3:46:07      1 xanadu-68

The JOBID is a unique numerical identifier assigned by SLURM to every job it runs. This is important information we’ll discuss later. NAME is the name the user gave SLURM for the job (truncated in this view). USER is the user who submitted the job. ST is the status: PD = pending; R = running. NODELIST(REASON) is either the list of compute nodes the job was assigned to (e.g. xanadu-52 for job 7946600) or the reason the job is not yet running: “Priority” essentially means the job will start at any moment; “Resources” means the requested resources are busy; “QOSMaxMemoryPerUser” means that running the job would exceed the user’s maximum memory allotment. Individual users have resource limits so that no one person can dominate the system.

You can use the flag -u <user> to restrict to a given userid. You can try squeue -o "%.12i %.9P %.30j %.8u %.2t %.10M %.6D %R" for a little more spacious formatting.
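
You can also filter by job state with -t. For example, to check on just your own pending jobs (replace <user> with your username):

# show only your jobs
squeue -u <user>

# show only your jobs that are still waiting for resources
squeue -u <user> -t PENDING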

5.2.3.3 /etc/slurm/slurm.conf

SLURM has a configuration file, /etc/slurm/slurm.conf, that lists the features of each node. Some of this information is reported by sinfo, but occasionally you may run into software with very specific needs: something may require a particular instruction set be present on the CPUs, or you may require a particular type of GPU on the node. These features are listed at the end of the file and can be requested using feature constraints. Try cat /etc/slurm/slurm.conf and look near the end of the file at the section beginning # COMPUTE NODES:

# COMPUTE NODES
NodeName=xanadu-01 CPUs=36 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257669 Features=cpu_xeon,xeon_E52697,AES,AVX,AVX2,F16C,FMA3,MMX,SSE,SSE2,SSE3,SSE4,SSSE3,gpu_A10,gpu_cc_8.6,simulations
NodeName=xanadu-02 CPUs=36 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257669 Features=cpu_xeon,xeon_E52697,AES,AVX,AVX2,F16C,FMA3,MMX,SSE,SSE2,SSE3,SSE4,SSSE3,gpu_A10,gpu_cc_8.6,simulations
NodeName=xanadu-03 CPUs=36 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257845 Features=cpu_xeon,xeon_E52697,AES,AVX,AVX2,F16C,FMA3,MMX,SSE,SSE2,SSE3,SSE4,SSSE3,gpu_M10,gpu_cc_5.2,gpu_A10,gpu_cc_8.6,simulations
NodeName=xanadu-04 CPUs=36 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257845 Features=cpu_xeon,xeon_E52697,AES,AVX,AVX2,F16C,FMA3,MMX,SSE,SSE2,SSE3,SSE4,SSSE3,gpu_A10,gpu_cc_8.6,simulations

Here you can see that node xanadu-01 has 36 CPUs, ~256G of memory, Xeon processors, and an NVIDIA A10 GPU.
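
Should you ever need one of these features, it can be requested at submission time with the --constraint option to sbatch (or srun). A purely illustrative sketch using a feature listed above; our toy job does not actually need it:

# run only on nodes that advertise the AVX2 feature
sbatch -p general --qos=general --constraint=AVX2 commonWords.sh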

5.2.4 Request resources (and submit work)

So far we have shown you ways to ask questions about the cluster and its resources. Now we’ll cover how to request them to get actual work done. There are two common use cases for analyzing data using cluster resources: running batch scripts that require no active user intervention, and doing interactive analysis.

5.2.4.1 Running batch jobs with sbatch

When you run a batch job, you write a script that will run all the steps of a given analysis for you, hand that script to SLURM, and wait for it to complete (or fail). Using one command (or one script) you tell SLURM which resources you need, and what you want it to do. The command we use for this is sbatch.

Running sbatch is, in essence, just like running a script as we did in the previous chapter, except that instead of the script being run by the current shell, you hand it to SLURM with your resource request. SLURM puts it in the job queue, and when a node (or nodes) with enough resources becomes available, it assigns the job there and runs it for you.

Let’s look at the most basic usage using our fully defined script from the previous chapter:

#!/bin/bash

# this script will print the 10 most common words found in a text file and their frequencies

# this line specifies the text file
ORIGIN=darwin1859.txt

# this line extracts and prints the word list
grep -o -P "\b[A-Za-z]{4,}\b" $ORIGIN | sort | uniq -c | sort -g | tail -n 10

Save it to commonWords.sh. Note that because we are passing this file to SLURM, not directly to bash, we definitely need the shebang at the top.

On Xanadu, you must minimally specify the partition you want SLURM to run the job on, and its associated quality of service (or QOS). Not all SLURM clusters require a QOS to be specified, but Xanadu does.

To submit the script to be run on the general partition you can do:

sbatch -p general --qos=general commonWords.sh

Our script writes to stdout. Where did the results go? By default, to a file named slurm-<JOBID>.out; in my case, at the time of writing, slurm-7953521.out. You can inspect this file and the results should be there.
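
If you want to capture the job ID at submission time, which makes it easier to find the log file later, sbatch has a --parsable flag that prints just the ID (a minimal sketch; on multi-cluster setups the output can include a cluster name, but on a single cluster it should be the bare numeric ID):

# submit and store the job ID in a shell variable
JOBID=$(sbatch --parsable -p general --qos=general commonWords.sh)

# inspect the output file once the job has finished
cat slurm-${JOBID}.out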

This basic approach will request only a minimal amount of memory and a minimal number of CPUs, which will be too little for most real data analysis. We can request more with some options:

sbatch -p general --qos=general -c 12 --mem=10G commonWords.sh

Now we’re asking for 12 CPUs with -c and 10 gigabytes of memory with --mem=10G. That’s far more than is needed for this tiny job. What happens if you request more memory than is available on the general partition?

sbatch -p general --qos=general -c 12 --mem=1000G commonWords.sh

An error!

sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

This is relatively straightforward. But where does stderr go? Can we direct that to a file? Can we name the jobs so we can see them in the queue? Or rename these slurm-<JOBID> files so that they are a little easier to sort through when we are running lots of jobs?

The answer is yes, but you can imagine our command-line is going to start getting very long and cumbersome. To solve this, we typically specify SLURM options in a header at the top of our script.

Let’s put one in our script:

#!/bin/bash
#SBATCH --job-name=commonWords
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 10
#SBATCH --mem=20G
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-user=MY_EMAIL@uconn.edu
#SBATCH --mail-type=ALL
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

hostname
date

# this script will print the 10 most common words found in a text file and their frequencies

# this line specifies the text file
ORIGIN=darwin1859.txt

# this line extracts and prints the word list
grep -o -P "\b[A-Za-z]{4,}\b" $ORIGIN | sort | uniq -c | sort -g | tail -n 10

Any command-line option we can pass to sbatch can also be placed in a header that immediately follows the shebang. Each header line begins with #SBATCH and a command-line flag follows. If we need to, we can override any of these header lines with new values on the command-line, without altering the script.
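
For example, to give a script more memory for a single run without editing its header (the value here is purely illustrative):

# the command-line value overrides the #SBATCH --mem line for this submission only
sbatch --mem=40G commonWords.sh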

In the header above, we name the job commonWords and request 10 CPUs and 20G of memory. The mail options are optional, but if you provide an e-mail address, SLURM will let you know when your job starts and ends. The last two options specify the file name format for any output written to stdout or stderr. The format is jobname_jobid{.out,.err}.

We also specify a few other options that aren’t necessary and that we rarely change: -n 1, saying that we want SLURM to launch 1 task, and -N 1 saying that we want the resources for the task to be on a single node. These will suit all of our work in this course.

You may also notice we added two extra lines to the top of the script: hostname and date. These will write out which compute node the job ran on and the date it began to stdout. This can be helpful information if you need to seek help when troubleshooting. Sometimes individual nodes on the cluster, rather than user errors, can be the source of problems.

Update commonWords.sh with this header and run it again. Now you can simply enter:

sbatch commonWords.sh

Now, instead of seeing a single file slurm-<JOBID>.out, you should see a pair of files: commonWords_<JOBID>.out and commonWords_<JOBID>.err. The .out file should contain the output from our script, along with the results of hostname and date, and .err should be empty (unless there were any errors).
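
A quick way to check is simply to print both files (substitute the actual job ID):

cat commonWords_<JOBID>.out
cat commonWords_<JOBID>.err    # should be empty if nothing went wrong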

So, to sum up, we request resources and submit work to be run on the cluster by SLURM. Putting SLURM options in the script header is extremely useful, as it keeps a record of how we ran our code, not just what code we ran.

5.2.4.2 Starting interactive sessions with srun

It is sometimes the case that we want to do analysis on the cluster that is more intensive than what is permissible on the login nodes, but we are unable to write all the steps into a batch script. This might be an exploratory analysis, where we don’t know what the steps are yet, or we might be putting together a complicated set of piped commands and we want to test that the pipe is doing what we expect before we run the entire job.

For this we use an interactive session. In an interactive session, we request resources on a compute node, and SLURM drops us into a bash session on that node, rather than a login node. On the compute node we no longer have to worry about gumming things up for everyone else. SLURM will not let us exceed our CPU request, and if we run a program that attempts to exceed the memory request, the session will simply be canceled.

We use the srun command for this. The syntax for requesting an interactive session is:

srun -p general --qos=general -c 2 --mem=10G --pty bash

Try it. To ensure that you have successfully started an interactive session, type hostname at the prompt. You should see the name of a compute node that can be found in sinfo, something like xanadu-01, rather than a login host name like hpc-ext-1.
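
Beyond hostname, SLURM also sets some environment variables inside the session that you can use as a sanity check (a minimal sketch; SLURM_CPUS_PER_TASK should be set because we passed -c):

echo $SLURM_JOB_ID           # the job ID SLURM assigned to this interactive session
echo $SLURM_CPUS_PER_TASK    # the number of CPUs allocated to the session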

You can exit back to the login node by typing exit.

5.2.5 Monitor job progress

There are several strategies for monitoring job progress. We will talk about them here, and practice them later when we start doing longer-running jobs.

  1. Check squeue to see if your job is pending, currently running, or has completed (or failed) and no longer in the queue.
  2. Monitor output files. Check the stdout and stderr log files captured by SLURM. Many programs write progress messages or errors to these files. Check the output files themselves as well.
  3. Log in to the compute node and run top to see how/if your process is running.

5.2.5.1 squeue again

When the cluster is busy and jobs are large enough that they wait in the queue for a while before resources are available, you can use squeue as we did above to see what their status is. If they are running, you’ll also be able to see how long they’ve been running. Jobs that complete too quickly or too slowly are sometimes a sign that something isn’t right (or that your expectations aren’t right).

5.2.5.2 Monitor output files

You can monitor the .out and .err files, and output or log files produced by your script or the program you’re running. In all of these cases, the first thing you can do is simply use ls -l and look at the time stamps. Are any of these files being updated?

Depending on the program or the nature of your script, the .err and .out files are going to be key files to check. These are updated in real time. If there are errors, warnings, or progress messages, they are likely to be in one of these two files. Some programs will write helpful, succinct messages, and some will write an endless alphabet soup that makes it hard to distinguish normal progress from problems. It all depends on the program.

When it comes to program- or script-specific files, you’ll have to understand a little about the output you’re expecting. Some programs write no output until they’re complete, others write out results nearly as fast as they read data in. In the latter case, be sure to check on these files at least once and ensure that they’re growing as the program is running.
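
Two quick ways to do this from the login node, assuming a job submitted with the header above (substitute the actual job ID):

# check the time stamps: are the log files still being updated?
ls -l commonWords_<JOBID>.out commonWords_<JOBID>.err

# follow the stderr log in real time (Ctrl-C stops watching; the job keeps running)
tail -f commonWords_<JOBID>.err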

5.2.5.3 Log in to your compute node

There are lots of cases where you might want to get a more direct look at how your job is going. SLURM doesn’t have good tools for this. The solution here is to actually log directly in to the compute node and have a look. First, check which node your job is running on using squeue. Then you have two options to get into the node.

First, you can use srun and request the node with, e.g. -w xanadu-01:

srun -w xanadu-01 -p general --qos=general -c 1 --mem=500M --pty bash

Here we request minimal resources so that we are more likely to quickly get the session started.

Second, if the node is fully subscribed and you can’t get an interactive session, you can still get in from the login node with:

ssh xanadu-01

If you go this route, you should not do anything else other than check on your job. If you couldn’t get an interactive session on your node, then all CPU or all memory has been requested, and you are essentially oversubscribing the node by logging in this way. You should check on your job quickly and type exit to log out.

With either method, once you’re on the compute node, you can use the program top to see running processes. You will see ALL running processes on the node in a constantly updating list, including user and system processes. To see only your own processes, type u and then your username, followed by enter. To sort processes by memory usage, type shift-m; to sort by processor usage, type shift-p. %CPU refers to the percentage of a single CPU, so if you requested 10 CPUs, you may see up to 1000% CPU usage. %MEM refers to the percentage of the total memory on the node. The S column indicates whether your process is running (R) or sleeping (S).

In an ideal world, whatever you’re running would be using all the CPU and all the memory you requested. In reality this is rarely the case. Resource usage often fluctuates. If you see that your processes are often sleeping, that can be a sign they aren’t running efficiently and that something may be wrong. If you see that processes you are running never seem to be using as much CPU or memory as you requested, it may be that you requested too much, or that you forgot to tell the actual program you’re running how many CPUs to use, for example.

Type q to quit top.
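
If you prefer to jump straight to your own processes, top can also be started with a user filter (substitute your username):

# show only processes owned by the given user, refreshing live; q quits
top -u <user>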

5.2.6 Evaluate the work

When the job is no longer in the queue, it has completed, failed, or been canceled. You first need to figure out which it is! The two main approaches are to use the SLURM commands seff and/or sacct, and to look at the same output files you looked at when monitoring the job’s progress.

5.2.6.1 seff and sacct

seff is the simplest approach. Simply type seff <JOBID> at the command line. You will get a report like this from our commonWords.sh job:

Job ID: 7953646
Cluster: xanadu
User/Group: nreid/cbc
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:00:01
CPU Efficiency: 10.00% of 00:00:10 core-walltime
Job Wall-clock time: 00:00:01
Memory Utilized: 1.56 MB
Memory Efficiency: 0.01% of 20.00 GB

Some of this is self-explanatory at this point. CPU Utilized is the total CPU time used by the job. Wall-clock time is the actual time span over which the job ran. CPU Efficiency is CPU Utilized / (cores per node * wall-clock time), i.e. the average rate of CPU usage; here, 1 second of CPU time over 10 core-seconds (10 cores * 1 second of wall-clock time) gives 10%. Memory Efficiency is the peak memory usage divided by the memory requested. This job was very simple and used virtually none of the resources we requested. This is not what you generally want to see, but because our job only took 1 second, it’s not that big of a deal. If every job on the cluster had efficiencies in the 1-10% range, it would be a pretty big waste.

sacct is a fairly large and complicated SLURM command that can extract all kinds of job information. We won’t get into it too much here, except to say that this command can provide some detail on a job:

sacct -o jobid%-11,jobname%30,nodelist%15,user%12,group%15,partition,state,ReqMem,MaxRSS,ReqCPUS,elapsed,Timelimit,submit -j <JOBID>

For this job we get this output:

      JobID                        JobName        NodeList         User           Group  Partition      State     ReqMem     MaxRSS  ReqCPUS    Elapsed  Timelimit              Submit 
----------- ------------------------------ --------------- ------------ --------------- ---------- ---------- ---------- ---------- -------- ---------- ---------- ------------------- 
7953646                        commonWords       xanadu-25        nreid             cbc    general  COMPLETED       20Gn                  10   00:00:01 21-00:00:+ 2024-04-28T15:57:09 
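
If you only want a quick glance, running sacct with just the job ID prints a default set of columns (typically JobID, JobName, Partition, Account, AllocCPUS, State, and ExitCode, though the defaults can vary with site configuration):

sacct -j <JOBID>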

5.2.6.2 Checking output files

The approach is basically the same as above. Check your log files for errors, warnings and progress messages. Check the program outputs to see that they are what you expect. That last bit requires some understanding of what you’re trying to do.

5.2.7 What resources should I request?

This is a perennially challenging problem. In much of bioinformatics, software developers do not, or perhaps cannot, give general advice about what resources their program will need. There are so many axes of variation for different datasets: experimental design, species, tissue, sample size and more. For a given algorithm, these may drastically impact resource needs, or not impact them at all.

When analyzing a new dataset, doing a new type of analysis, or using a new piece of software, we generally advise people to expect to have to experiment a bit. We will talk about resource requests for different pieces of software as we move through the course.

That said, we have some general guidelines:

  1. Check the software documentation. It’s quite possible there is a critical variable that defines how much memory or CPU the program requires, or can utilize, and that the developer has actually explained it for you quite clearly.
  2. When you run the job for the first time, check the CPU and memory efficiency. See whether the job failed because it ran out of memory (usually oom-kill appears somewhere in the .err file); request more memory if so. Request fewer resources if efficiency was low.
  3. Remember that when requesting CPUs, you almost always have to tell the actual program you are running how many CPUs are available to it. If you tell SLURM you want 10 CPUs, you usually have to provide an option (sometimes -p or -t or -c) telling the program how many it can use (see the sketch just after this list).
  4. Try to tune your resource requests to your actual analyses. If you just copy-paste the same SLURM header requesting 24 CPUs and 100G of memory for every job, sometimes it won’t be enough, and most of the time it will be way too much. Either way wastes resources for everyone, and your time. Bigger jobs often sit longer in the queue. Rerunning failed jobs is a huge pain.
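
To illustrate point (3), here is a sketch of a header plus program call. The program name, its --threads option, and the file names are hypothetical; check the documentation for whatever program you are actually running. SLURM sets SLURM_CPUS_PER_TASK from the -c request, so passing it along keeps the program’s thread count in sync with the resource request:

#!/bin/bash
#SBATCH --job-name=threadsExample
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 8
#SBATCH --mem=16G
#SBATCH --partition=general
#SBATCH --qos=general

# pass SLURM's CPU allocation straight to the (hypothetical) program's thread option
some_aligner --threads ${SLURM_CPUS_PER_TASK} input.fastq > output.sam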

The UConn Computational Biology Core has a reference document about resource requests (much of which is covered in this chapter) here.

5.3 Exercises

See Blackboard Ultra for this section’s exercises.