January 7, 2026 by Timothy Poon · 5 minutes
The --overlap flag is a Slurm feature that
allows multiple job steps – i.e. commands launched with srun – to share
resources from a single allocation. In this post we present two scenarios in
which --overlap can be useful.
It is often helpful to monitor resource usage in real time when running a job
on an HPC system – just as you might locally. While a profiler provides richer
diagnostic information, often you just want a quick glance at usage. One
example is checking CPU or memory usage using tools like ps, top, or
htop. In Slurm, you can use the --overlap flag to enable this – especially
on clusters where you can’t SSH directly into active compute nodes.
Let’s say you have the following batch job submission script my_cpu_job:
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
srun sleep 60
After submitting the job using sbatch my_cpu_job, say the job number is
1234567. When it has started, you can use the following command to attach a
shell to the compute node of the submitted job:
srun --jobid 1234567 --overlap --pty bash
After running this, you will see that the prompt's hostname has changed.
The --overlap flag ensures you are able to use the resources already allocated
to the job specified by --jobid. You should now be on the compute node, free to
look around – for example, using top to monitor the memory usage of the
different processes.
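For example, once the prompt shows the compute node's hostname, a quick check might look like this (a minimal sketch – the exact output will depend on what your job is running):
# confirm you are on the compute node
hostname
# CPU and memory usage of your own processes
ps -u "$USER" -o pid,pcpu,pmem,etime,comm
# or a live view (press q to quit)
top -u "$USER"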
This works with multi-node jobs as well:
srun --jobid 1234567 --nodelist computenode1 --overlap --pty bash
Together with --nodelist computenode2 in another shell, you can collect live
diagnostics on both nodes, which can be valuable when debugging a multi-node
job. Note that all the interactive sessions will be killed once the resource
allocation of the batch job is terminated (e.g. when all the job steps in the
batch job have finished), as the allocated resources have been reclaimed and
there is nothing left to 'overlap'.
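As a side note, if you do not remember which nodes a job landed on, you can look them up before attaching – a small sketch using the example job ID from above:
# expand the job's compact nodelist into one hostname per line
scontrol show hostnames "$(squeue -j 1234567 -h -o %N)"
# then open one overlapping shell per node of interest
srun --jobid 1234567 --nodelist computenode1 --overlap --pty bash
srun --jobid 1234567 --nodelist computenode2 --overlap --pty bash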
Another useful scenario is monitoring the memory usage of a GPU with
nvidia-smi alongside your GPU job. At the time of writing this post, the
--overlap flag does not allow sharing of Generic Resources (GRES) such as
GPUs, because of the Slurm version ARC is using (20.11.7). Later versions of
Slurm permit the sharing of GRES with --overlap, so the above approach could
be used to check GPU usage in real time. See Resources.
On ARC, you can instead SSH into the node running the GPU job and use
nvidia-smi to monitor GPU usage. The --overlap approach can still be used
to attach an interactive session to a GPU job without sharing the GPU
resource, by explicitly stating that you do not want any GPUs:
# for Slurm 20.11.7
srun --jobid 1234567 --gres gpu:0 --overlap --pty bash
The above command allows you to monitor other aspects of the job; if you run
nvidia-smi it will report 'No devices were found', because the interactive
session has no permission to use any GPU. If you SSH into the node instead,
nvidia-smi will show information for all the GPUs.
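On a cluster that allows SSH into nodes where you have a running job (as ARC does), a quick GPU check might look like the sketch below, again assuming the example job ID:
# find the node running the GPU job, then watch GPU utilisation on it
node=$(squeue -j 1234567 -h -o %N)
ssh -t "$node" watch -n 2 nvidia-smi   # refresh every 2 seconds; Ctrl-C to stop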
You may wonder: since you are inside an interactive session and can, in principle, do anything with the allocated resources, is it possible to run an actual workload instead of monitoring tools? Yes, you can!
To illustrate the idea, the simple batch submission script below allows a single CPU on one node to be shared by four different job steps:
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
# ...
srun --exact -N1 -n1 -c1 --mem 2G --overlap sleep 10 &
srun --exact -N1 -n1 -c1 --mem 2G --overlap echo 'Hello!' &
srun --exact -N1 -n1 -c1 --mem 2G --overlap ls &
srun --exact -N1 -n1 -c1 --mem 2G --overlap hostname &
wait
Both the & background operator at the end and --overlap are essential:
without putting the job steps in the background, they would be executed
sequentially; without --overlap, the allocated resources would not be shared
between the job steps, so each step would pause and wait for available resources. The wait
at the end is a synchronisation point to ensure all background jobs (in our
case the four job steps) finish before returning to the main process. You do not
need to provide --jobid, as it is inherited by default from SLURM_JOB_ID inside
the batch job. The --exact flag ensures each job step uses only the exact
resources it requested, i.e. one node and one CPU. It is important to
ensure the total allocated memory is sufficient for all the job steps.
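Once the job has finished, you can verify that the four commands really ran as separate, overlapping job steps using Slurm's accounting tool (assuming sacct is enabled on your cluster, with 1234567 as the example job ID):
# each srun line appears as its own step: 1234567.0, 1234567.1, ...
sacct -j 1234567 --format=JobID,JobName,Start,End,Elapsed,MaxRSS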
The above approach opens up many opportunities for better resource utilisation. For instance, you can run a computationally intensive job step and an I/O-bound job step concurrently while sharing the same group of CPUs and memory:
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=200G
# ...
srun --exact -N1 -n1 -c20 --mem 180G --overlap ./job_io_bound &
srun --exact -N1 -n1 -c20 --mem 20G --overlap ./job_cpu_intensive &
wait
Another use case is a lightweight parameter sweep. Instead of using an array job, you could do something like this:
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --mem=200G
# ...
# an array of all configuration files
config_all=('/path/to/config1.yaml' '/path/to/config2.yaml' ...)
for i in $(seq 0 47)
do
    config="${config_all[$i]}"
    srun --exact -N1 -n1 -c1 --mem 4G --overlap ./run_simulation --config "$config" &
done
wait
The biggest advantage of this approach over array jobs is that it avoids scheduling overhead, which matters most when each run is relatively quick: the scheduler only needs to perform one allocation instead of many separate ones. Another advantage is that if your environment takes time to set up, this approach avoids repeating the same time-consuming setup in every array task.
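For comparison, a roughly equivalent array-job version is sketched below – each task gets its own allocation, which is where the extra scheduling overhead comes from (the config paths are the same hypothetical ones as above):
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --array=0-47
# ...
# an array of all configuration files
config_all=('/path/to/config1.yaml' '/path/to/config2.yaml' ...)
config="${config_all[$SLURM_ARRAY_TASK_ID]}"
srun ./run_simulation --config "$config"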
The --overlap flag is useful for interactive job monitoring and for running
concurrent job steps to improve resource utilisation. It enables more diverse
workflows when running jobs on an HPC system. While it is very powerful and
flexible, it is best used on jobs whose resource behaviour is well understood;
otherwise you may see unexpected performance degradation or strange errors.