What if my job doesn't need all the resources on a node?

How to use SeaWulf's shared queues

 

This KB Article References: High Performance Computing
This Information is Intended for: Instructors, Researchers, Staff, Students
Created: 10/26/2023 Last Updated: 06/19/2024
 

The shared queues

Historically, all jobs run on SeaWulf have had exclusive access to the node(s) they were allocated. However, some jobs do not require all of the CPU and memory resources on a compute node. For example, a job using only 1 CPU core and 10 GB of memory will leave 95 of the 96 cores on an AMD Milan node idle, along with most of its memory, essentially wasting those resources for the duration of the job. In such cases, we recommend submitting those jobs to one of our shared queues:
 
AMD Milan 

short-96core-shared
long-96core-shared
extended-96core-shared


Intel Skylake

short-40core-shared
long-40core-shared
extended-40core-shared


These queues allow multiple jobs from multiple users to run at the same time on the same node, as long as the combined CPU and memory requests of all the jobs on that node do not exceed what the node has available.
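The sharing rule above amounts to a simple capacity check on both CPUs and memory. The Bash function below is a hypothetical illustration of that check only, not Slurm's actual scheduling logic. The 96-core figure matches the Milan queue names, but the 256 GB of node memory is an assumed example value, not a SeaWulf specification.

```shell
#!/usr/bin/env bash
# Hypothetical model of the sharing rule: a new job fits on a shared node only
# if BOTH its CPUs and its memory fit in what the jobs already running there
# have left over. This is NOT Slurm's scheduler; it only illustrates the rule.
# Usage: fits_on_node NODE_CPUS NODE_MEM_GB USED_CPUS USED_MEM_GB REQ_CPUS REQ_MEM_GB
fits_on_node() {
    local node_cpus=$1 node_mem=$2 used_cpus=$3 used_mem=$4 req_cpus=$5 req_mem=$6
    if (( used_cpus + req_cpus <= node_cpus && used_mem + req_mem <= node_mem )); then
        echo yes
    else
        echo no
    fi
}

# Example: a 96-core node with an ASSUMED 256 GB of memory, already running
# jobs that together use 90 cores and 200 GB.
fits_on_node 96 256 90 200 1 10   # yes: 91 cores and 210 GB both fit
fits_on_node 96 256 90 200 8 10   # no: 98 cores would exceed 96
```

Note that a job can be blocked by either resource: a request for just 1 core still waits if the node's remaining memory is too small.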

The benefits

There are two main benefits to running less computationally demanding jobs in these shared queues:

  • Computational resources will be used more efficiently.
  • Because of this, users may spend less time waiting in the queues for their jobs to run.

Adoption of the shared queues for less resource-intensive jobs is expected to increase the total throughput of jobs on the cluster, benefiting the entire SeaWulf community.

Requesting resources

When submitting to one of the shared queues, it is critical that users specify the CPU and memory resources required for their jobs using SBATCH flags. For example, the following Slurm script will request 1 CPU and 10 GB of memory:

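#!/usr/bin/env bash

#SBATCH --job-name=test_job
#SBATCH --output=job.txt
#SBATCH --time=00:05:00
#SBATCH -p short-96core-shared
# 1 task x 1 CPU per task = 1 CPU core in total
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# 10 GB of memory (per node) for the job
#SBATCH --mem=10g

# Report how many CPUs Slurm actually allocated on this node
echo "Available CPUs:  $SLURM_JOB_CPUS_PER_NODE"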


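The combination of "--ntasks" and "--cpus-per-task" controls the number of allocated CPU cores: a job receives "--ntasks" × "--cpus-per-task" cores in total, so the script above requests 1 × 1 = 1 core. Likewise, the "--mem" flag allocates the specified amount of memory per node (10 GB in the script above).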
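As a further illustration, the script below is a sketch of a shared-queue job for a multithreaded program that needs 4 cores: "--ntasks=1" with "--cpus-per-task=4" gives 1 × 4 = 4 cores. The job name, output file, 16 GB memory request, and the commented-out program name are hypothetical placeholders, and exporting OMP_NUM_THREADS assumes the program is parallelized with OpenMP.

```shell
#!/usr/bin/env bash

#SBATCH --job-name=threaded_job
#SBATCH --output=threaded_job.txt
#SBATCH --time=00:05:00
#SBATCH -p short-96core-shared
# 1 task x 4 CPUs per task = 4 CPU cores in total
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Hypothetical memory request; size this to your program's real needs
#SBATCH --mem=16g

# Slurm sets SLURM_NTASKS and SLURM_CPUS_PER_TASK from the flags above.
# The ":-1" defaults only apply if the script is run outside of Slurm.
ntasks=${SLURM_NTASKS:-1}
cpus_per_task=${SLURM_CPUS_PER_TASK:-1}
echo "Cores allocated: $(( ntasks * cpus_per_task ))"

# Start exactly as many OpenMP threads as cores were requested, no more.
export OMP_NUM_THREADS=$cpus_per_task

# ./my_threaded_program   # hypothetical executable; replace with your own
```

Inside Slurm this job should report 4 allocated cores. Tying OMP_NUM_THREADS to SLURM_CPUS_PER_TASK keeps the program from starting more threads than it was given, which matters on a node shared with other users' jobs.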

If no CPU and memory flags are included in the job script, jobs submitted to one of the shared queues will default to 1 CPU and ~2 GB of memory. Since this may be insufficient for many jobs, users are strongly encouraged to explicitly request the resources they expect their shared jobs to need using the SBATCH flags above.
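To decide what to request, it can help to look at a queue's configured defaults and at what a previous job actually used. The two helper functions below are a hypothetical sketch built on the standard Slurm commands scontrol and sacct; the function names are illustrative, and whether sacct returns usage data depends on how accounting is configured on the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing resource requests on the shared queues.

# Show a queue's settings, including fields such as MaxTime and
# DefMemPerCPU or DefMemPerNode (the default memory per job).
# Usage: queue_limits short-96core-shared
queue_limits() {
    if [ -z "$1" ]; then
        echo "usage: queue_limits PARTITION" >&2
        return 1
    fi
    scontrol show partition "$1"
}

# Show what a finished job was allocated (AllocCPUS, ReqMem) and how much
# memory it actually used at peak (MaxRSS, usually on the .batch step line).
# Usage: job_usage 123456
job_usage() {
    if [ -z "$1" ]; then
        echo "usage: job_usage JOBID" >&2
        return 1
    fi
    sacct -j "$1" --format=JobID,JobName,AllocCPUS,ReqMem,MaxRSS,Elapsed,State
}
```

One approach is to run a short test job with a generous "--mem" request, check its MaxRSS with job_usage, and then request that amount plus some headroom for future runs.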
 
