How to use SeaWulf's shared queues
The shared queues
Historically, all jobs run on SeaWulf have had exclusive access to the node(s) they were allocated. However, some jobs may not require all the CPU and memory resources present on the compute nodes. For example, a job using only 1 CPU core and 10 GB of memory will leave most of the computational resources on an AMD Milan node unused and essentially wasted for the duration of the job. In such cases, we recommend submitting those jobs to one of our shared queues:
AMD Milan
short-96core-shared
long-96core-shared
extended-96core-shared
Intel Skylake
short-40core-shared
long-40core-shared
extended-40core-shared
These queues allow multiple jobs from multiple users to be run at the same time on the same node, as long as the total requested CPUs and memory of all the jobs do not exceed the total available on the node.
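To see how busy a shared partition currently is, you can query Slurm directly. The following is a minimal sketch using standard sinfo output-format options (the partition name is taken from the list above); it reports each node's allocated/idle/other/total CPU counts and its total memory in MB:

# Show per-node CPU usage (Allocated/Idle/Other/Total) and total memory (MB)
# for one of the shared partitions
sinfo -p short-96core-shared -N -o "%N %C %m"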
The benefits
There are two main benefits to running less computationally demanding jobs in these shared queues:
- Computational resources will be used more efficiently.
- Because of this, users may spend less time waiting in the queues for their jobs to run.
Adoption of the shared queues for less resource intensive jobs is expected to increase the total throughput of jobs on the cluster, thus benefiting the entire SeaWulf community.
Requesting resources
When submitting to one of the shared queues, it is critical that users specify the CPU and memory resources required for their jobs using SBATCH flags. For example, the following Slurm script will request 1 CPU and 10 GB of memory:
#!/usr/bin/env bash

#SBATCH --job-name=test_job
#SBATCH --output=job.txt
#SBATCH --time=00:05:00
#SBATCH -p short-96core-shared
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10g

echo "Available CPUs: $SLURM_JOB_CPUS_PER_NODE"
The combination of "--ntasks" and "--cpus-per-task" controls the number of allocated CPU cores: the total is the product of the two values (here, 1 x 1 = 1 core). Likewise, the "--mem" flag sets the amount of memory allocated to the job.
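Assuming the script above is saved as job.slurm (a hypothetical filename), it can be submitted with sbatch, and the resources Slurm actually granted can then be inspected with standard Slurm commands:

# Submit the job script
sbatch job.slurm

# List your queued and running jobs
squeue -u $USER

# Inspect the CPUs and memory allocated to a specific job
# (replace <jobid> with the ID printed by sbatch)
scontrol show job <jobid> | grep -E "NumCPUs|TRES"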
If no CPU and memory flags are included in the job script, jobs submitted to one of the shared queues will default to 1 CPU and ~2 GB of memory. Since this may be insufficient for many jobs, users are strongly encouraged to explicitly request the resources they expect to need for their shared jobs using the above SBATCH flags.
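These resource requests can also be supplied on the command line, where they take precedence over any corresponding #SBATCH lines in the script. For example (the flag values and the job.slurm filename here are illustrative):

# Command-line flags override #SBATCH directives,
# so the same script can be reused with different resource requests
sbatch --ntasks=1 --cpus-per-task=4 --mem=20g job.slurm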