What are the SeaWulf queues? | Research Computing

SeaWulf's queues, also referred to as partitions, are designed to optimize both runtime and resource allocation across its computing nodes, accommodating a wide range of computational needs and hardware configurations. It's essential for users to align their job submissions with the capabilities of each partition, considering factors such as core counts, GPU specifications, and memory capacities, to ensure maximum cluster efficiency and effective utilization of allocated resources.

This KB Article References: High Performance Computing

This Information is Intended for: Guests, Instructors, Researchers, Staff, Students

Created: 10/21/2016 Last Updated: 09/09/2024

Expect a Trade-off Between Resource Usage and Wait Times

Requesting more nodes or a longer runtime than a job needs tends to lengthen its queue wait and leaves resources idle. Users should carefully assess their computational requirements to avoid both underutilizing and overloading the system.

Optimize Resource Usage with Test Jobs

Not all applications benefit from using multiple nodes or all cores on a single node. Starting with smaller test jobs allows users to gauge the necessary computational resources accurately before scaling up to larger jobs.
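One way to read the results of such test jobs is to compute parallel efficiency, that is, the measured speedup divided by the ideal speedup. The sketch below assumes the same workload has been timed at several core counts; the function name and the timings are hypothetical and are not SeaWulf measurements.

```python
def parallel_efficiency(timings):
    """Return {cores: efficiency} from {cores: wall-clock seconds}.

    Efficiency is measured speedup divided by ideal speedup, both taken
    relative to the smallest core count that was tested.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    results = {}
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        ideal = cores / base_cores
        results[cores] = speedup / ideal
    return results


# Hypothetical timings from small test jobs (illustrative values only).
timings = {1: 1000.0, 4: 270.0, 16: 100.0, 40: 80.0}
eff = parallel_efficiency(timings)
for cores, value in eff.items():
    print(f"{cores:>3} cores: efficiency {value:.2f}")
```

When efficiency drops sharply at higher core counts, as in these made-up numbers, requesting more cores mostly adds idle resources rather than speed.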

Understand Hardware Specifics for Optimal Performance

SeaWulf offers queues with diverse CPU architectures (e.g., Haswell, Skylake, AMD EPYC Milan) and varying core counts. Understanding these specifics is crucial for matching the computational requirements of applications effectively. Users should select a queue that aligns with their application's CPU architecture requirements to achieve optimal performance.
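As a rough way to confirm which vector extensions a node actually exposes, the sketch below parses the `flags` line that Linux publishes in `/proc/cpuinfo`. It assumes a Linux node and uses the kernel's flag names (`avx2`, `avx512f`, `amx_tile`); the function name and the sample text are hypothetical.

```python
def vector_extensions(cpuinfo_text):
    """Report which vector/matrix extensions appear in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu sse2 avx avx2 ..."
            flags.update(line.split(":", 1)[1].split())
            break
    # Linux flag names; avx512f is the AVX-512 foundation subset.
    wanted = {"AVX2": "avx2", "AVX512": "avx512f", "AMX": "amx_tile"}
    return {name: flag in flags for name, flag in wanted.items()}


# Hypothetical excerpt resembling a CPU with AVX2 but no AVX-512.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 fma\n"
print(vector_extensions(sample))
```

On a compute node, the same function could be called with the contents of `/proc/cpuinfo`, for example `vector_extensions(open("/proc/cpuinfo").read())`.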

GPU Usage: Maximize Efficiency

Queues equipped with GPUs should be used exclusively for applications that require GPU acceleration. Before submitting jobs, users must verify compatibility with the available GPU types and ensure that their software is configured to utilize GPUs effectively.

Consider Shared Queues for Smaller Jobs

SeaWulf offers "shared" queues, such as short-40core-shared and long-96core-shared, where multiple jobs can run concurrently on the same node. These queues are ideal for jobs that need only modest computational resources, and they improve overall efficiency by keeping node capacity fully utilized.
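To illustrate why small jobs suit shared queues, the sketch below computes how much of a node would sit idle if a small job held the whole node exclusively. The function name and the 8-core job are hypothetical; the per-node core counts (40 and 96) come from the queue tables in this article.

```python
def idle_fraction(cores_needed, cores_per_node):
    """Fraction of a node's cores left idle if one job holds the whole node."""
    if not 0 < cores_needed <= cores_per_node:
        raise ValueError("cores_needed must be between 1 and cores_per_node")
    return (cores_per_node - cores_needed) / cores_per_node


# Hypothetical 8-core job placed alone on a 40-core or a 96-core node.
for cores_per_node in (40, 96):
    frac = idle_fraction(8, cores_per_node)
    print(f"8-core job on a {cores_per_node}-core node: {frac:.0%} of cores idle")
```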

Use Specialized Queues for Testing and Debugging

For tasks like interactive sessions, brief testing, or code debugging, SeaWulf provides specialized queues such as debug-28core and debug-40core, as well as short queues (e.g., short-40core) with rapid turnaround times. These queues are designed to prioritize quick job execution, making them suitable for initial code development and testing phases where rapid feedback is essential.
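A short test job for one of these queues might be described with a small batch script. The sketch below generates one, assuming the scheduler is Slurm (this article calls queues "partitions", which is Slurm's term, but it does not name the scheduler). The helper name, the `./my_app --small-input` command, and the 15-minute limit are hypothetical.

```python
def make_batch_script(queue, nodes, tasks_per_node, minutes, command):
    """Build a minimal batch script as a string (assumes Slurm directives)."""
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={queue}",                  # queue (partition) name
        f"#SBATCH --nodes={nodes}",                      # number of nodes
        f"#SBATCH --ntasks-per-node={tasks_per_node}",   # tasks on each node
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",     # wall-clock limit
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical 15-minute test on one debug-28core node (28 cores per node).
script = make_batch_script("debug-28core", 1, 28, 15, "./my_app --small-input")
print(script)
```

Keeping the requested time well under the queue's 1-hour maximum leaves room for the test to finish without being cut off.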

Use Long Queues When Uncertain of Job Duration

When the runtime required for a job is uncertain, starting with one of the long queues allows flexibility. Users can then assess actual job durations from test runs or previous executions and move to a more suitable queue.
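Once a runtime has been measured, choosing a tighter queue can be expressed as a small lookup. The sketch below uses the maximum runtimes and node limits of the non-shared 40-core queues from the tables in this article; the function name, the 1.5x safety margin, and the treatment of "n/a" minimums as 1 node are assumptions made for illustration.

```python
# (queue, max runtime in hours, min nodes, max nodes), copied from the
# non-shared 40-core rows of the table. "n/a" minimums are treated as 1.
QUEUES_40CORE = [
    ("short-40core", 4, 1, 8),
    ("large-40core", 8, 16, 50),
    ("medium-40core", 12, 6, 16),
    ("long-40core", 48, 1, 6),
    ("extended-40core", 168, 1, 2),
]


def pick_queue(nodes, measured_hours, margin=1.5):
    """Return the fitting queue with the shortest max runtime, or None."""
    needed_hours = measured_hours * margin  # hypothetical safety margin
    fits = [
        (max_hours, name)
        for name, max_hours, min_nodes, max_nodes in QUEUES_40CORE
        if needed_hours <= max_hours and min_nodes <= nodes <= max_nodes
    ]
    return min(fits)[1] if fits else None


print(pick_queue(nodes=1, measured_hours=2.0))
print(pick_queue(nodes=1, measured_hours=10.0))
print(pick_queue(nodes=8, measured_hours=6.0))
```

Preferring the shortest limit that still fits is only one possible policy; it reflects the guidance above that smaller requests tend to wait less.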

Handle Memory-Intensive Jobs Appropriately

For applications demanding significant memory resources, SeaWulf provides specialized queues like hbm-1tb-long-96core, whose nodes have 1 TB of DDR5 memory and are tailored for memory-intensive tasks.
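To judge whether a job needs a large-memory queue, it can help to compare its per-node memory requirement with the node memory listed in the tables in this article. The sketch below does that; the family labels and the function name are hypothetical, and the figures are nominal node memory, a small part of which is reserved for the OS and file system.

```python
# family label: (cores per node, nominal node memory in GB), from the tables.
NODE_SPECS = {
    "28core": (28, 128),
    "40core": (40, 192),
    "96core": (96, 256),
    "hbm-96core": (96, 384),
    "hbm-1tb-long-96core": (96, 1000),
}


def families_with_memory(required_gb):
    """List the node families whose nominal memory meets the requirement."""
    return [name for name, (_, mem_gb) in NODE_SPECS.items() if mem_gb >= required_gb]


# Nominal memory per core for each family.
for name, (cores, mem_gb) in NODE_SPECS.items():
    print(f"{name}: {mem_gb / cores:.1f} GB per core")

# Hypothetical job that needs 300 GB on a single node.
print(families_with_memory(300))
```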

Ensure Software/Hardware Compatibility

Ensuring software compatibility with the hardware configurations available in each queue is essential for maximizing job performance. Users should verify that their applications are configured to leverage the specific CPU architectures, core counts, and GPU types available in their chosen queue.

By following these guidelines, users can manage job submissions on SeaWulf effectively, optimizing resource usage and minimizing queue wait times.

Available Queues

The full list of available queues will depend upon the type of login node you are submitting from. Specifically, there are two sets of login nodes: login1/login2 which provide access to one set of queues, and milan1/milan2 which provide access to another set of queues.

Queues accessed from login1 and login2:

Queue	CPU Architecture	Vector/Matrix Extension	CPU Cores per Node	GPUs per Node	Node Memory¹	Default Runtime	Max Runtime	Max Nodes	Min Nodes	Max Simultaneous Jobs per User
debug-28core	Intel Haswell	AVX2	28	0	128 GB	1 hour	1 hour	8	n/a	n/a
short-28core	Intel Haswell	AVX2	28	0	128 GB	1 hour	4 hours	12	n/a	8
medium-28core	Intel Haswell	AVX2	28	0	128 GB	4 hours	12 hours	24	8	2
long-28core	Intel Haswell	AVX2	28	0	128 GB	8 hours	48 hours	8	n/a	6
extended-28core	Intel Haswell	AVX2	28	0	128 GB	8 hours	7 days	2	n/a	6
large-28core	Intel Haswell	AVX2	28	0	128 GB	4 hours	8 hours	80	24	1
gpu	Intel Haswell	AVX2	28	4	128 GB	1 hour	8 hours	2	n/a	2
gpu-long	Intel Haswell	AVX2	28	4	128 GB	8 hours	48 hours	1	n/a	2
gpu-large	Intel Haswell	AVX2	28	4	128 GB	1 hour	8 hours	4	n/a	1
p100	Intel Haswell	AVX2	12	2	64 GB	1 hour	24 hours	1	n/a	1
v100	Intel Haswell	AVX2	28	2	128 GB	1 hour	24 hours	1	n/a	1

¹A small subset of node memory is reserved for the OS and file system and is not available for user applications.

Queues accessed from milan1 and milan2:

Queue	CPU Architecture	Vector/Matrix Extension	CPU Cores per Node	GPUs per Node	Node Memory¹	Default Runtime	Max Runtime	Max Nodes	Min Nodes	Max Simultaneous Jobs per User	Multiple Users per Node
debug-40core	Intel Skylake	AVX512	40	0	192 GB	1 hour	1 hour	8	n/a	n/a	No
short-40core	Intel Skylake	AVX512	40	0	192 GB	1 hour	4 hours	8	n/a	4	No
short-40core-shared	Intel Skylake	AVX512	40	0	192 GB	1 hour	4 hours	4	n/a	n/a	Yes
medium-40core	Intel Skylake	AVX512	40	0	192 GB	4 hours	12 hours	16	6	1	No
long-40core	Intel Skylake	AVX512	40	0	192 GB	8 hours	48 hours	6	n/a	3	No
long-40core-shared	Intel Skylake	AVX512	40	0	192 GB	8 hours	24 hours	3	n/a	n/a	Yes
extended-40core	Intel Skylake	AVX512	40	0	192 GB	8 hours	7 days	2	n/a	3	No
extended-40core-shared	Intel Skylake	AVX512	40	0	192 GB	8 hours	3.5 days	1	n/a	n/a	Yes
large-40core	Intel Skylake	AVX512	40	0	192 GB	4 hours	8 hours	50	16	1	No
short-96core	AMD EPYC Milan	AVX2	96	0	256 GB	1 hour	4 hours	8	n/a	4	No
short-96core-shared	AMD EPYC Milan	AVX2	96	0	256 GB	1 hour	4 hours	4	n/a	n/a	Yes
medium-96core	AMD EPYC Milan	AVX2	96	0	256 GB	4 hours	12 hours	16	6	1	No
long-96core	AMD EPYC Milan	AVX2	96	0	256 GB	8 hours	48 hours	6	n/a	3	No
long-96core-shared	AMD EPYC Milan	AVX2	96	0	256 GB	8 hours	24 hours	3	n/a	n/a	Yes
extended-96core	AMD EPYC Milan	AVX2	96	0	256 GB	8 hours	7 days	2	n/a	3	No
extended-96core-shared	AMD EPYC Milan	AVX2	96	0	256 GB	8 hours	3.5 days	1	n/a	n/a	Yes
large-96core	AMD EPYC Milan	AVX2	96	0	256 GB	4 hours	8 hours	38	16	1	No
hbm-short-96core	Intel Sapphire Rapids	AMX, AVX512 & Intel DL Boost	96	0	384 GB (256GB DDR5 + 128GB HBM)	1 hour	4 hours	8	n/a	4	No
hbm-medium-96core	Intel Sapphire Rapids	AMX, AVX512 & Intel DL Boost	96	0	384 GB (256GB DDR5 + 128GB HBM)	4 hours	12 hours	16	6	1	No
hbm-long-96core	Intel Sapphire Rapids	AMX, AVX512 & Intel DL Boost	96	0	384 GB (256GB DDR5 + 128GB HBM)	8 hours	48 hours	6	n/a	3	No
hbm-1tb-long-96core	Intel Sapphire Rapids	AMX, AVX512 & Intel DL Boost	96	0	1000 GB (1 TB DDR5 + 128 GB HBM configured as level 4 cache)	8 hours	48 hours	1	n/a	1	No
hbm-extended-96core	Intel Sapphire Rapids	AMX, AVX512 & Intel DL Boost	96	0	384 GB (256GB DDR5 + 128GB HBM)	8 hours	7 days	2	n/a	3	No
hbm-large-96core	Intel Sapphire Rapids	AMX, AVX512 & Intel DL Boost	96	0	384 GB (256GB DDR5 + 128GB HBM)	4 hours	8 hours	38	16	1	No
a100	Intel Ice Lake	AVX512 & Intel DL Boost	64	4	256 GB	1 hour	8 hours	2	n/a	2	Yes
a100-long	Intel Ice Lake	AVX512 & Intel DL Boost	64	4	256 GB	8 hours	48 hours	1	n/a	2	Yes
a100-large	Intel Ice Lake	AVX512 & Intel DL Boost	64	4	256 GB	1 hour	8 hours	4	n/a	1	Yes

¹A small subset of node memory is reserved for the OS and file system and is not available for user applications.

In addition to the limits in the tables above, users cannot use more than 32 nodes at one time unless running jobs in one of the large queues, and the maximum number of jobs that a user can have queued at any given time is 100.
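These two cluster-wide limits can be checked before submitting. The sketch below is one reading of the rule: nodes are summed across a user's jobs outside the large queues, and any queue with "large" in its name is treated as exempt. Both interpretations, along with the function names, are assumptions rather than documented scheduler behavior.

```python
MAX_NODES_OUTSIDE_LARGE = 32   # node cap outside the large queues
MAX_QUEUED_JOBS = 100          # cap on a user's queued jobs


def is_large_queue(queue):
    """Assumption: a queue counts as 'large' if its name has a 'large' part."""
    return "large" in queue.split("-")


def can_submit(active_jobs, new_job, queued_count):
    """Check a new (queue, nodes) job against the cluster-wide limits.

    active_jobs is a list of (queue, nodes) pairs already in use;
    queued_count is the number of jobs the user currently has queued.
    Returns (allowed, reason).
    """
    if queued_count >= MAX_QUEUED_JOBS:
        return False, "queued-job limit reached"
    queue, nodes = new_job
    if is_large_queue(queue):
        return True, "large queues are exempt from the node cap"
    in_use = sum(n for q, n in active_jobs if not is_large_queue(q))
    if in_use + nodes > MAX_NODES_OUTSIDE_LARGE:
        return False, f"{in_use + nodes} nodes would exceed {MAX_NODES_OUTSIDE_LARGE}"
    return True, "ok"


# Hypothetical user with five 6-node jobs who wants 4 more nodes.
print(can_submit([("long-40core", 6)] * 5, ("short-40core", 4), queued_count=10))
```

The per-queue limits in the tables (maximum nodes, maximum runtime, maximum simultaneous jobs) still apply on top of these checks.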

Hardware Configurations Across SeaWulf Queues

SeaWulf's queues offer a variety of hardware configurations tailored to different computational needs. Here’s a detailed breakdown of the hardware specifications across various queues:

The debug-28core, short-28core, long-28core, extended-28core, medium-28core, and large-28core queues share a set of identical nodes that have 28 Haswell cores.
The debug-40core, short-40core, long-40core, extended-40core, medium-40core, and large-40core queues share a set of identical nodes that have 40 Skylake cores.
The short-96core, long-96core, extended-96core, medium-96core, and large-96core queues share a set of identical nodes that have 96 AMD EPYC Milan cores.
The short-96core-shared, long-96core-shared, and extended-96core-shared queues also share the same set of identical nodes that have 96 AMD EPYC Milan cores, but multiple jobs are allowed to run on the same node simultaneously.
The hbm-short-96core, hbm-long-96core, hbm-extended-96core, hbm-medium-96core, and hbm-large-96core queues share a set of identical nodes that have 96 Intel Sapphire Rapids cores.
The hbm-1tb-long-96core queue allocates jobs to 4 identical nodes that have 96 Intel Sapphire Rapids cores. These nodes differ from the other hbm nodes in that they are configured in Cache mode and have 1 TB DDR5 memory.
The gpu and gpu-long queues share a separate set of identical nodes that are similar to those used by the 28-core queues (short-28core, long-28core, etc.) but with 4x K80 24GB GPUs each.
The p100 and v100 queues each allocate jobs to a single node that has 2x Tesla P100 16GB or 2x V100 32GB GPUs, respectively.
The a100, a100-long, and a100-large queues allocate jobs to nodes that each have 4x A100 80GB GPUs and 64 Intel Xeon Ice Lake cores.

Users must ensure that their applications are compatible with the specific hardware configurations available in each queue. This involves optimizing software usage to effectively utilize CPU architectures, GPU capabilities, and memory configurations.
