What are the SeaWulf queues?

SeaWulf's queues, also referred to as partitions, group the cluster's compute nodes by hardware type and by limits on runtime and node count, so that a wide range of computational needs can be accommodated. Users should match each job submission to the capabilities of the chosen partition, including its core count, GPU specifications, and memory capacity, so that the cluster runs efficiently and allocated resources are fully used.

 

This KB Article References: High Performance Computing

This Information is Intended for: Guests, Instructors, Researchers, Staff, Students
Created: 10/21/2016 Last Updated: 09/09/2024
 

Expect a Trade-off Between Resource Usage and Wait Times

Requesting more nodes or a longer runtime than a job needs wastes resources and can lengthen the time the job waits in the queue. Users should carefully assess their computational requirements so that jobs neither underutilize nor overload the system.


Optimize Resource Usage with Test Jobs

Not all applications benefit from using multiple nodes or all cores on a single node. Starting with smaller test jobs allows users to gauge the necessary computational resources accurately before scaling up to larger jobs.
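
One way to judge whether an application benefits from more cores is to compare wall-clock times from a few small test jobs. The sketch below computes parallel efficiency relative to the smallest run. The timings are hypothetical placeholders, not SeaWulf measurements, and the function name is chosen only for illustration.

```python
def parallel_efficiency(base_cores, base_seconds, cores, seconds):
    """Speedup divided by the increase in core count (1.0 means perfect scaling)."""
    speedup = base_seconds / seconds
    return speedup / (cores / base_cores)

# Hypothetical test-job results: (cores used, wall-clock seconds).
timings = [(28, 1000.0), (56, 560.0), (112, 400.0)]

base_cores, base_seconds = timings[0]
efficiencies = {
    cores: parallel_efficiency(base_cores, base_seconds, cores, seconds)
    for cores, seconds in timings
}

for cores, eff in efficiencies.items():
    print(f"{cores:4d} cores: efficiency {eff:.2f}")
```

If efficiency drops sharply as cores are added, the extra nodes are mostly idle and a smaller request is likely the better choice.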


Understand Hardware Specifics for Optimal Performance

SeaWulf offers queues with diverse CPU architectures (e.g., Haswell, Skylake, AMD EPYC Milan) and varying core counts. Understanding these specifics is crucial for matching the computational requirements of applications effectively. Users should select a queue that aligns with their application's CPU architecture requirements to achieve optimal performance.
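
As a rough illustration, the vector/matrix extensions listed in the queue tables of this article can be encoded as a lookup to find which CPU architectures can run a binary built for a given extension. This is a sketch: the helper name is hypothetical, and the rule that AVX512-capable CPUs also run AVX2 code is an assumption that the tables do not state.

```python
# Extensions listed for each CPU architecture in the queue tables of this article.
LISTED_EXTENSIONS = {
    "Intel Haswell": {"AVX2"},
    "Intel Skylake": {"AVX512"},
    "AMD EPYC Milan": {"AVX2"},
    "Intel Sapphire Rapids": {"AMX", "AVX512", "Intel DL Boost"},
    "Intel Ice Lake": {"AVX512", "Intel DL Boost"},
}

# Assumption (not stated in the tables): AVX512-capable CPUs also run AVX2 code.
IMPLIED = {"AVX512": {"AVX2"}}

def architectures_supporting(extension):
    """Return the architectures whose listed or implied extensions include `extension`."""
    matches = []
    for arch, listed in LISTED_EXTENSIONS.items():
        supported = set(listed)
        for ext in listed:
            supported |= IMPLIED.get(ext, set())
        if extension in supported:
            matches.append(arch)
    return sorted(matches)

print(architectures_supporting("AVX512"))
```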


GPU Usage: Maximize Efficiency

Queues equipped with GPUs should be used exclusively for applications that require GPU acceleration. Before submitting jobs, users must verify compatibility with the available GPU types and ensure that their software is configured to utilize GPUs effectively. 
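
A compatibility check can start from the GPU models described in the hardware section of this article. The sketch below filters GPU queues by the per-GPU memory size as listed there; the function name is hypothetical, and only queues whose GPU model is named in the article are included.

```python
# GPU model and memory (GB) per queue, as listed in the hardware section of this article.
GPU_QUEUES = {
    "gpu": ("K80", 24),
    "gpu-long": ("K80", 24),
    "p100": ("P100", 16),
    "v100": ("V100", 32),
    "a100": ("A100", 80),
    "a100-long": ("A100", 80),
    "a100-large": ("A100", 80),
}

def queues_with_gpu_memory(min_gb):
    """Return the GPU queues whose listed GPU memory is at least `min_gb`."""
    return sorted(q for q, (_, mem_gb) in GPU_QUEUES.items() if mem_gb >= min_gb)

print(queues_with_gpu_memory(32))
```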


Consider Shared Queues for Smaller Jobs

SeaWulf offers "shared" queues, such as short-40core-shared and short-96core-shared, in which multiple jobs can run concurrently on the same node. These queues are ideal for jobs that need only modest computational resources, and they improve efficiency by filling node capacity that would otherwise sit idle.
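
The choice between a shared and an exclusive queue can be sketched as a simple rule of thumb. The rule below (use a shared queue whenever a job needs less than one full node) is an illustrative assumption rather than SeaWulf policy, and the function name is hypothetical. The queue names follow the short-40core and short-96core entries in the tables of this article.

```python
def suggest_short_queue(cores_needed, cores_per_node):
    """Suggest a short queue name for a job needing `cores_needed` cores.

    Assumption: jobs needing less than a full node go to the shared variant.
    """
    if cores_per_node not in (40, 96):
        raise ValueError("shared queues are listed only for 40-core and 96-core nodes")
    if cores_needed < 1:
        raise ValueError("cores_needed must be at least 1")
    suffix = "-shared" if cores_needed < cores_per_node else ""
    return f"short-{cores_per_node}core{suffix}"

print(suggest_short_queue(8, 40))
print(suggest_short_queue(96, 96))
```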


Use Specialized Queues for Testing and Debugging

For tasks like interactive sessions, brief testing, or code debugging, SeaWulf provides specialized queues such as debug-28core or short queues with rapid turnaround times. These queues are designed to prioritize quick job execution, making them suitable for initial code development and testing phases where rapid feedback is essential.
 

Use Long Queues When Uncertain of Job Duration

Users who are unsure how long a job will run can start with one of the long queues. After measuring actual job durations from test runs or previous executions, they can switch to a queue whose runtime limit fits more closely.


Handle Memory-Intensive Jobs Appropriately

For applications demanding significant memory resources, SeaWulf provides specialized queues like hbm-1tb-long-96core, equipped with nodes featuring large memory capacities tailored for memory-intensive tasks.
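
To illustrate, the node memory sizes listed in the queue tables of this article can be used to find the smallest CPU node type that fits a job. This is a sketch: the article says only that a small subset of memory is reserved for the OS and file system, so the `os_reserve_gb` value is a hypothetical placeholder, and the family labels are shorthand rather than queue names.

```python
# (CPU queue family, node memory in GB) from the queue tables, smallest first.
NODE_MEMORY_GB = [
    ("28core", 128),
    ("40core", 192),
    ("96core", 256),
    ("hbm 96core", 384),
    ("hbm-1tb-long-96core", 1000),
]

def smallest_fitting_family(required_gb, os_reserve_gb=8):
    """Return the first family whose usable memory covers `required_gb`, else None.

    `os_reserve_gb` is a hypothetical allowance for memory reserved by the OS.
    """
    for family, total_gb in NODE_MEMORY_GB:
        if required_gb <= total_gb - os_reserve_gb:
            return family
    return None

print(smallest_fitting_family(500))
```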


Ensure Software/Hardware Compatibility

Ensuring software compatibility with the hardware configurations available in each queue is essential for maximizing job performance. Users should verify that their applications are configured to leverage the specific CPU architectures, core counts, and GPU types available in their chosen queue.

By following these guidelines, users can effectively manage job submissions on SeaWulf, optimizing resource usage and minimizing queue wait times.
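
Once a queue, node count, and runtime have been chosen, they can be written into a batch script. The sketch below assumes the scheduler accepts Slurm-style `#SBATCH` directives, which this article does not state explicitly. The helper names and the `./my_test_program` command are hypothetical.

```python
def format_walltime(hours):
    """Format a runtime in hours as days-hours:minutes:seconds."""
    total_minutes = round(hours * 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hh, mm = divmod(remainder, 60)
    return f"{days}-{hh:02d}:{mm:02d}:00"

def batch_script(queue, nodes, tasks_per_node, hours, command):
    """Build the text of a batch script (assumes Slurm-style directives)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={queue}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={format_walltime(hours)}",
        "",
        command,  # hypothetical program to run
    ]
    return "\n".join(lines) + "\n"

# A 30-minute single-node test job in the debug-28core queue.
script = batch_script("debug-28core", 1, 28, 0.5, "./my_test_program")
print(script)
```

Requesting a runtime close to the measured duration, rather than the queue maximum, keeps the request in line with the trade-off described above.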

 

Available Queues

The full list of available queues depends on the type of login node you submit from. There are two sets of login nodes: login1/login2, which provide access to one set of queues, and milan1/milan2, which provide access to another.

 

Queues accessed from login1 and login2:

| Queue | CPU Architecture | Vector/Matrix Extension | CPU Cores per Node | GPUs per Node | Node Memory¹ | Default Runtime | Max Runtime | Max Nodes | Min Nodes | Max Simultaneous Jobs per User |
|---|---|---|---|---|---|---|---|---|---|---|
| debug-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 1 hour | 1 hour | 8 | n/a | n/a |
| short-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 1 hour | 4 hours | 12 | n/a | 8 |
| medium-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 4 hours | 12 hours | 24 | 8 | 2 |
| long-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 8 hours | 48 hours | 8 | n/a | 6 |
| extended-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 8 hours | 7 days | 2 | n/a | 6 |
| large-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 4 hours | 8 hours | 80 | 24 | 1 |
| gpu | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 1 hour | 8 hours | 2 | n/a | 2 |
| gpu-long | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 8 hours | 48 hours | 1 | n/a | 2 |
| gpu-large | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 1 hour | 8 hours | 4 | n/a | 1 |
| p100 | Intel Haswell | AVX2 | 12 | 2 | 64 GB | 1 hour | 24 hours | 1 | n/a | 1 |
| v100 | Intel Haswell | AVX2 | 28 | 2 | 128 GB | 1 hour | 24 hours | 1 | n/a | 1 |

¹ A small subset of node memory is reserved for the OS and file system and is not available for user applications.
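
The limits in the table above can be applied mechanically to shortlist queues for a job. The sketch below encodes the six 28-core CPU queues and returns those whose node and runtime limits fit a request. Treating a Min Nodes value of "n/a" as 1 is an assumption, and the function name is hypothetical.

```python
# (queue, max runtime in hours, min nodes, max nodes) for the 28-core CPU queues.
# Assumption: a Min Nodes entry of "n/a" is treated as 1.
QUEUES_28CORE = [
    ("debug-28core", 1, 1, 8),
    ("short-28core", 4, 1, 12),
    ("medium-28core", 12, 8, 24),
    ("long-28core", 48, 1, 8),
    ("extended-28core", 7 * 24, 1, 2),
    ("large-28core", 8, 24, 80),
]

def eligible_queues(nodes, hours):
    """Return the queues whose runtime and node limits admit the request."""
    return [
        name
        for name, max_hours, min_nodes, max_nodes in QUEUES_28CORE
        if hours <= max_hours and min_nodes <= nodes <= max_nodes
    ]

print(eligible_queues(2, 3))
print(eligible_queues(10, 6))
```

Among the eligible queues, the one with the shortest maximum runtime is usually the natural first choice, since the guidelines above recommend requesting no more than a job needs.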

 

Queues accessed from milan1 and milan2:

| Queue | CPU Architecture | Vector/Matrix Extension | CPU Cores per Node | GPUs per Node | Node Memory¹ | Default Runtime | Max Runtime | Max Nodes | Min Nodes | Max Simultaneous Jobs per User | Multiple Users per Node |
|---|---|---|---|---|---|---|---|---|---|---|---|
| debug-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 1 hour | 8 | n/a | n/a | No |
| short-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 4 hours | 8 | n/a | 4 | No |
| short-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 4 hours | 4 | n/a | n/a | Yes |
| medium-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 4 hours | 12 hours | 16 | 6 | 1 | No |
| long-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 48 hours | 6 | n/a | 3 | No |
| long-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 24 hours | 3 | n/a | n/a | Yes |
| extended-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 7 days | 2 | n/a | 3 | No |
| extended-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 3.5 days | 1 | n/a | n/a | Yes |
| large-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 4 hours | 8 hours | 50 | 16 | 1 | No |
| short-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 1 hour | 4 hours | 8 | n/a | 4 | No |
| short-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 1 hour | 4 hours | 4 | n/a | n/a | Yes |
| medium-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 4 hours | 12 hours | 16 | 6 | 1 | No |
| long-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 48 hours | 6 | n/a | 3 | No |
| long-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 24 hours | 3 | n/a | n/a | Yes |
| extended-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 7 days | 2 | n/a | 3 | No |
| extended-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 3.5 days | 1 | n/a | n/a | Yes |
| large-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 4 hours | 8 hours | 38 | 16 | 1 | No |
| hbm-short-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 1 hour | 4 hours | 8 | n/a | 4 | No |
| hbm-medium-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 4 hours | 12 hours | 16 | 6 | 1 | No |
| hbm-long-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 8 hours | 48 hours | 6 | n/a | 3 | No |
| hbm-1tb-long-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 1000 GB (1 TB DDR5 + 128 GB HBM configured as level 4 cache) | 8 hours | 48 hours | 1 | n/a | 1 | No |
| hbm-extended-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 8 hours | 7 days | 2 | n/a | 3 | No |
| hbm-large-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 4 hours | 8 hours | 38 | 16 | 1 | No |
| a100 | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 1 hour | 8 hours | 2 | n/a | 2 | Yes |
| a100-long | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 8 hours | 48 hours | 1 | n/a | 2 | Yes |
| a100-large | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 1 hour | 8 hours | 4 | n/a | 1 | Yes |

¹ A small subset of node memory is reserved for the OS and file system and is not available for user applications.

 

In addition to the limits in the tables above, users cannot use more than 32 nodes at one time unless running jobs in one of the large queues, and the maximum number of jobs that a user can have queued at any given time is 100.
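
These two cluster-wide limits can be checked before submitting more work. The sketch below is one possible reading: it assumes that nodes used in any queue with "large" in its name do not count toward the 32-node limit, which this article does not spell out. The function names are hypothetical.

```python
MAX_NODES_OUTSIDE_LARGE = 32  # node limit outside the large queues
MAX_QUEUED_JOBS = 100         # maximum queued jobs per user

def is_large_queue(queue):
    """Assumption: a queue counts as 'large' if 'large' is one of its name parts."""
    return "large" in queue.split("-")

def limit_violations(nodes_in_use, queued_jobs):
    """Return a list of messages for each cluster-wide limit that is exceeded.

    `nodes_in_use` maps queue name to the number of nodes a user occupies there.
    """
    problems = []
    regular = sum(n for q, n in nodes_in_use.items() if not is_large_queue(q))
    if regular > MAX_NODES_OUTSIDE_LARGE:
        problems.append(
            f"{regular} nodes in use outside the large queues "
            f"(limit {MAX_NODES_OUTSIDE_LARGE})"
        )
    if queued_jobs > MAX_QUEUED_JOBS:
        problems.append(f"{queued_jobs} jobs queued (limit {MAX_QUEUED_JOBS})")
    return problems

print(limit_violations({"short-40core": 24, "medium-96core": 16}, 101))
```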
 

Hardware Configurations Across SeaWulf Queues


SeaWulf's queues offer a variety of hardware configurations tailored to different computational needs. Here’s a detailed breakdown of the hardware specifications across various queues:

  • The debug-28core, short-28core, long-28core, extended-28core, medium-28core, and large-28core queues share a set of identical nodes that have 28 Intel Haswell cores.
  • The debug-40core, short-40core, long-40core, extended-40core, medium-40core, and large-40core queues share a set of identical nodes that have 40 Skylake cores.
  • The short-96core, long-96core, extended-96core, medium-96core, and large-96core queues share a set of identical nodes that have 96 AMD EPYC Milan cores.
  • The short-96core-shared, long-96core-shared, and extended-96core-shared queues also share the same set of identical nodes that have 96 AMD EPYC Milan cores, but multiple jobs are allowed to run on the same node simultaneously.
  • The hbm-short-96core, hbm-long-96core, hbm-extended-96core, hbm-medium-96core, and hbm-large-96core queues share a set of identical nodes that have 96 Intel Sapphire Rapids cores.
  • The hbm-1tb-long-96core queue allocates jobs to 4 identical nodes that have 96 Intel Sapphire Rapids cores.  These nodes differ from the other hbm nodes in that they are configured in Cache mode and have 1 TB DDR5 memory.
  • The gpu and gpu-long queues share a set of identical nodes that are similar to those used by the 28-core queues (short-28core, long-28core, etc.) but with 4x K80 24GB GPUs each.
  • The p100 and v100 queues each allocate jobs to a single node that has 2x Tesla P100 16GB or 2x Tesla V100 32GB GPUs, respectively.
  • The a100, a100-long, and a100-large queues share nodes that each have 4x A100 80GB GPUs and 64 Intel Xeon Ice Lake cores.
     

Users must ensure that their applications are compatible with the specific hardware configurations available in each queue. This involves optimizing software usage to effectively utilize CPU architectures, GPU capabilities, and memory configurations.

 

 
