SeaWulf's queues, also referred to as partitions, are designed to optimize both runtime and resource allocation across its computing nodes, accommodating a wide range of computational needs and hardware configurations. It's essential for users to align their job submissions with the capabilities of each partition, considering factors such as core counts, GPU specifications, and memory capacities, to ensure maximum cluster efficiency and effective utilization of allocated resources.
Expect a Trade-off Between Resource Usage and Wait Times
Balancing resource requests (nodes and job duration) with queue wait times minimizes waste and delays. Users should carefully assess computational requirements to avoid underutilizing or overloading the system.
Optimize Resource Usage with Test Jobs
Not all applications benefit from using multiple nodes or all cores on a single node. Starting with smaller test jobs allows users to gauge the necessary computational resources accurately before scaling up to larger jobs.
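One way to read the results of such test jobs is to compute speedup and parallel efficiency relative to the smallest run. The sketch below is illustrative only: the timings are hypothetical placeholders, not SeaWulf measurements, and `parallel_efficiency` is a helper name introduced here.

```python
def parallel_efficiency(base_cores, base_seconds, cores, seconds):
    """Return (speedup, efficiency) of a run relative to a smaller baseline run."""
    speedup = base_seconds / seconds
    # Ideal speedup equals the ratio of core counts; efficiency is the fraction achieved.
    efficiency = speedup / (cores / base_cores)
    return speedup, efficiency


# Hypothetical wall-clock timings from small test jobs: cores -> seconds.
timings = {28: 1200.0, 56: 660.0, 112: 480.0}

base_cores = min(timings)
for cores in sorted(timings):
    speedup, eff = parallel_efficiency(
        base_cores, timings[base_cores], cores, timings[cores]
    )
    print(f"{cores:4d} cores: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

If efficiency drops sharply as cores are added, as in the last hypothetical row, requesting more nodes mostly lengthens the queue wait without shortening the run by much.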
Understand Hardware Specifics for Optimal Performance
SeaWulf offers queues with diverse CPU architectures (e.g., Haswell, Skylake, AMD EPYC Milan) and varying core counts. Understanding these specifics is crucial for matching the computational requirements of applications effectively. Users should select a queue that aligns with their application's CPU architecture requirements to achieve optimal performance.
GPU Usage: Maximize Efficiency
Queues equipped with GPUs should be used exclusively for applications that require GPU acceleration. Before submitting jobs, users must verify compatibility with the available GPU types and ensure that their software is configured to utilize GPUs effectively.
Consider Shared Queues for Smaller Jobs
SeaWulf offers "shared" queues on the 40-core and 96-core nodes (for example, short-40core-shared and short-96core-shared) in which multiple jobs can run concurrently on the same node. These queues are ideal for jobs that need only part of a node, and they improve overall efficiency by keeping node capacity fully used.
Use Specialized Queues for Testing and Debugging
For tasks like interactive sessions, brief testing, or code debugging, SeaWulf provides specialized queues such as debug-28core or short queues with rapid turnaround times. These queues are designed to prioritize quick job execution, making them suitable for initial code development and testing phases where rapid feedback is essential.
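As an illustration, a short test job for one of these queues could be described with a batch-script header like the one generated below. This sketch assumes the cluster's scheduler is Slurm (the article only speaks of queues and partitions) and uses the standard `#SBATCH` options `--partition`, `--nodes`, `--ntasks-per-node`, and `--time`. The function name `render_batch_header` and the job name are hypothetical.

```python
def render_batch_header(queue, nodes, tasks_per_node, hours, job_name="test-job"):
    """Build the header of a batch script (assumes a Slurm scheduler)."""
    total_minutes = int(round(hours * 60))
    h, m = divmod(total_minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={queue}",  # the queue name from this article
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={h:02d}:{m:02d}:00",  # HH:MM:SS wall-clock limit
    ]
    return "\n".join(lines) + "\n"


# A 30-minute, single-node test on the debug queue (28 cores per node).
print(render_batch_header("debug-28core", nodes=1, tasks_per_node=28, hours=0.5))
```

Requesting only the time the test actually needs, rather than the queue maximum, generally helps the job start sooner.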
Use Long Queues When Uncertain of Job Duration
If unsure about the runtime required for a job, opting for long queues initially allows flexibility. Users can assess actual job durations from test runs or previous executions and then adjust to more suitable queues accordingly.
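Once a runtime has been measured, moving to a more suitable queue can be reduced to a simple lookup. The sketch below uses the maximum runtimes of three non-shared 40-core queues from the tables in this article; the 25% safety margin and the helper name `smallest_fitting_queue` are assumptions made for illustration. The medium queue is left out because it also imposes a minimum node count.

```python
# Max runtimes in hours, copied from the 40-core rows of the queue table.
MAX_RUNTIME_HOURS = {
    "short-40core": 4,
    "long-40core": 48,
    "extended-40core": 7 * 24,
}


def smallest_fitting_queue(measured_hours, safety_factor=1.25):
    """Return the queue with the shortest max runtime that fits the padded estimate.

    safety_factor is an assumed margin for run-to-run variation.
    Returns None if no listed queue is long enough.
    """
    needed = measured_hours * safety_factor
    for name, limit in sorted(MAX_RUNTIME_HOURS.items(), key=lambda kv: kv[1]):
        if needed <= limit:
            return name
    return None


print(smallest_fitting_queue(3))    # 3.75 h padded, fits the short queue
print(smallest_fitting_queue(10))   # 12.5 h padded, needs the long queue
```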
Handle Memory-Intensive Jobs Appropriately
For applications demanding significant memory resources, SeaWulf provides specialized queues like hbm-1tb-long-96core, equipped with nodes featuring large memory capacities tailored for memory-intensive tasks.
Ensure Software/Hardware Compatibility
Ensuring software compatibility with the hardware configurations available in each queue is essential for maximizing job performance. Users should verify that their applications are configured to leverage the specific CPU architectures, core counts, and GPU types available in their chosen queue.
By following these guidelines, users can manage job submissions on SeaWulf effectively, making better use of resources and reducing queue wait times.
Available Queues
The full list of available queues depends on the login node you submit from. There are two sets of login nodes: login1/login2, which provide access to one set of queues, and milan1/milan2, which provide access to another.
Queues accessed from login1 and login2:
Queue | CPU Architecture | Vector/Matrix Extension | CPU Cores per Node | GPUs per Node | Node Memory¹ | Default Runtime | Max Runtime | Max Nodes | Min Nodes | Max Simultaneous Jobs per User |
---|---|---|---|---|---|---|---|---|---|---|
debug-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 1 hour | 1 hour | 8 | n/a | n/a |
short-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 1 hour | 4 hours | 12 | n/a | 8 |
medium-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 4 hours | 12 hours | 24 | 8 | 2 |
long-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 8 hours | 48 hours | 8 | n/a | 6 |
extended-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 8 hours | 7 days | 2 | n/a | 6 |
large-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 4 hours | 8 hours | 80 | 24 | 1 |
gpu | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 1 hour | 8 hours | 2 | n/a | 2 |
gpu-long | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 8 hours | 48 hours | 1 | n/a | 2 |
gpu-large | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 1 hour | 8 hours | 4 | n/a | 1 |
p100 | Intel Haswell | AVX2 | 12 | 2 | 64 GB | 1 hour | 24 hours | 1 | n/a | 1 |
v100 | Intel Haswell | AVX2 | 28 | 2 | 128 GB | 1 hour | 24 hours | 1 | n/a | 1 |
¹ A small subset of node memory is reserved for the OS and file system and is not available for user applications.
Queues accessed from milan1 and milan2:
Queue | CPU Architecture | Vector/Matrix Extension | CPU Cores per Node | GPUs per Node | Node Memory¹ | Default Runtime | Max Runtime | Max Nodes | Min Nodes | Max Simultaneous Jobs per User | Multiple Users per Node |
---|---|---|---|---|---|---|---|---|---|---|---|
debug-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 1 hour | 8 | n/a | n/a | No |
short-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 4 hours | 8 | n/a | 4 | No |
short-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 4 hours | 4 | n/a | n/a | Yes |
medium-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 4 hours | 12 hours | 16 | 6 | 1 | No |
long-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 48 hours | 6 | n/a | 3 | No |
long-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 24 hours | 3 | n/a | n/a | Yes |
extended-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 7 days | 2 | n/a | 3 | No |
extended-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 3.5 days | 1 | n/a | n/a | Yes |
large-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 4 hours | 8 hours | 50 | 16 | 1 | No |
short-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 1 hour | 4 hours | 8 | n/a | 4 | No |
short-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 1 hour | 4 hours | 4 | n/a | n/a | Yes |
medium-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 4 hours | 12 hours | 16 | 6 | 1 | No |
long-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 48 hours | 6 | n/a | 3 | No |
long-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 24 hours | 3 | n/a | n/a | Yes |
extended-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 7 days | 2 | n/a | 3 | No |
extended-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 3.5 days | 1 | n/a | n/a | Yes |
large-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 4 hours | 8 hours | 38 | 16 | 1 | No |
hbm-short-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 1 hour | 4 hours | 8 | n/a | 4 | No |
hbm-medium-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 4 hours | 12 hours | 16 | 6 | 1 | No |
hbm-long-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 8 hours | 48 hours | 6 | n/a | 3 | No |
hbm-1tb-long-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 1000 GB (1 TB DDR5 + 128 GB HBM configured as level 4 cache) | 8 hours | 48 hours | 1 | n/a | 1 | No |
hbm-extended-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 8 hours | 7 days | 2 | n/a | 3 | No |
hbm-large-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 4 hours | 8 hours | 38 | 16 | 1 | No |
a100 | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 1 hour | 8 hours | 2 | n/a | 2 | Yes |
a100-long | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 8 hours | 48 hours | 1 | n/a | 2 | Yes |
a100-large | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 1 hour | 8 hours | 4 | n/a | 1 | Yes |
¹ A small subset of node memory is reserved for the OS and file system and is not available for user applications.
In addition to the limits in the tables above, users cannot use more than 32 nodes at one time unless running jobs in one of the large queues, and the maximum number of jobs that a user can have queued at any given time is 100.
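The per-queue limits and the cluster-wide node cap can be combined into a quick pre-submission check. The sketch below copies a few rows from the tables above; the `QueueLimits` class, the `check_request` function, and the `nodes_in_use` parameter are hypothetical names. It assumes the 32-node cap counts nodes across all of a user's running jobs, which is one reading of the rule stated above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueueLimits:
    max_hours: float
    max_nodes: int
    min_nodes: Optional[int] = None  # None where the table lists n/a


# A few rows copied from the queue tables above.
LIMITS = {
    "short-96core": QueueLimits(4, 8),
    "medium-96core": QueueLimits(12, 16, 6),
    "large-96core": QueueLimits(8, 38, 16),
    "hbm-1tb-long-96core": QueueLimits(48, 1),
}

GLOBAL_NODE_CAP = 32  # does not apply to the large queues


def check_request(queue: str, nodes: int, hours: float, nodes_in_use: int = 0) -> List[str]:
    """Return a list of problems with a request; an empty list means it fits."""
    lim = LIMITS[queue]
    problems = []
    if hours > lim.max_hours:
        problems.append(f"{hours} h exceeds the {lim.max_hours} h max runtime")
    if nodes > lim.max_nodes:
        problems.append(f"{nodes} nodes exceeds the max of {lim.max_nodes}")
    if lim.min_nodes is not None and nodes < lim.min_nodes:
        problems.append(f"{nodes} nodes is below the min of {lim.min_nodes}")
    if "large" not in queue and nodes + nodes_in_use > GLOBAL_NODE_CAP:
        problems.append(f"total nodes would exceed the {GLOBAL_NODE_CAP}-node cap")
    return problems


print(check_request("medium-96core", nodes=4, hours=10))
print(check_request("short-96core", nodes=2, hours=3))
```

The first request is flagged because medium-96core requires at least 6 nodes; the second passes every check.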
Hardware Configurations Across SeaWulf Queues
SeaWulf's queues offer a variety of hardware configurations tailored to different computational needs. Here’s a detailed breakdown of the hardware specifications across various queues:
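- The debug-28core, short-28core, long-28core, extended-28core, medium-28core, and large-28core queues share a set of identical nodes that have 28 Haswell cores.
- The debug-40core, short-40core, long-40core, extended-40core, medium-40core, and large-40core queues share a set of identical nodes that have 40 Skylake cores.
- The short-40core-shared, long-40core-shared, and extended-40core-shared queues are listed with the same 40-core Skylake node specification, but multiple jobs are allowed to run on the same node simultaneously.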
- The short-96core, long-96core, extended-96core, medium-96core, and large-96core queues share a set of identical nodes that have 96 AMD EPYC Milan cores.
- The short-96core-shared, long-96core-shared, and extended-96core-shared queues also share the same set of identical nodes that have 96 AMD EPYC Milan cores, but multiple jobs are allowed to run on the same node simultaneously.
- The hbm-short-96core, hbm-long-96core, hbm-extended-96core, hbm-medium-96core, and hbm-large-96core queues share a set of identical nodes that have 96 Intel Sapphire Rapids cores.
- The hbm-1tb-long-96core queue allocates jobs to 4 identical nodes that have 96 Intel Sapphire Rapids cores. These nodes differ from the other hbm nodes in that they are configured in Cache mode and have 1 TB DDR5 memory.
- The gpu, gpu-long, and gpu-large queues share a set of identical nodes that are similar to those used by the 28-core queues (short-28core, long-28core, etc.) but with 4x K80 24GB GPUs each.
- The p100 and v100 queues each allocate jobs to a single node that has 2x P100 16GB or 2x V100 32GB GPUs, respectively.
- The a100, a100-long, and a100-large queues allocate jobs to nodes that each have 4x A100 80GB GPUs and 64 Intel Xeon Ice Lake cores.
Users must ensure that their applications are compatible with the specific hardware configurations available in each queue. This involves optimizing software usage to effectively utilize CPU architectures, GPU capabilities, and memory configurations.
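One practical compatibility check is to confirm that a node actually exposes the vector or matrix extension an application was built for. The sketch below parses text in the format of Linux's `/proc/cpuinfo`. The mapping from the extension names in the tables to kernel flag names (`avx2`, `avx512f`, `amx_tile`) is an assumption about how a Linux node reports them, and the sample text is made up for illustration.

```python
# Assumed mapping from the table's extension names to Linux cpuinfo flags.
REQUIRED_FLAGS = {"AVX2": "avx2", "AVX512": "avx512f", "AMX": "amx_tile"}


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def supported_extensions(cpuinfo_text):
    """Return the sorted names of the extensions the CPU reports."""
    flags = cpu_flags(cpuinfo_text)
    return sorted(name for name, flag in REQUIRED_FLAGS.items() if flag in flags)


# Made-up sample resembling an AVX2-only node.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 fma\n"
print(supported_extensions(sample))
```

On a compute node, the same functions could be applied to the contents of `/proc/cpuinfo` read from inside a job, since login nodes and compute nodes may have different CPUs.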