SeaWulf Queue Selection Guide

Strategy

  • Start Small: Begin with test jobs to determine actual resource needs
  • Match Resources to Requirements: Request only the cores and memory your job will use
  • Consider Wait Times: Smaller requests often have shorter queue times
  • Use Shared Queues: Route jobs that don't need a full node to a shared queue
  • Verify Compatibility: Ensure software works with the target CPU/GPU architecture
Tip: Effective queue selection balances your job's needs with system efficiency. Memory needs to be specified only on shared queues; on exclusive queues, all of a node's memory is allocated by default.
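The strategy above can be sketched as a minimal Slurm batch script. All values below (job name, partition, module, executable) are illustrative placeholders, not SeaWulf defaults:

```shell
#!/bin/bash
# Minimal Slurm job script sketch -- start small on a debug queue,
# request only the time you need, then scale up once it works.
#SBATCH --job-name=test-job
#SBATCH --partition=debug-28core   # start small: debug queue first
#SBATCH --nodes=1
#SBATCH --time=00:30:00            # request only the time you need
#SBATCH --output=%x-%j.out         # job-name and job-ID in the log name

module load mysoftware             # placeholder: load what your job needs
srun ./my_program                  # placeholder executable
```

Submit with `sbatch jobscript.sh` and check the queue with `squeue -u $USER`.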

Legacy Queues (login1/login2)

28-Core Intel Haswell Nodes

AVX2 support and 128 GB memory per node.

| Queue | Duration | Max Nodes | Shared | Best For |
|---|---|---|---|---|
| debug-28core | 1 hr | 8 | No | Testing/debugging |
| short-28core | 1-4 hr | 12 | No | Small/medium jobs |
| medium-28core | 4-12 hr | 24 | No | Medium parallel jobs |
| long-28core | 8-48 hr | 8 | No | Long simulations |
| extended-28core | 8 hr - 7 days | 2 | No | Very long simulations |
| large-28core | 4-8 hr | 80 | No | Large parallel jobs |
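A multi-node MPI job on these exclusive Haswell nodes might look like the sketch below; the MPI module name and binary are placeholders, and the node count is just one legal value within the queue's limit:

```shell
#!/bin/bash
# Illustrative multi-node MPI job on short-28core (exclusive nodes,
# 28 cores each). No --mem is needed: exclusive queues get the whole node.
#SBATCH --job-name=mpi-sim
#SBATCH --partition=short-28core
#SBATCH --nodes=4                  # within the queue's 12-node limit
#SBATCH --ntasks-per-node=28       # one MPI rank per Haswell core
#SBATCH --time=04:00:00            # queue allows 1-4 hr

module load mpi                    # placeholder MPI module name
mpirun ./simulation                # placeholder MPI binary
```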

GPU Options

| Queue | GPU | Memory | Duration | Best For |
|---|---|---|---|---|
| gpu | 4× K80 | 24 GB | 1-8 hr | Basic GPU workloads |
| gpu-long | 4× K80 | 24 GB | 8-48 hr | Long GPU jobs |
| p100 | 2× P100 | 16 GB | 1-24 hr | Scientific computing |
| v100 | 2× V100 | 32 GB | 1-24 hr | ML/AI |
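A GPU job on one of these queues is sketched below. GPUs are requested with Slurm's standard `--gres` flag; the CUDA module name and training script are assumptions, so check `module avail` for the real names:

```shell
#!/bin/bash
# Illustrative ML job on the v100 queue, requesting both V100s.
#SBATCH --job-name=train
#SBATCH --partition=v100
#SBATCH --nodes=1
#SBATCH --gres=gpu:2               # both V100s on the node
#SBATCH --time=12:00:00            # within the 1-24 hr window

module load cuda                   # placeholder module name
python train.py                    # placeholder training script
```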

Modern Queues (milan1/milan2)

40-Core Intel Skylake Nodes

AVX512 support and 192 GB memory per node.

| Queue | Duration | Max Nodes | Shared | Best For |
|---|---|---|---|---|
| debug-40core | 1 hr | 8 | No | Testing/debugging |
| short-40core | 1-4 hr | 8 | No | Standard jobs |
| short-40core-shared | 1-4 hr | 4 | Yes | Smaller jobs |
| medium-40core | 4-12 hr | 16 | No | Medium jobs |
| long-40core | 8-48 hr | 6 | No | Long jobs |
| long-40core-shared | 8-24 hr | 3 | Yes | Shared long jobs |
| extended-40core | 8 hr - 7 days | 2 | No | Very long jobs |
| extended-40core-shared | 8 hr - 3.5 days | 1 | Yes | Shared extended jobs |
| large-40core | 4-8 hr | 50 | No | Large parallel jobs |
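On the shared queues in this table, cores and memory must both be stated explicitly, because the node is divided among users. A sketch, with illustrative core and memory values:

```shell
#!/bin/bash
# Illustrative shared-queue job: request an explicit slice of the node.
#SBATCH --job-name=small-job
#SBATCH --partition=short-40core-shared
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8          # a fraction of the 40-core node
#SBATCH --mem=32G                  # required on shared queues
#SBATCH --time=02:00:00

./analysis                         # placeholder executable
```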

96-Core AMD EPYC Milan Nodes

AVX2 support and 256 GB memory per node.

| Queue | Duration | Max Nodes | Shared | Best For |
|---|---|---|---|---|
| short-96core | 1-4 hr | 8 | No | Parallel jobs |
| short-96core-shared | 1-4 hr | 4 | Yes | Moderate shared jobs |
| medium-96core | 4-12 hr | 16 | No | Parameter sweeps |
| long-96core | 8-48 hr | 6 | No | Long parallel jobs |
| long-96core-shared | 8-24 hr | 3 | Yes | Shared long jobs |
| extended-96core | 8 hr - 7 days | 2 | No | Very long jobs |
| extended-96core-shared | 8 hr - 3.5 days | 1 | Yes | Shared extended jobs |
| large-96core | 4-8 hr | 38 | No | Large parallel jobs |
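A parameter sweep on medium-96core fits naturally into a Slurm job array, where each array task gets a full EPYC node and one parameter set. The sweep binary and its index argument are hypothetical:

```shell
#!/bin/bash
# Illustrative parameter sweep as a job array: ten independent runs,
# one full 96-core node each.
#SBATCH --job-name=sweep
#SBATCH --partition=medium-96core
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=96       # use the full EPYC node
#SBATCH --time=08:00:00
#SBATCH --array=0-9                # ten parameter sets

srun ./sweep --param-index "$SLURM_ARRAY_TASK_ID"   # placeholder binary
```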

High-Bandwidth Memory (HBM) Nodes

AMX/AVX512 support and 384 GB memory per node (256 GB DDR5 + 128 GB HBM).

| Queue | Duration | Max Nodes | Special Features | Best For |
|---|---|---|---|---|
| hbm-short-96core | 1-4 hr | 8 | High-bandwidth memory | Memory-intensive jobs |
| hbm-medium-96core | 4-12 hr | 16 | Enhanced memory performance | Large datasets |
| hbm-long-96core | 8-48 hr | 6 | 2-4x memory speed | Memory-bound simulations |
| hbm-extended-96core | 8 hr - 7 days | 2 | Maximum duration | Long memory jobs |
| hbm-large-96core | 4-8 hr | 38 | Large-scale HBM computing | Massive memory-bound jobs |
| hbm-1tb-long-96core | 8-48 hr | 1 | 1 TB memory + 128 GB HBM cache | Extremely large datasets |

HBM Note: Use HBM nodes for workloads limited by memory bandwidth.
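A bandwidth-bound job on an HBM queue might look like the sketch below. Whether the HBM is used transparently (as cache) or must be targeted with explicit NUMA binding depends on how the nodes are configured; the `numactl` inspection step is an assumption, not a documented SeaWulf requirement:

```shell
#!/bin/bash
# Illustrative memory-bandwidth-bound job on hbm-short-96core.
#SBATCH --job-name=stencil
#SBATCH --partition=hbm-short-96core
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=96
#SBATCH --time=04:00:00

numactl --hardware                 # inspect the NUMA/HBM layout first
srun ./stencil_solver              # placeholder memory-bound application
```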

NVIDIA A100 GPU Nodes

4× A100 80 GB GPUs with 64-core Intel Ice Lake nodes (256 GB memory).

| Queue | Duration | Max Nodes | Shared | Best For |
|---|---|---|---|---|
| a100 | 1-8 hr | 2 | Yes | GPU workloads, AI/ML |
| a100-long | 8-48 hr | 1 | Yes | Long GPU jobs |
| a100-large | 1-8 hr | 4 | Yes | Large GPU jobs |

GPU Note: Use GPU queues only for compatible applications. Verify CUDA/software requirements before submission.
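Since all three A100 queues are shared, an A100 job should request its GPU, CPU, and memory slice explicitly. A sketch with illustrative values (the script name is a placeholder):

```shell
#!/bin/bash
# Illustrative AI/ML job on the shared a100 queue: one GPU plus an
# explicit CPU/memory slice, since the node is shared with other users.
#SBATCH --job-name=finetune
#SBATCH --partition=a100
#SBATCH --gres=gpu:1               # one of the four A100s
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G                  # explicit memory on a shared queue
#SBATCH --time=08:00:00

nvidia-smi                         # confirm the allocated GPU is visible
python finetune.py                 # placeholder training script
```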