SeaWulf Queues Guide

Queue Selection Strategy

  • Start Small: Begin with test jobs to determine actual resource needs
  • Match Resources to Requirements: Don't over-request cores or memory you won't use
  • Consider Wait Times: Smaller resource requests often have shorter queue times
  • Use Shared Queues: For jobs that don't need an entire node
  • Verify Compatibility: Ensure your software works with the specific hardware
Pro Tip: Balance resource requests with wait times. Requesting exactly what you need minimizes both waste and delays.
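Following the "start small" advice above, a minimal Slurm batch script for an initial test run might look like the sketch below. The queue name matches the debug-28core queue described later in this guide; the module name, executable, and input file are placeholders you would replace with your own.

```shell
#!/bin/bash
#SBATCH --job-name=test-run        # short test job to measure actual resource needs
#SBATCH -p debug-28core            # debug queue: 1-hour limit, fast turnaround
#SBATCH --nodes=1                  # start small; scale up only after profiling
#SBATCH --ntasks-per-node=28       # all 28 cores of one Haswell node
#SBATCH --time=00:30:00            # request less than the 1-hour queue limit

module load my_app/1.0             # placeholder module name
srun ./my_app input.dat            # placeholder executable and input file
```

Once a test run completes, compare the requested time and cores against what was actually used before scaling up to larger queues.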

Legacy Queues (login1/login2 access)

28-Core Intel Haswell Nodes

Established, reliable computing platform with AVX2 support and 128 GB memory per node.

| Queue | Duration | Max Nodes | Special Features | Best For |
|---|---|---|---|---|
| debug-28core | 1 hour max | 8 | Quick turnaround | Testing and debugging |
| short-28core | 1-4 hours | 12 | Fast access | Small to medium jobs |
| medium-28core | 4-12 hours | 24 | Min 8 nodes | Medium parallel jobs |
| long-28core | 8-48 hours | 8 | Extended runtime | Long-running simulations |
| extended-28core | 8 hours - 7 days | 2 | Maximum duration | Very long simulations |
| large-28core | 4-8 hours | 80 | Min 24 nodes | Large-scale parallel computing |

GPU Options (login1/login2)

| Queue | GPU Type | GPU Memory | Duration | Best For |
|---|---|---|---|---|
| gpu | 4x K80 | 24 GB each | 1-8 hours | GPU computing, basic ML |
| gpu-long | 4x K80 | 24 GB each | 8-48 hours | Long GPU computations |
| p100 | 2x Tesla P100 | 16 GB each | 1-24 hours | Scientific computing |
| v100 | 2x V100 | 32 GB each | 1-24 hours | Deep learning, AI research |
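A batch script for one of these legacy GPU queues might look like the following sketch. The `--gres` line assumes the GPUs are configured as a Slurm generic resource named `gpu`, which is the common convention but can vary by site; the module name and executable are placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH -p gpu                     # legacy K80 queue, 1-8 hour limit
#SBATCH --nodes=1
#SBATCH --gres=gpu:4               # request all 4 K80 GPUs on the node
#SBATCH --time=02:00:00            # stay within the 8-hour queue limit

module load cuda                   # placeholder; load your site's CUDA module
srun ./my_cuda_app                 # placeholder GPU-enabled executable
```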

Modern Queues (milan1/milan2 access)

40-Core Intel Skylake Nodes

Modern computing platform with AVX512 support and 192 GB memory per node.

| Queue | Duration | Max Nodes | Shared Access | Best For |
|---|---|---|---|---|
| debug-40core | 1 hour max | 8 | No | Testing and debugging |
| short-40core | 1-4 hours | 8 | No | Standard HPC jobs |
| short-40core-shared | 1-4 hours | 4 | Yes | Efficient resource use for smaller jobs |
| medium-40core | 4-12 hours | 16 | No | Medium-duration parallel jobs |
| long-40core | 8-48 hours | 6 | No | Long simulations |
| long-40core-shared | 8-24 hours | 3 | Yes | Cost-effective long jobs |
| extended-40core | 8 hours - 7 days | 2 | No | Very long computations |
| extended-40core-shared | 8 hours - 3.5 days | 1 | Yes | Extended shared access |
| large-40core | 4-8 hours | 50 | No | Large-scale parallel jobs |
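On shared queues, other users' jobs may run on the same node, so it is worth requesting only the cores you need and stating your memory requirement explicitly. A sketch for a partial-node job on the shared 40-core queue (executable name is a placeholder; the exact memory value should match your application's measured footprint):

```shell
#!/bin/bash
#SBATCH -p short-40core-shared     # shared queue: node may host other users' jobs
#SBATCH --nodes=1
#SBATCH --ntasks=8                 # only the cores actually needed, not all 40
#SBATCH --mem=32G                  # explicit memory request matters on shared nodes
#SBATCH --time=01:00:00

srun ./my_app                      # placeholder executable
```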

96-Core AMD EPYC Milan Nodes

High-density computing platform with 256 GB memory per node, ideal for massively parallel workloads.

| Queue | Duration | Max Nodes | Shared Access | Best For |
|---|---|---|---|---|
| short-96core | 1-4 hours | 8 | No | High-throughput computing |
| short-96core-shared | 1-4 hours | 4 | Yes | Parallel jobs with moderate resource needs |
| medium-96core | 4-12 hours | 16 | No | Parameter sweeps, Monte Carlo |
| long-96core | 8-48 hours | 6 | No | Long parallel computations |
| long-96core-shared | 8-24 hours | 3 | Yes | Efficient long-running jobs |
| extended-96core | 8 hours - 7 days | 2 | No | Very long parallel simulations |
| extended-96core-shared | 8 hours - 3.5 days | 1 | Yes | Extended shared parallel work |
| large-96core | 4-8 hours | 38 | No | Massive parallel computing |

High-Bandwidth Memory (HBM) Queues

Intel Sapphire Rapids Nodes with High-Bandwidth Memory

Cutting-edge nodes featuring 384 GB memory (256GB DDR5 + 128GB HBM) with AMX, AVX512, and Intel DL Boost capabilities.

| Queue | Duration | Max Nodes | Special Features | Best For |
|---|---|---|---|---|
| hbm-short-96core | 1-4 hours | 8 | High-bandwidth memory | Memory-intensive applications |
| hbm-medium-96core | 4-12 hours | 16 | Enhanced memory performance | Large dataset analysis |
| hbm-long-96core | 8-48 hours | 6 | 2-4x memory speed improvement | Memory-bound simulations |
| hbm-extended-96core | 8 hours - 7 days | 2 | Maximum duration with HBM | Long memory-intensive jobs |
| hbm-large-96core | 4-8 hours | 38 | Large-scale HBM computing | Massive memory-bound parallel jobs |
| hbm-1tb-long-96core | 8-48 hours | 1 | 1 TB memory + 128 GB HBM cache | Extremely large datasets in memory |
HBM Advantage: High-bandwidth memory provides 2-4x faster memory access for memory-bound applications, ideal for large-scale simulations and data analytics.
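For the single 1 TB node, a memory-intensive job might be submitted as sketched below. The exact `--mem` value is an assumption: a small amount of memory is reserved for the operating system (see the system-wide limits later in this guide), so requesting slightly less than the full 1 TB is the safer choice. The executable is a placeholder.

```shell
#!/bin/bash
#SBATCH -p hbm-1tb-long-96core     # single 1 TB node with 128 GB HBM cache
#SBATCH --nodes=1
#SBATCH --ntasks=96                # all 96 cores of the node
#SBATCH --mem=1000G                # leave headroom below 1 TB for the OS reservation
#SBATCH --time=24:00:00            # within the 8-48 hour queue window

srun ./my_memory_bound_app         # placeholder memory-bound executable
```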

Modern GPU Queues

NVIDIA A100 GPU Nodes

State-of-the-art GPU computing with 4x A100 80GB GPUs and 64 Intel Ice Lake cores (256 GB memory).

| Queue | Duration | Max Nodes | Shared Access | Best For |
|---|---|---|---|---|
| a100 | 1-8 hours | 2 | Yes | AI/ML training, deep learning |
| a100-long | 8-48 hours | 1 | Yes | Extended GPU computations |
| a100-large | 1-8 hours | 4 | Yes | Large-scale GPU parallel computing |
GPU Usage Guidelines: Only use GPU queues for applications that can effectively utilize GPU acceleration. Verify software compatibility with CUDA and ensure proper GPU programming before submission.
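Since the A100 queues are shared, a single-GPU training job can request just one of a node's four GPUs and a proportional share of its 64 cores. A sketch (module name, script, and the CPU share are illustrative assumptions):

```shell
#!/bin/bash
#SBATCH -p a100                    # shared A100 queue, 1-8 hour limit
#SBATCH --nodes=1
#SBATCH --gres=gpu:1               # one of the node's four A100 80GB GPUs
#SBATCH --cpus-per-task=16         # roughly a quarter of the 64 Ice Lake cores
#SBATCH --time=04:00:00

module load cuda                   # placeholder; load your site's CUDA module
srun python train.py               # placeholder training script
```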

Queue Selection Decision Tree

Choose the Right Queue for Your Work

| If You Need... | Consider... | Why |
|---|---|---|
| Quick testing/debugging | debug-* queues | Fastest turnaround, 1-hour limit |
| GPU acceleration | a100, gpu, p100, v100 | Specialized hardware for parallel computing |
| Large memory requirements | hbm-* queues, 96-core nodes | High memory capacity and bandwidth |
| Maximum parallel processing | 96-core AMD nodes | 96 cores per node for throughput computing |
| Cost-effective computing | shared queues | Multiple users per node, shorter wait times |
| Very long simulations | extended-* queues | Up to 7 days runtime |
| Many nodes working together | large-* queues | Highest node counts (up to 38-80 nodes, depending on architecture) |
| Proven, stable platform | 28-core Haswell nodes | Mature hardware for production workflows |

Resource Limits and Best Practices

System-Wide Limits

  • Maximum simultaneous nodes: 32 (except in large queues)
  • Maximum queued jobs per user: 100
  • Memory reservation: Small amount reserved for OS (not available to applications)
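Standard Slurm commands can be used to check a queue's configured limits and your own usage against them. The partition name below is one example from the tables above; substitute whichever queue you plan to use:

```shell
# Show a partition's time limit, node count, and state
sinfo -p short-40core --format="%P %l %D %t"

# List your pending and running jobs, with the reason for any wait
squeue -u $USER --format="%i %P %j %T %r"

# Inspect a partition's configured maximums in full detail
scontrol show partition short-40core
```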

Optimization Tips

Shared Queue Benefits: Use shared queues for jobs that don't need an entire node. This often results in shorter wait times and more efficient resource utilization.
  • Test First: Start with debug or short queues to determine optimal resources
  • Right-Size Requests: Don't request more cores or memory than your application can use
  • Consider Architecture: Match your software's optimization to the CPU architecture (AVX2 vs AVX512)
  • GPU Efficiency: Only use GPU queues for GPU-accelerated applications
  • Memory Planning: Use HBM queues for memory-intensive workloads
  • Duration Flexibility: If uncertain about runtime, start with a longer queue so jobs are not killed at the walltime limit, then shorten future requests based on actual performance
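To put the "test first" and "right-size" tips into practice, Slurm's accounting tools report what a completed job actually used; `<jobid>` below is a placeholder for a real job ID, and the `seff` summary utility is only available if your site has installed it:

```shell
# Compare requested vs. used resources after a test job completes
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,NCPUS,State

# If the seff utility is installed, it summarizes CPU and memory efficiency
seff <jobid>
```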
Remember: Effective queue selection balances your computational needs with system efficiency. The goal is to get your work done quickly while making the best use of shared resources.