SeaWulf Architecture Overview

SeaWulf is a heterogeneous cluster with over 400 nodes and 23,000 cores, designed to provide a range of computational resources for different research needs.

Hardware Generations and Access

SeaWulf's hardware spans multiple generations, accessible via different login nodes:

Legacy Platform (login1/login2)

The original SeaWulf hardware:

  • Haswell 28-core nodes: Mature, stable platform with AVX2 support
  • GPU acceleration: K80, P100, and V100 GPUs for older CUDA applications
  • Best for: Legacy software, budget-conscious computing, development work

Modern Platform (milan1/milan2)

Expanded infrastructure with newer hardware:

  • Multiple CPU architectures: Intel Skylake, AMD Milan, Intel Sapphire Rapids
  • Memory options: Standard DDR5, high-bandwidth memory (HBM), and 1 TB large-memory configurations
  • GPUs: NVIDIA A100 GPUs for more demanding CUDA applications
  • Specialized features: Shared access modes, high memory bandwidth

Note: Your login node choice determines available resources. See the SeaWulf Queue Table for a direct comparison.
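
If you are unsure which hardware generation a job landed on, a quick check from inside the job can confirm it. The following is a minimal Python sketch using only the standard library and Linux /proc files; it is not SeaWulf-specific and simply reports whatever node it runs on.

```python
#!/usr/bin/env python3
"""Report basic hardware details of the node this script runs on.

Useful inside a batch job to confirm which hardware generation was
assigned (e.g., Haswell vs. Milan vs. Sapphire Rapids).
"""
import os
import platform


def cpu_model() -> str:
    # Parse the CPU model name from /proc/cpuinfo (Linux-specific).
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"


def total_memory_gib() -> float:
    # MemTotal in /proc/meminfo is reported in kB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal"):
                return int(line.split()[1]) / (1024 ** 2)
    return 0.0


if __name__ == "__main__":
    print(f"Hostname:     {platform.node()}")
    print(f"CPU model:    {cpu_model()}")
    print(f"Logical CPUs: {os.cpu_count()}")
    print(f"Memory (GiB): {total_memory_gib():.1f}")
```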

Understanding Performance Characteristics

CPU Architecture Differences

Different CPU generations have distinct performance profiles:

Architecture                    | Strengths                                      | Ideal Applications
Intel Haswell (28-core)         | Stable, widely compatible                      | Legacy codes, development, general computing
Intel Skylake (40-core)         | Balanced performance, modern instruction sets  | Most scientific computing workloads
AMD Milan (96-core)             | High parallelism                               | Highly parallel applications, parameter sweeps
Intel Sapphire Rapids (96-core) | Advanced instruction sets, HBM memory          | AI/ML inference, memory-intensive applications
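
Because the newer architectures differ mainly in their instruction-set extensions (for example AVX2 versus AVX-512), it can help to confirm which features a node advertises before choosing a build. The sketch below reads the standard Linux /proc/cpuinfo flags; the exact flags reported by each SeaWulf node type should be verified on the system itself.

```python
#!/usr/bin/env python3
"""Check which SIMD instruction-set extensions the current node advertises.

Helps match a build (e.g., AVX2 vs. AVX-512) to the architectures listed
in the table above.
"""


def cpu_flags() -> set:
    # The "flags" line in /proc/cpuinfo lists supported CPU features (Linux).
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    flags = cpu_flags()
    for feature in ("avx", "avx2", "avx512f"):
        status = "yes" if feature in flags else "no"
        print(f"{feature:<8} {status}")
```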

Memory

  • Standard DDR5: Balanced performance for most applications
  • High-Bandwidth Memory (HBM): Better for memory-bandwidth-limited applications. For more information, see HBM Nodes.
  • Large-memory nodes: Useful for in-memory processing of very large datasets
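
When deciding between standard and large-memory nodes, a back-of-the-envelope estimate of your data's footprint is often enough. The sketch below is illustrative only: the per-node capacities used are placeholders (only the 1 TB configuration is mentioned above), so check the SeaWulf Queue Table for actual per-node memory.

```python
#!/usr/bin/env python3
"""Rough memory-footprint estimate for a dense double-precision array.

Node memory sizes below are illustrative placeholders; check the
SeaWulf Queue Table for the actual per-node memory of each queue.
"""

BYTES_PER_FLOAT64 = 8


def footprint_gib(rows: int, cols: int) -> float:
    """Memory needed to hold one rows x cols float64 array, in GiB."""
    return rows * cols * BYTES_PER_FLOAT64 / (1024 ** 3)


if __name__ == "__main__":
    n = 200_000
    need = footprint_gib(n, n)  # roughly 298 GiB for a 200k x 200k matrix
    print(f"A {n} x {n} float64 matrix needs about {need:.0f} GiB")

    # Placeholder node sizes (GiB) -- verify against the queue table.
    for label, capacity in [("standard node", 256), ("large-memory node", 1024)]:
        fits = "fits" if need < capacity else "does not fit"
        print(f"  -> {fits} on a {label} ({capacity} GiB)")
```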

Shared vs Dedicated Access

SeaWulf offers both dedicated and shared access modes:

Dedicated Access

  • Your job gets exclusive access to entire nodes
  • Guaranteed resources and performance
  • Best for large, resource-intensive applications
  • Higher resource cost per computation

Shared Access

  • Multiple users can run jobs on the same node
  • More efficient utilization of system resources
  • Ideal for smaller jobs that don't need full nodes
  • Faster queue times, lower resource cost
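
On a shared node, other users' jobs run alongside yours, so a job should size its worker pool to the CPUs it was actually allocated rather than to the whole machine. The sketch below assumes a Linux system where the scheduler restricts each job to a CPU set, which os.sched_getaffinity() reflects; it uses only the Python standard library.

```python
#!/usr/bin/env python3
"""Size a worker pool to the CPUs allocated to this job, not the whole node.

On a shared node, os.cpu_count() reports every core in the machine, while
os.sched_getaffinity(0) reports only the cores this process may use
(e.g., the cores the scheduler bound the job to).
"""
import os
from multiprocessing import Pool


def work(x: int) -> int:
    # Stand-in for a real per-task computation.
    return x * x


if __name__ == "__main__":
    allocated = len(os.sched_getaffinity(0))  # Linux-only call
    print(f"Node has {os.cpu_count()} cores; this job may use {allocated}")

    with Pool(processes=allocated) as pool:
        results = pool.map(work, range(100))
    print(f"Computed {len(results)} results with {allocated} workers")
```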

System Scalability

SeaWulf supports a range of workloads:

  • Single-core tasks
  • Multi-threaded single-node jobs (up to 96 cores)
  • Multi-node distributed applications
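
Multi-node runs typically use MPI. The following is a minimal sketch that assumes an MPI library with Python bindings (mpi4py) is available on the cluster; the exact modules to load and the launch command depend on the site's MPI setup, so treat it only as an illustration of the pattern.

```python
#!/usr/bin/env python3
"""Minimal MPI 'hello' that spans one or many nodes.

Typically launched with something like `mpirun -n 8 python hello_mpi.py`;
the launcher and module setup depend on the site's MPI installation.
"""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()           # this process's ID within the job
size = comm.Get_size()           # total number of MPI processes
name = MPI.Get_processor_name()  # hostname of the node running this rank

print(f"Rank {rank} of {size} running on {name}")
```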

The scheduling system automatically handles resource allocation and job placement, optimizing performance while maintaining fair access across all users.

Scaling Tip: Not all applications benefit from additional resources; run short scaling tests to find the allocation that performs best for your workload.
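
One way to apply this tip is a short strong-scaling test: run the same fixed workload at increasing worker counts and watch where the speedup flattens. The sketch below uses a placeholder kernel and Python's multiprocessing; substitute a task representative of your own application.

```python
#!/usr/bin/env python3
"""Quick strong-scaling check: time a fixed workload at several core counts.

Run this on a single node to see where adding workers stops paying off;
the kernel below is a placeholder for your own computation.
"""
import math
import os
import time
from multiprocessing import Pool


def kernel(i: int) -> float:
    # Placeholder compute kernel; replace with a representative task.
    return sum(math.sqrt(i + k) for k in range(20_000))


def run(workers: int, tasks: int = 512) -> float:
    # Time the same fixed set of tasks with a given number of workers.
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(kernel, range(tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    max_workers = len(os.sched_getaffinity(0))  # cores allocated to this job
    workers = 1
    baseline = None
    while workers <= max_workers:
        elapsed = run(workers)
        baseline = baseline or elapsed
        print(f"{workers:>3} workers: {elapsed:6.2f}s  speedup {baseline / elapsed:4.1f}x")
        workers *= 2  # double the worker count up to the allocation
```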