SeaWulf Architecture Overview

SeaWulf is a heterogeneous high-performance computing cluster with over 400 nodes and approximately 23,000 cores, delivering around 1.86 PFLOP/s of peak computational performance. The system offers diverse CPU and GPU architectures for a wide range of research workloads.

This guide provides comprehensive information on SeaWulf's architecture, hardware specifications, and resource selection to help you make informed decisions about which resources to use for your research.

System Capacity

  • Over 400 nodes
  • Approximately 23,000 cores
  • Peak performance: ~1.86 PFLOP/s
  • Storage: 4 PB SAS disk + 50 TB SSD
  • Network: High-speed InfiniBand® with 5-50 GB/s transfer speeds

Hardware Generations and Access

SeaWulf's hardware spans multiple generations, accessible via different login nodes. Your choice of login node determines which hardware resources are available to your jobs.

Legacy Platform (login1/login2)

The original SeaWulf hardware generation:

  • Haswell 28-core nodes: Mature, stable platform with AVX2 instruction set support
  • GPU acceleration: K80, P100, and V100 GPUs for CUDA applications
  • Best for: Legacy software, budget-conscious computing, development and testing

Modern Platform (milan1/milan2)

Expanded infrastructure with newer hardware generations:

  • Multiple CPU architectures: Intel Skylake (40-core), AMD Milan (96-core), Intel Sapphire Rapids (96-core)
  • Memory innovations: Standard DDR5, high-bandwidth HBM, and massive 1TB memory configurations
  • GPU acceleration: NVIDIA A100 80GB GPUs for demanding AI/ML and HPC applications
  • Advanced features: Shared access modes, enhanced instruction sets (AVX512, AMX)

Important: Your login node choice determines available resources. Login to login1 or login2 for legacy platform access, or milan1 or milan2 for modern platform access. See the SeaWulf Queues Table for a complete comparison.
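If you are unsure what a given login node exposes, you can ask the scheduler directly. The sketch below is a minimal example, assuming the Slurm scheduler and that sinfo is on your PATH; the partitions and node counts it prints come from the live system and will differ between the legacy and modern login nodes.

```python
# Minimal sketch: list the queues (Slurm partitions) visible from the login
# node you are connected to. Assumes Slurm and that `sinfo` is on your PATH.
import subprocess

def list_partitions():
    # %P = partition, %D = node count, %c = cores per node,
    # %m = memory per node (MB), %G = generic resources (e.g., GPUs)
    out = subprocess.run(
        ["sinfo", "-o", "%P %D %c %m %G"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout

if __name__ == "__main__":
    print(list_partitions())
```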

CPU Architectures

Different CPU generations offer distinct performance characteristics:

Architecture           | Cores/Node | Key Strengths                                     | Ideal Applications
Intel Haswell          | 28         | Stable, mature platform with AVX2                 | Legacy software, general computing, development
Intel Skylake          | 40         | Balanced performance with AVX512                  | Most scientific computing, production workloads
AMD Milan              | 96         | High core count, excellent for parallel workloads | Highly parallel applications, parameter sweeps
Intel Sapphire Rapids  | 96         | AVX512, AMX, high-bandwidth memory options        | AI/ML inference, memory-intensive workloads

Instruction Set Extensions: All architectures support AVX2 (256-bit vector operations). Skylake and Sapphire Rapids additionally support AVX512 (512-bit vector operations). Sapphire Rapids includes AMX (Advanced Matrix Extensions) for hardware-accelerated matrix operations, particularly beneficial for AI/ML workloads.
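To verify which of these extensions a particular node actually exposes, you can inspect the CPU flags reported by the kernel. The sketch below reads /proc/cpuinfo on Linux; the flag names (avx2, avx512f, amx_tile) follow kernel conventions, and it should be run inside a job on the target node rather than on a login node.

```python
# Minimal sketch: report which vector/matrix instruction sets this node exposes.
# Flag names follow Linux /proc/cpuinfo conventions; amx_tile only appears on
# Sapphire Rapids nodes with a sufficiently recent kernel.
def cpu_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for feature in ("avx2", "avx512f", "amx_tile"):
    print(f"{feature}: {'yes' if feature in flags else 'no'}")
```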

Detailed Node Specifications

Complete breakdown of all compute, login, and large-memory nodes:

Name                 | Node type    | Node Count | Core Manufacturer | Core Type       | Cores per node | Total Cores | Memory¹                             | GPU              | GPUs per node | Total GPUs
login1               | login        | 1          | Intel             | Haswell         | 24             | 24          | 256 GB                              | N/A              | 0             | 0
login2               | login        | 1          | Intel             | Haswell         | 20             | 20          | 256 GB                              | N/A              | 0             | 0
sn[001-056]          | CPU compute  | 156        | Intel             | Haswell         | 28             | 4,368       | 128 GB                              | N/A              | 0             | 0
cn-nvidia            | GPU compute  | 1          | Intel             | Haswell         | 12             | 12          | 64 GB                               | Nvidia P100 16GB | 2             | 2
sn-nvda[1,2]         | GPU compute  | 2          | Intel             | Haswell         | 28             | 56          | 128 GB                              | Nvidia V100 32GB | 2             | 4
sn-nvda[3-8]         | GPU compute  | 6          | Intel             | Haswell         | 28             | 168         | 128 GB                              | Nvidia K80       | 4             | 24

milan[1,2]           | login        | 2          | AMD               | Milan           | 64             | 128         | 512 GB                              | N/A              | 0             | 0
xeonmax              | login        | 1          | Intel             | Sapphire Rapids | 96             | 96          | 512 GB                              | N/A              | 0             | 0
dg[001-48]           | CPU compute  | 48         | AMD               | Milan           | 96             | 4,608       | 256 GB                              | N/A              | 0             | 0
xm[001-044,049-094]  | CPU compute  | 90         | Intel             | Sapphire Rapids | 96             | 8,640       | 128 GB HBM + 256 GB DDR5            | N/A              | 0             | 0
xm[045-048]          | CPU compute  | 4          | Intel             | Sapphire Rapids | 96             | 384         | 1 TB DDR5 + 128 GB HBM (cache mode) | N/A              | 0             | 0
dn[001-064]          | CPU compute  | 64         | Intel             | Skylake         | 40             | 2,560       | 192 GB                              | N/A              | 0             | 0
a100-[01-11]         | GPU compute  | 11         | Intel             | Ice Lake        | 64             | 704         | 256 GB                              | Nvidia A100 80GB | 4             | 44

dg-mem               | large memory | 1          | Intel             | Cooper Lake     | 96             | 96          | 3 TB                                | AMD MI210        | 2             | 2
cn-mem               | large memory | 1          | Intel             | Haswell         | 72             | 72          | 3 TB                                | Nvidia V100 16GB | 1             | 1

¹A small subset of node memory is reserved for the OS and file system and is not available for user applications.

Compute Nodes by Vendor

SeaWulf's compute infrastructure includes nodes from multiple leading vendors:

  • 164 Compute Nodes from Penguin Computing: Dual Intel Xeon Haswell CPUs (28 cores/node, 2.0 GHz), 128 GB DDR4, FDR InfiniBand. Includes 8 GPU nodes with K80, P100, and V100 GPUs.
  • 64 Compute Nodes from Dell: Dual Intel Xeon Gold 6148 Skylake CPUs (40 cores/node, 2.4 GHz), 192 GB RAM, FDR InfiniBand.
  • 48 Compute Nodes from HPE: Dual AMD EPYC 7643 Milan CPUs (96 cores/node, 3.2 GHz), 256 GB RAM, HDR100 InfiniBand.
  • 11 GPU Compute Nodes from Dell: Dual Intel Xeon 6338 Ice Lake CPUs (64 cores/node, 2.0 GHz) with 4× Nvidia A100 80GB GPUs per node, 256 GB RAM, HDR100 InfiniBand.
  • 94 Compute Nodes from HPE: Dual Intel Xeon Max 9468 Sapphire Rapids CPUs (96 cores/node, 2.6 GHz) with either 384 GB (256 GB DDR5 + 128 GB HBM) or 1 TB DDR5 + 128 GB HBM cache configurations, NDR InfiniBand.

Memory Architecture

SeaWulf offers different memory configurations optimized for various computational needs:

Standard Memory (DDR4/DDR5)

  • Capacity: 128 GB to 256 GB per node (standard compute nodes)
  • Characteristics: Balanced performance suitable for most applications
  • Use when: Your application has typical memory requirements and doesn't need specialized memory features

High-Bandwidth Memory (HBM)

  • Capacity: 128 GB HBM + 256 GB DDR5 per node (384 GB total)
  • Characteristics: 2-4× higher memory bandwidth compared to standard DDR5
  • Use when: Your application is limited by memory bandwidth rather than compute capacity (a rough way to gauge this is sketched below)
  • Learn more: See HBM Nodes Guide for detailed information
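
A rough way to gauge bandwidth sensitivity is to time a simple streaming kernel on both a standard DDR5 node and an HBM node and compare. The sketch below uses NumPy; the array size and the 3 × N × 8 byte traffic estimate are illustrative assumptions, and it is not a substitute for a proper STREAM benchmark.

```python
# Rough, illustrative sketch: estimate the effective memory bandwidth a NumPy
# triad achieves on the current node. Comparing a standard DDR5 node with an
# HBM node gives a quick hint of whether a workload is bandwidth-bound.
import time
import numpy as np

N = 200_000_000            # ~1.6 GB per float64 array; shrink if memory is tight
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)

start = time.perf_counter()
a[:] = b + 2.0 * c         # triad: reads b and c, writes a
elapsed = time.perf_counter() - start

bytes_moved = 3 * N * 8    # two reads + one write, 8 bytes per float64 (approximate)
print(f"Effective bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```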

Ultra-Large Memory Configuration

  • Capacity: 1 TB DDR5 + 128 GB HBM configured as level 4 cache (4 nodes available)
  • Characteristics: Combines massive capacity with high-bandwidth HBM cache layer
  • Use when: You need both extremely large memory capacity and high bandwidth
  • Access: Available via the hbm-1tb-long-96core queue

GPU Resources

SeaWulf provides GPU acceleration across multiple generations of NVIDIA hardware:

GPU Generations Available

GPU Model    | Memory | Platform          | Typical Use Cases
NVIDIA K80   | 24 GB  | Legacy (login1/2) | Basic GPU computing, older CUDA applications
NVIDIA P100  | 16 GB  | Legacy (login1/2) | Scientific computing, double-precision workloads
NVIDIA V100  | 32 GB  | Legacy (login1/2) | Deep learning, HPC applications
NVIDIA A100  | 80 GB  | Modern (milan1/2) | Large-scale AI/ML, demanding HPC workloads

GPU Node Configuration

  • K80 nodes: 4 GPUs per node (6 nodes available)
  • P100 nodes: 2 GPUs per node (1 node available)
  • V100 nodes: 2 GPUs per node (2 nodes available)
  • A100 nodes: 4 GPUs per node (11 nodes available)
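
To confirm which GPUs a job actually sees once it lands on a node, you can query the NVIDIA driver. This is a minimal sketch assuming nvidia-smi is available on the allocated GPU node; it is not meaningful on login nodes, which have no GPUs.

```python
# Minimal sketch: list the GPUs visible on the current (allocated) GPU node.
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
# Prints one line per GPU: index, model name, and total memory.
print(result.stdout)
```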

Network and Storage Infrastructure

High-Speed InfiniBand® Networking

SeaWulf employs multiple generations of InfiniBand® networking from Nvidia®:

  • Login Nodes: Five login nodes for user access and job submission
  • Network Types:
    • FDR InfiniBand (Haswell and Skylake nodes)
    • HDR100 InfiniBand (Milan and A100 nodes)
    • NDR InfiniBand (Sapphire Rapids nodes)
  • Transfer Speeds: 5-50 GB/s depending on network generation

The InfiniBand network enables low-latency, high-bandwidth communication between nodes, essential for large-scale parallel computing applications.
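
To check which InfiniBand generation a given node is attached to, the link rate can be read from the standard Linux sysfs tree. This is a minimal sketch assuming the usual /sys/class/infiniband layout exposed by the InfiniBand drivers.

```python
# Minimal sketch: report the InfiniBand link rate of the current node, which
# indicates the fabric generation (e.g., FDR, HDR100, NDR) it is attached to.
import glob

for rate_file in glob.glob("/sys/class/infiniband/*/ports/*/rate"):
    device = rate_file.split("/")[4]          # e.g., "mlx5_0"
    with open(rate_file) as f:
        print(f"{device}: {f.read().strip()}")
```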

Storage System

SeaWulf's storage infrastructure is built on IBM's GPFS (General Parallel File System) solution:

Storage Capacity

  • Total Capacity: Approximately 4 petabytes (PB) of SAS spinning disk
  • High-Performance Tier: 50 terabytes (TB) of SSD storage

Performance Metrics

  • Random Read Performance: Over 4 million 4K IOPS sustained
  • Sequential Read Performance: Exceeding 36 GB/s

This tiered storage architecture provides both high capacity for large datasets and exceptional performance for I/O-intensive applications, with consistent file access across all compute and login nodes.

Performance Calculations

Technical formulas for calculating system performance metrics:

Peak Computational Performance

Formula: (Cores per node × Number of nodes × Base clock per core × Double-precision FLOPS per cycle), summed over each node type, plus the GPUs' aggregate double-precision FLOP/s

This calculation yields SeaWulf's theoretical peak performance of approximately 1.86 PFLOP/s.
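
As a back-of-the-envelope illustration of the CPU term only, the sketch below plugs in the Haswell partition's figures from the tables above and assumes 16 double-precision FLOPS per cycle per core (AVX2 with two FMA units); GPU contributions are omitted.

```python
# Back-of-the-envelope sketch for the Haswell partition only.
# Core count, node count, and clock come from the tables above; the
# 16 DP FLOPS/cycle figure (AVX2 FMA: 4 doubles x 2 ops x 2 FMA units)
# is an assumption about the microarchitecture.
cores_per_node = 28
nodes = 156
clock_hz = 2.0e9
dp_flops_per_cycle = 16

peak_flops = cores_per_node * nodes * clock_hz * dp_flops_per_cycle
print(f"Haswell partition peak: {peak_flops / 1e12:.1f} TFLOP/s")   # ~139.8 TFLOP/s
# Repeating this for every node type and adding the GPUs' double-precision
# FLOP/s yields the ~1.86 PFLOP/s system figure.
```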

Memory Bandwidth

Formula: Memory Clock (MHz) × 2 (DDR) × 64-bit Memory Bus width × 4 Memory interfaces per CPU (Quad-channel) × 2 CPUs per node ÷ 8 bits per byte

Example (Haswell nodes, DDR4-2133): 1,066 MHz × 2 × 64 × 4 × 2 ÷ 8 = 136,448 MB/s ≈ 133.25 GiB/s per node
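
The same arithmetic as a short sketch, using the Haswell values quoted above; the MB-to-GiB conversion is what yields the 133.25 figure.

```python
# Per-node memory bandwidth for the Haswell example above.
memory_clock_mhz = 1066     # DDR4-2133 base clock
ddr_factor = 2              # two transfers per clock (DDR)
bus_width_bits = 64
channels_per_cpu = 4        # quad-channel
cpus_per_node = 2

mb_per_s = (memory_clock_mhz * ddr_factor * bus_width_bits
            * channels_per_cpu * cpus_per_node) / 8
print(f"{mb_per_s:,.0f} MB/s = {mb_per_s / 1024:.2f} GiB/s per node")
# 136,448 MB/s = 133.25 GiB/s
```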

Related Documentation