SeaWulf is a heterogeneous high-performance computing cluster with over 400 nodes and approximately 23,000 cores, delivering around 1.86 PFLOP/s of peak computational performance. The system offers diverse CPU and GPU architectures for a wide range of research workloads.
This guide provides comprehensive information on SeaWulf's architecture, hardware specifications, and resource selection to help you make informed decisions about which resources to use for your research.
System Capacity
- Over 400 nodes
- Approximately 23,000 cores
- Peak performance: ~1.86 PFLOP/s
- Storage: 4 PB SAS disk + 50 TB SSD
- Network: High-speed InfiniBand® with 5-50 GB/s transfer speeds
Hardware Generations and Access
SeaWulf's hardware spans multiple generations, accessible via different login nodes. Your choice of login node determines which hardware resources are available to your jobs.
Legacy Platform (login1/login2)
The original SeaWulf hardware generation:
- Haswell 28-core nodes: Mature, stable platform with AVX2 instruction set support
- GPU acceleration: K80, P100, and V100 GPUs for CUDA applications
- Best for: Legacy software, budget-conscious computing, development and testing
Modern Platform (milan1/milan2)
Expanded infrastructure with newer hardware generations:
- Multiple CPU architectures: Intel Skylake (40-core), AMD Milan (96-core), Intel Sapphire Rapids (96-core)
- Memory innovations: Standard DDR5, high-bandwidth HBM, and massive 1 TB memory configurations
- GPU acceleration: NVIDIA A100 80GB GPUs for demanding AI/ML and HPC applications
- Advanced features: Shared access modes, enhanced instruction sets (AVX512, AMX)
Important: Your login node choice determines available resources. Log in to login1 or login2 for legacy platform access, or to milan1 or milan2 for modern platform access. See the SeaWulf Queues Table for a complete comparison.
CPU Architectures
Different CPU generations offer distinct performance characteristics:
| Architecture | Cores/Node | Key Strengths | Ideal Applications |
|---|---|---|---|
| Intel Haswell | 28 | Stable, mature platform with AVX2 | Legacy software, general computing, development |
| Intel Skylake | 40 | Balanced performance with AVX512 | Most scientific computing, production workloads |
| AMD Milan | 96 | High core count, excellent for parallel workloads | Highly parallel applications, parameter sweeps |
| Intel Sapphire Rapids | 96 | AVX512, AMX, high-bandwidth memory options | AI/ML inference, memory-intensive workloads |
Instruction Set Extensions: All architectures support AVX2 (256-bit vector operations). Skylake and Sapphire Rapids additionally support AVX512 (512-bit vector operations). Sapphire Rapids includes AMX (Advanced Matrix Extensions) for hardware-accelerated matrix operations, particularly beneficial for AI/ML workloads.
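To confirm which of these extensions a node actually exposes, you can inspect its CPU feature flags. The following is a minimal sketch, assuming a Linux node with /proc/cpuinfo; run it on a compute node (for example inside an interactive job) rather than on a login node:

```python
# check_isa.py - report which vector/matrix instruction sets the current node exposes.
# Assumes a Linux system that publishes CPU feature flags in /proc/cpuinfo.

def cpu_flags():
    """Return the set of CPU feature flags reported by /proc/cpuinfo."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    flags = cpu_flags()
    checks = {
        "AVX2 (256-bit vectors)": "avx2",
        "AVX-512 (512-bit vectors)": "avx512f",
        "AMX (matrix extensions)": "amx_tile",
    }
    for label, flag in checks.items():
        print(f"{label}: {'yes' if flag in flags else 'no'}")
```

On a Sapphire Rapids node all three checks should report yes; on a Haswell node only AVX2 will.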
Detailed Node Specifications
Complete breakdown of all compute, login, and large-memory nodes:
| Name | Node type | Node Count | Core Manufacturer | Core Type | Cores per node | Total Cores | Memory¹ | GPU | GPUs per node | Total GPUs |
|---|---|---|---|---|---|---|---|---|---|---|
| login1 | login | 1 | Intel | Haswell | 24 | 24 | 256 GB | N/A | 0 | 0 |
| login2 | login | 1 | Intel | Haswell | 20 | 20 | 256 GB | N/A | 0 | 0 |
| sn[001-156] | CPU compute | 156 | Intel | Haswell | 28 | 4,368 | 128 GB | N/A | 0 | 0 |
| cn-nvidia | GPU compute | 1 | Intel | Haswell | 12 | 12 | 64 GB | Nvidia P100 16GB | 2 | 2 |
| sn-nvda[1,2] | GPU compute | 2 | Intel | Haswell | 28 | 56 | 128 GB | Nvidia V100 32GB | 2 | 4 |
| sn-nvda[3-8] | GPU compute | 6 | Intel | Haswell | 28 | 168 | 128 GB | Nvidia K80 | 4 | 24 |
| milan[1,2] | login | 2 | AMD | Milan | 64 | 128 | 512 GB | N/A | 0 | 0 |
| xeonmax | login | 1 | Intel | Sapphire Rapids | 96 | 96 | 512 GB | N/A | 0 | 0 |
| dg[001-048] | CPU compute | 48 | AMD | Milan | 96 | 4,608 | 256 GB | N/A | 0 | 0 |
| xm[001-044,049-094] | CPU compute | 90 | Intel | Sapphire Rapids | 96 | 8,640 | 128 GB HBM + 256 GB DDR5 | N/A | 0 | 0 |
| xm[045-048] | CPU compute | 4 | Intel | Sapphire Rapids | 96 | 384 | 1 TB DDR5 + 128 GB HBM (cache mode) | N/A | 0 | 0 |
| dn[001-064] | CPU compute | 64 | Intel | Skylake | 40 | 2,560 | 192 GB | N/A | 0 | 0 |
| a100-[01-11] | GPU compute | 11 | Intel | Ice Lake | 64 | 704 | 256 GB | Nvidia A100 80GB | 4 | 44 |
| dg-mem | large memory | 1 | Intel | Cooper Lake | 96 | 96 | 3 TB | AMD MI210 | 2 | 2 |
| cn-mem | large memory | 1 | Intel | Haswell | 72 | 72 | 3 TB | Nvidia V100 16GB | 1 | 1 |
¹A small subset of node memory is reserved for the OS and file system and is not available for user applications.
Compute Nodes by Vendor
SeaWulf's compute infrastructure includes nodes from multiple leading vendors:
- 164 Compute Nodes from Penguin Computing: Dual Intel Xeon Haswell CPUs (28 cores/node, 2.0 GHz), 128 GB DDR4, FDR InfiniBand. Includes 8 GPU nodes with K80, P100, and V100 GPUs.
- 64 Compute Nodes from Dell: Dual Intel Xeon Gold 6148 Skylake CPUs (40 cores/node, 2.4 GHz), 192 GB RAM, FDR InfiniBand.
- 48 Compute Nodes from HPE: Dual AMD EPYC 7643 Milan CPUs (96 cores/node, 3.2 GHz), 256 GB RAM, HDR100 InfiniBand.
- 11 GPU Compute Nodes from Dell: Dual Intel Xeon 6338 Ice Lake CPUs (64 cores/node, 2.0 GHz) with 4× Nvidia A100 80GB GPUs per node, 256 GB RAM, HDR100 InfiniBand.
- 94 Compute Nodes from HPE: Dual Intel Xeon Max 9468 Sapphire Rapids CPUs (96 cores/node, 2.6 GHz) with either 384 GB (256 GB DDR5 + 128 GB HBM) or 1 TB DDR5 plus 128 GB HBM configured as cache, NDR InfiniBand.
Memory Architecture
SeaWulf offers different memory configurations optimized for various computational needs:
Standard Memory (DDR4/DDR5)
- Capacity: 128 GB to 256 GB per node (standard compute nodes)
- Characteristics: Balanced performance suitable for most applications
- Use when: Your application has typical memory requirements and doesn't need specialized memory features
High-Bandwidth Memory (HBM)
- Capacity: 128 GB HBM + 256 GB DDR5 per node (384 GB total)
- Characteristics: 2-4× higher memory bandwidth compared to standard DDR5
- Use when: Your application is limited by memory bandwidth rather than compute capacity (a quick way to check is sketched after this list)
- Learn more: See HBM Nodes Guide for detailed information
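One quick way to judge whether an application is bandwidth-bound is to compare the throughput it sustains against what a simple streaming kernel achieves on the same node. The sketch below is a rough NumPy-based probe; the array size is an illustrative choice, the estimate ignores write-allocate traffic, and it is not a substitute for a dedicated benchmark such as STREAM:

```python
# bw_probe.py - rough estimate of achieved memory bandwidth on the current node.
# Illustrative only; use the STREAM benchmark for rigorous numbers.
import time
import numpy as np

N = 200_000_000                 # ~1.6 GB per float64 array, large enough to defeat caches
a = np.zeros(N)                 # pre-touched output array
b = np.random.rand(N)
c = np.random.rand(N)

start = time.perf_counter()
np.add(b, c, out=a)             # reads b and c, writes a, with no temporaries
elapsed = time.perf_counter() - start

bytes_moved = 3 * N * 8         # two reads + one write, 8 bytes per element
print(f"Approximate bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```

Run on a standard node and then on an HBM node; a large gap in the achieved figure suggests your workload will also benefit from the higher-bandwidth memory.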
Ultra-Large Memory Configuration
- Capacity: 1 TB DDR5 + 128 GB HBM configured as level 4 cache (4 nodes available)
- Characteristics: Combines massive capacity with high-bandwidth HBM cache layer
- Use when: You need both extremely large memory capacity and high bandwidth
- Access: Available via the hbm-1tb-long-96core queue (see the submission sketch below)
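As an illustration of targeting these nodes, the sketch below submits a batch script to that queue through SLURM's sbatch command from Python. The script name my_job.sh and the resource counts are placeholders rather than SeaWulf-specific requirements; adapt them to your workload.

```python
# submit_hbm_1tb.py - illustrative submission to the 1 TB HBM nodes via SLURM's sbatch.
import subprocess

cmd = [
    "sbatch",
    "--partition=hbm-1tb-long-96core",  # queue that maps to the xm[045-048] nodes
    "--nodes=1",
    "--ntasks-per-node=96",             # one task per core on these 96-core nodes
    "my_job.sh",                        # placeholder batch script
]
subprocess.run(cmd, check=True)
```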
GPU Resources
SeaWulf provides GPU acceleration across multiple generations of NVIDIA hardware:
GPU Generations Available
| GPU Model | Memory | Platform | Typical Use Cases |
|---|---|---|---|
| NVIDIA K80 | 24 GB | Legacy (login1/2) | Basic GPU computing, older CUDA applications |
| NVIDIA P100 | 16 GB | Legacy (login1/2) | Scientific computing, double-precision workloads |
| NVIDIA V100 | 32 GB | Legacy (login1/2) | Deep learning, HPC applications |
| NVIDIA A100 | 80 GB | Modern (milan1/2) | Large-scale AI/ML, demanding HPC workloads |
GPU Node Configuration
- K80 nodes: 4 GPUs per node (6 nodes available)
- P100 nodes: 2 GPUs per node (1 node available)
- V100 nodes: 2 GPUs per node (2 nodes available)
- A100 nodes: 4 GPUs per node (11 nodes available)
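Once a job has been allocated a GPU node, you can verify which GPU model and memory size you received. Below is a minimal sketch, assuming the standard nvidia-smi utility is on the PATH of the NVIDIA GPU nodes:

```python
# gpu_inventory.py - list the GPUs visible to the current job via nvidia-smi.
# Run on an NVIDIA GPU node after your job starts.
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for i, line in enumerate(result.stdout.strip().splitlines()):
    name, mem = (field.strip() for field in line.split(","))
    print(f"GPU {i}: {name}, {mem}")
```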
Network and Storage Infrastructure
High-Speed InfiniBand® Networking
SeaWulf employs multiple generations of InfiniBand® networking from Nvidia®:
- Login Nodes: Five login nodes for user access and job submission
- Network Types:
  - FDR InfiniBand (Haswell and Skylake nodes)
  - HDR100 InfiniBand (Milan and A100 nodes)
  - NDR InfiniBand (Sapphire Rapids nodes)
- Transfer Speeds: 5-50 GB/s depending on network generation
The InfiniBand network enables low-latency, high-bandwidth communication between nodes, essential for large-scale parallel computing applications.
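If you want to see what the fabric delivers for your own message sizes, a simple MPI ping-pong between two ranks on different nodes gives a first-order estimate. The sketch below assumes the mpi4py package is available in your environment; the message size and repetition count are illustrative:

```python
# pingpong.py - minimal MPI ping-pong to gauge point-to-point bandwidth between two ranks.
# Launch with exactly two ranks placed on different nodes (e.g., via srun or mpirun)
# so the messages actually cross the InfiniBand fabric.
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nbytes = 64 * 1024 * 1024                  # 64 MB message
buf = np.zeros(nbytes, dtype=np.uint8)
reps = 20

comm.Barrier()
start = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    elif rank == 1:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    bw = 2 * reps * nbytes / elapsed / 1e9  # each rep moves the message there and back
    print(f"Point-to-point bandwidth: {bw:.1f} GB/s")
```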
Storage System
SeaWulf's storage infrastructure is built on IBM's GPFS (General Parallel File System) solution:
Storage Capacity
- Total Capacity: Approximately 4 petabytes (PB) of SAS spinning disk
- High-Performance Tier: 50 terabytes (TB) of SSD storage
Performance Metrics
- Random Read Performance: Over 4 million 4K IOPS sustained
- Sequential Read Performance: Exceeding 36 GB/s
This tiered storage architecture provides both high capacity for large datasets and exceptional performance for I/O-intensive applications, with consistent file access across all compute and login nodes.
Performance Calculations
Technical formulas for calculating system performance metrics:
Peak Computational Performance
Formula: (Cores per node × Number of nodes × Base clock per core × Double-precision FLOPS per cycle), summed across node types, plus the Nvidia GPUs' rated teraflops
This calculation yields SeaWulf's theoretical peak performance of approximately 1.86 PFLOP/s.
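As a worked example of the CPU term, the sketch below computes the peak double-precision rate of a single Haswell node using the core count and clock listed earlier. The per-cycle FLOP count (16 for AVX2 with two FMA units) is a nominal microarchitectural figure assumed for illustration, not an official vendor number:

```python
# peak_flops.py - illustrative peak double-precision estimate for one node type.

def node_peak_gflops(cores, base_clock_ghz, flops_per_cycle):
    """Peak DP GFLOP/s for one node: cores x clock x FLOPs issued per cycle."""
    return cores * base_clock_ghz * flops_per_cycle

# Haswell compute node: 28 cores at 2.0 GHz, AVX2 + FMA -> ~16 DP FLOP/cycle/core (assumed)
haswell = node_peak_gflops(cores=28, base_clock_ghz=2.0, flops_per_cycle=16)
print(f"Haswell node peak: ~{haswell:.0f} GFLOP/s")   # ~896 GFLOP/s

# Summing this product over every node type, then adding the GPUs' rated TFLOP/s,
# gives the system-wide figure quoted above (~1.86 PFLOP/s).
```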
Memory Bandwidth
Formula: Memory Clock (MHz) × 2 (DDR) × 64-bit Memory Bus width × 4 Memory interfaces per CPU (Quad-channel) × 2 CPUs per node ÷ 8 bits per byte
Example (Haswell nodes): 1,066 MHz × 2 × 64 × 4 × 2 ÷ 8 = 136,448 MB/s ≈ 133.25 GB/s per node (using 1 GB = 1,024 MB)
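The same arithmetic, written out so the unit conversions are explicit:

```python
# mem_bw.py - per-node theoretical memory bandwidth, following the formula above.
mem_clock_mhz = 1_066        # DDR4-2133 memory clock (Haswell example)
ddr_factor = 2               # two transfers per clock (DDR)
bus_width_bits = 64          # per memory channel
channels_per_cpu = 4         # quad-channel
cpus_per_node = 2

mb_per_s = (mem_clock_mhz * ddr_factor * bus_width_bits
            * channels_per_cpu * cpus_per_node) / 8   # bits -> bytes
print(f"{mb_per_s:,.0f} MB/s ≈ {mb_per_s / 1024:.2f} GB/s per node")
# -> 136,448 MB/s ≈ 133.25 GB/s
```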
Related Documentation
- Queue Selection Guide - Learn how to choose the right queue for your jobs
- SeaWulf Queues Table - Complete reference of all available queues
- HBM Nodes Guide - Detailed information on high-bandwidth memory systems
- SLURM Overview - Learn about the job scheduling system
