SeaWulf Architecture Overview

SeaWulf is a heterogeneous high-performance computing cluster with over 400 nodes and approximately 23,000 cores, delivering around 1.86 PFLOP/s of peak computational performance. The system offers diverse CPU and GPU architectures for a wide range of research workloads.

This guide provides comprehensive information on SeaWulf's architecture, hardware specifications, and resource selection to help you make informed decisions about which resources to use for your research.

System Capacity

  • Over 400 nodes
  • Approximately 23,000 cores
  • Peak performance: ~1.86 PFLOP/s
  • Storage: 4 PB SAS disk + 50 TB SSD
  • Network: High-speed InfiniBand® with 5-50 GB/s transfer speeds

Hardware Generations and Access

SeaWulf's hardware spans multiple generations, accessible via different login nodes. Your choice of login node determines which hardware resources are available to your jobs.

Legacy Platform (login1/login2)

The original SeaWulf hardware generation:

  • Haswell 28-core nodes: Mature, stable platform with AVX2 instruction set support
  • GPU acceleration: K80, P100, and V100 GPUs for CUDA applications
  • Best for: Legacy software, budget-conscious computing, development and testing

Modern Platform (milan1/milan2)

Expanded infrastructure with newer hardware generations:

  • Multiple CPU architectures: Intel Skylake (40-core), AMD Milan (96-core), Intel Sapphire Rapids (96-core)
  • Memory innovations: Standard DDR5, high-bandwidth HBM, and massive 1TB memory configurations
  • GPU acceleration: NVIDIA A100 80GB GPUs for demanding AI/ML and HPC applications
  • Advanced features: Shared access modes, enhanced instruction sets (AVX512, AMX)

Important: Your login node choice determines available resources. Login to login1 or login2 for legacy platform access, or milan1 or milan2 for modern platform access. See the SeaWulf Queues Table for a complete comparison.
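If you are unsure what a given login node exposes, you can ask the scheduler directly. The sketch below is a minimal example, assuming the Slurm scheduler and that sinfo is on your PATH; the partitions and node counts it prints come from the live system and will differ between the legacy and modern login nodes.

```python
# Minimal sketch: list the queues (Slurm partitions) visible from the login
# node you are connected to. Assumes Slurm and that `sinfo` is on your PATH.
import subprocess

def list_partitions():
    # %P = partition, %D = node count, %c = cores per node,
    # %m = memory per node (MB), %G = generic resources (e.g., GPUs)
    out = subprocess.run(
        ["sinfo", "-o", "%P %D %c %m %G"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout

if __name__ == "__main__":
    print(list_partitions())
```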

CPU Architectures

Different CPU generations offer distinct performance characteristics:

Architecture           | Cores/Node | Key Strengths                                     | Ideal Applications
Intel Haswell          | 28         | Stable, mature platform with AVX2                 | Legacy software, general computing, development
Intel Skylake          | 40         | Balanced performance with AVX512                  | Most scientific computing, production workloads
AMD Milan              | 96         | High core count, excellent for parallel workloads | Highly parallel applications, parameter sweeps
Intel Sapphire Rapids  | 96         | AVX512, AMX, high-bandwidth memory options        | AI/ML inference, memory-intensive workloads

Instruction Set Extensions: All architectures support AVX2 (256-bit vector operations). Skylake and Sapphire Rapids additionally support AVX512 (512-bit vector operations). Sapphire Rapids includes AMX (Advanced Matrix Extensions) for hardware-accelerated matrix operations, particularly beneficial for AI/ML workloads.
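To verify which of these extensions a particular node actually exposes, you can inspect the CPU flags reported by the kernel. The sketch below reads /proc/cpuinfo on Linux; the flag names (avx2, avx512f, amx_tile) follow kernel conventions, and it should be run inside a job on the target node rather than on a login node.

```python
# Minimal sketch: report which vector/matrix instruction sets this node exposes.
# Flag names follow Linux /proc/cpuinfo conventions; amx_tile only appears on
# Sapphire Rapids nodes with a sufficiently recent kernel.
def cpu_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for feature in ("avx2", "avx512f", "amx_tile"):
    print(f"{feature}: {'yes' if feature in flags else 'no'}")
```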

Detailed Node Specifications

Complete breakdown of all compute, login, and large-memory nodes:

Name                 | Node type    | Node Count | Core Manufacturer | Core Type       | Cores per node | Total Cores | Memory¹                             | GPU              | GPUs per node | Total GPUs
login1               | login        | 1          | Intel             | Haswell         | 24             | 24          | 256 GB                              | N/A              | 0             | 0
login2               | login        | 1          | Intel             | Haswell         | 20             | 20          | 256 GB                              | N/A              | 0             | 0
sn[001-056]          | CPU compute  | 156        | Intel             | Haswell         | 28             | 4,368       | 128 GB                              | N/A              | 0             | 0
cn-nvidia            | GPU compute  | 1          | Intel             | Haswell         | 12             | 12          | 64 GB                               | Nvidia P100 16GB | 2             | 2
sn-nvda[1,2]         | GPU compute  | 2          | Intel             | Haswell         | 28             | 56          | 128 GB                              | Nvidia V100 32GB | 2             | 4
sn-nvda[3-8]         | GPU compute  | 6          | Intel             | Haswell         | 28             | 168         | 128 GB                              | Nvidia K80       | 4             | 24

milan[1,2]           | login        | 2          | AMD               | Milan           | 64             | 128         | 512 GB                              | N/A              | 0             | 0
xeonmax              | login        | 1          | Intel             | Sapphire Rapids | 96             | 96          | 512 GB                              | N/A              | 0             | 0
dg[001-48]           | CPU compute  | 48         | AMD               | Milan           | 96             | 4,608       | 256 GB                              | N/A              | 0             | 0
xm[001-044,049-094]  | CPU compute  | 90         | Intel             | Sapphire Rapids | 96             | 8,640       | 128 GB HBM + 256 GB DDR5            | N/A              | 0             | 0
xm[045-048]          | CPU compute  | 4          | Intel             | Sapphire Rapids | 96             | 384         | 1 TB DDR5 + 128 GB HBM (cache mode) | N/A              | 0             | 0
dn[001-064]          | CPU compute  | 64         | Intel             | Skylake         | 40             | 2,560       | 192 GB                              | N/A              | 0             | 0
a100-[01-11]         | GPU compute  | 11         | Intel             | Ice Lake        | 64             | 704         | 256 GB                              | Nvidia A100 80GB | 4             | 44

dg-mem               | large memory | 1          | Intel             | Cooper Lake     | 96             | 96          | 3 TB                                | AMD MI210        | 2             | 2
cn-mem               | large memory | 1          | Intel             | Haswell         | 72             | 72          | 3 TB                                | Nvidia V100 16GB | 1             | 1

¹A small subset of node memory is reserved for the OS and file system and is not available for user applications.

Compute Nodes by Vendor

SeaWulf's compute infrastructure includes nodes from multiple leading vendors:

  • 164 Compute Nodes from Penguin Computing: Dual Intel Xeon Haswell CPUs (28 cores/node, 2.0 GHz), 128 GB DDR4, FDR InfiniBand. Includes 8 GPU nodes with K80, P100, and V100 GPUs.
  • 64 Compute Nodes from Dell: Dual Intel Xeon Gold 6148 Skylake CPUs (40 cores/node, 2.4 GHz), 192 GB RAM, FDR InfiniBand.
  • 48 Compute Nodes from HPE: Dual AMD EPYC 7643 Milan CPUs (96 cores/node, 3.2 GHz), 256 GB RAM, HDR100 InfiniBand.
  • 11 GPU Compute Nodes from Dell: Dual Intel Xeon 6338 Ice Lake CPUs (64 cores/node, 2.0 GHz) with 4× Nvidia A100 80GB GPUs per node, 256 GB RAM, HDR100 InfiniBand.
  • 94 Compute Nodes from HPE: Dual Intel Xeon Max 9468 Sapphire Rapids CPUs (96 cores/node, 2.6 GHz) with either 384 GB (256 GB DDR5 + 128 GB HBM) or 1 TB DDR5 + 128 GB HBM cache configurations, NDR InfiniBand.

Memory Architecture

SeaWulf offers different memory configurations optimized for various computational needs:

Standard Memory (DDR4/DDR5)

  • Capacity: 128 GB to 256 GB per node (standard compute nodes)
  • Characteristics: Balanced performance suitable for most applications
  • Use when: Your application has typical memory requirements and doesn't need specialized memory features

High-Bandwidth Memory (HBM)

  • Capacity: 128 GB HBM + 256 GB DDR5 per node (384 GB total)
  • Characteristics: 2-4× higher memory bandwidth compared to standard DDR5
  • Use when: Your application is limited by memory bandwidth rather than compute capacity (a rough way to gauge this is sketched below)
  • Learn more: See HBM Nodes Guide for detailed information
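
A rough way to gauge bandwidth sensitivity is to time a simple streaming kernel on both a standard DDR5 node and an HBM node and compare. The sketch below uses NumPy; the array size and the 3 × N × 8 byte traffic estimate are illustrative assumptions, and it is not a substitute for a proper STREAM benchmark.

```python
# Rough, illustrative sketch: estimate the effective memory bandwidth a NumPy
# triad achieves on the current node. Comparing a standard DDR5 node with an
# HBM node gives a quick hint of whether a workload is bandwidth-bound.
import time
import numpy as np

N = 200_000_000            # ~1.6 GB per float64 array; shrink if memory is tight
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)

start = time.perf_counter()
a[:] = b + 2.0 * c         # triad: reads b and c, writes a
elapsed = time.perf_counter() - start

bytes_moved = 3 * N * 8    # two reads + one write, 8 bytes per float64 (approximate)
print(f"Effective bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```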

Ultra-Large Memory Configuration

  • Capacity: 1 TB DDR5 + 128 GB HBM configured as level 4 cache (4 nodes available)
  • Characteristics: Combines massive capacity with high-bandwidth HBM cache layer
  • Use when: You need both extremely large memory capacity and high bandwidth
  • Access: Available via the hbm-1tb-long-96core queue

GPU Resources

SeaWulf provides GPU acceleration across multiple generations of NVIDIA hardware:

GPU Generations Available

GPU Model    | Memory | Platform          | Typical Use Cases
NVIDIA K80   | 24 GB  | Legacy (login1/2) | Basic GPU computing, older CUDA applications
NVIDIA P100  | 16 GB  | Legacy (login1/2) | Scientific computing, double-precision workloads
NVIDIA V100  | 32 GB  | Legacy (login1/2) | Deep learning, HPC applications
NVIDIA A100  | 80 GB  | Modern (milan1/2) | Large-scale AI/ML, demanding HPC workloads

GPU Node Configuration

  • K80 nodes: 4 GPUs per node (6 nodes available)
  • P100 nodes: 2 GPUs per node (1 node available)
  • V100 nodes: 2 GPUs per node (2 nodes available)
  • A100 nodes: 4 GPUs per node (11 nodes available)
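
To confirm which GPUs a job actually sees once it lands on a node, you can query the NVIDIA driver. This is a minimal sketch assuming nvidia-smi is available on the allocated GPU node; it is not meaningful on login nodes, which have no GPUs.

```python
# Minimal sketch: list the GPUs visible on the current (allocated) GPU node.
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
# Prints one line per GPU: index, model name, and total memory.
print(result.stdout)
```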

Network and Storage Infrastructure

High-Speed InfiniBand® Networking

SeaWulf employs multiple generations of InfiniBand® networking from Nvidia®:

  • Login Nodes: Five login nodes for user access and job submission
  • Network Types:
    • FDR InfiniBand (Haswell and Skylake nodes)
    • HDR100 InfiniBand (Milan and A100 nodes)
    • NDR InfiniBand (Sapphire Rapids nodes)
  • Transfer Speeds: 5-50 GB/s depending on network generation

The InfiniBand network enables low-latency, high-bandwidth communication between nodes, essential for large-scale parallel computing applications.
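
To check which InfiniBand generation a given node is attached to, the link rate can be read from the standard Linux sysfs tree. This is a minimal sketch assuming the usual /sys/class/infiniband layout exposed by the InfiniBand drivers.

```python
# Minimal sketch: report the InfiniBand link rate of the current node, which
# indicates the fabric generation (e.g., FDR, HDR100, NDR) it is attached to.
import glob

for rate_file in glob.glob("/sys/class/infiniband/*/ports/*/rate"):
    device = rate_file.split("/")[4]          # e.g., "mlx5_0"
    with open(rate_file) as f:
        print(f"{device}: {f.read().strip()}")
```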

Storage System

SeaWulf's storage infrastructure is built on IBM's GPFS (General Parallel File System) solution:

Storage Capacity

  • Total Capacity: Approximately 4 petabytes (PB) of SAS spinning disk
  • High-Performance Tier: 50 terabytes (TB) of SSD storage

Performance Metrics

  • Random Read Performance: Over 4 million 4K IOPS sustained
  • Sequential Read Performance: Exceeding 36 GB/s

This tiered storage architecture provides both high capacity for large datasets and exceptional performance for I/O-intensive applications, with consistent file access across all compute and login nodes.

Performance Calculations

Technical formulas for calculating system performance metrics:

Peak Computational Performance

Formula: (Cores per node × Number of nodes × Base clock per core × Double-precision FLOPS per cycle), summed over each node type, plus the GPUs' aggregate double-precision FLOP/s

This calculation yields SeaWulf's theoretical peak performance of approximately 1.86 PFLOP/s.
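
As a back-of-the-envelope illustration of the CPU term only, the sketch below plugs in the Haswell partition's figures from the tables above and assumes 16 double-precision FLOPS per cycle per core (AVX2 with two FMA units); GPU contributions are omitted.

```python
# Back-of-the-envelope sketch for the Haswell partition only.
# Core count, node count, and clock come from the tables above; the
# 16 DP FLOPS/cycle figure (AVX2 FMA: 4 doubles x 2 ops x 2 FMA units)
# is an assumption about the microarchitecture.
cores_per_node = 28
nodes = 156
clock_hz = 2.0e9
dp_flops_per_cycle = 16

peak_flops = cores_per_node * nodes * clock_hz * dp_flops_per_cycle
print(f"Haswell partition peak: {peak_flops / 1e12:.1f} TFLOP/s")   # ~139.8 TFLOP/s
# Repeating this for every node type and adding the GPUs' double-precision
# FLOP/s yields the ~1.86 PFLOP/s system figure.
```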

Memory Bandwidth

Formula: Memory Clock (MHz) × 2 (DDR) × 64-bit Memory Bus width × 4 Memory interfaces per CPU (Quad-channel) × 2 CPUs per node ÷ 8 bits per byte

Example (Haswell nodes, DDR4-2133): 1,066 MHz × 2 × 64 × 4 × 2 ÷ 8 = 136,448 MB/s ≈ 133.25 GiB/s per node
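
The same arithmetic as a short sketch, using the Haswell values quoted above; the MB-to-GiB conversion is what yields the 133.25 figure.

```python
# Per-node memory bandwidth for the Haswell example above.
memory_clock_mhz = 1066     # DDR4-2133 base clock
ddr_factor = 2              # two transfers per clock (DDR)
bus_width_bits = 64
channels_per_cpu = 4        # quad-channel
cpus_per_node = 2

mb_per_s = (memory_clock_mhz * ddr_factor * bus_width_bits
            * channels_per_cpu * cpus_per_node) / 8
print(f"{mb_per_s:,.0f} MB/s = {mb_per_s / 1024:.2f} GiB/s per node")
# 136,448 MB/s = 133.25 GiB/s
```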

Related Documentation