Core HPC Concepts
Cluster
A collection of interconnected computers that work together as a single system.
High-Performance Computing (HPC)
Computing that aggregates processing power to solve problems requiring significant computational resources.
Node
An individual computer within a cluster. Each node contains processors, memory, and storage.
Login Nodes
Nodes providing user access to the cluster. Used for job submission and file management, not intensive computation.
Compute Nodes
Nodes where computational jobs execute. Users reach them by submitting jobs through SLURM rather than by logging in directly.
Core
An individual processing unit within a CPU. SeaWulf nodes have between 28 and 96 cores each.
Parallel Processing
Dividing a problem into tasks that run simultaneously on multiple cores, processors, or nodes, each working on a different portion of the problem.
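The split, compute, and combine pattern behind parallel processing can be sketched in a few lines of Python. This is a minimal illustration, not how SeaWulf jobs are actually parallelized: the function names are hypothetical, and a thread pool is used only to keep the sketch short. In standard CPython, threads will not speed up this CPU-bound loop; real HPC codes typically use separate processes or MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        hi = lo + base + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def sum_of_squares(bounds):
    """Work done by one worker: sum i*i over its own chunk only."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n, workers=4):
    """Split the problem, process the chunks concurrently, combine the results."""
    chunks = split_range(0, n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)
```

Each worker sees only its own portion of the range, and the final answer is assembled from the partial results.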
petaFLOPS
A unit of computational speed equal to one quadrillion floating-point operations per second. SeaWulf achieves 1.86 petaFLOPS peak performance.
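To give a sense of scale, the arithmetic can be written out as a short Python sketch. The helper names are hypothetical, and the runtime estimate assumes the machine sustains its peak rate, which real applications generally do not.

```python
PETA = 10**15  # one quadrillion


def peak_flops(petaflops):
    """Convert a rating in petaFLOPS to floating-point operations per second."""
    return petaflops * PETA


def ideal_runtime_seconds(total_operations, petaflops):
    """Lower bound on runtime, assuming every operation runs at the peak rate."""
    return total_operations / peak_flops(petaflops)


# SeaWulf's quoted peak of 1.86 petaFLOPS, expressed in operations per second.
seawulf_peak = peak_flops(1.86)
```

Under this idealized assumption, a workload of 1.86 x 10^18 operations would take about 1,000 seconds.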
Hardware Components
CPU (Central Processing Unit)
The processor that executes instructions and performs calculations.
GPU (Graphics Processing Unit)
Specialized hardware for parallel processing, used in machine learning and scientific simulations.
High-Bandwidth Memory (HBM)
Memory technology that delivers higher bandwidth than conventional DDR RAM. Available on select SeaWulf nodes.
InfiniBand
High-performance networking technology that interconnects nodes in the cluster.
Interconnect
The network infrastructure connecting cluster nodes for communication and data transfer.
Storage and File Systems
GPFS (General Parallel File System)
IBM's shared-disk file system used for SeaWulf's storage. Provides concurrent file access across all nodes.
Parallel File System
A distributed storage system allowing multiple nodes to access the same files simultaneously.
Scratch Space
High-performance temporary storage for job operations. Provides 20 TB of space and automatically removes files older than 30 days.
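Because of the 30-day removal policy, it can be useful to check which files are approaching the cutoff. The following Python sketch lists files under a directory that are older than a given number of days. It assumes age is measured by modification time, which the policy above does not specify, and the function name is hypothetical.

```python
import os
import time

PURGE_AGE_DAYS = 30  # age limit taken from the scratch policy described above


def files_older_than(root, days=PURGE_AGE_DAYS, now=None):
    """Return sorted paths under `root` last modified more than `days` ago.

    Assumption: age is judged by modification time (st_mtime). The actual
    purge may use a different timestamp.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling `files_older_than` on a scratch directory returns the files that should be copied elsewhere if they are still needed.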
Job Scheduling and Management
SLURM (Simple Linux Utility for Resource Management)
The cluster management and job scheduling system used on SeaWulf.
Job
A computational task submitted to the cluster. Jobs are queued until resources become available.
Queue (Partition)
A logical grouping of nodes with similar characteristics. Different queues have different time limits and priorities.
Job Script
A file containing resource requirements and commands to be executed. Submitted using sbatch.
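As an illustration of what a job script contains, the Python sketch below assembles one as text: a shebang line, `#SBATCH` directives stating the resource requirements, then the commands to run. The helper name, the partition name, and the program name are hypothetical placeholders, not actual SeaWulf values.

```python
def render_job_script(job_name, partition, nodes, tasks_per_node, time_limit, commands):
    """Build the text of a SLURM batch script from resource requirements."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",  # wall-clock limit, HH:MM:SS
        "",
    ]
    lines.extend(commands)  # the work to run once resources are granted
    return "\n".join(lines) + "\n"


script = render_job_script(
    job_name="example",
    partition="example-partition",  # hypothetical; use a real partition name
    nodes=1,
    tasks_per_node=28,
    time_limit="00:30:00",
    commands=["srun ./my_program"],  # hypothetical executable
)
```

Writing `script` to a file and passing that file to `sbatch` would submit it to the queue.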
Scheduler
Software that manages job submission and execution by allocating resources based on priority and availability.
Resource Allocation
The cluster resources granted to a user or job, including compute time and storage quotas.
Performance and Optimization
Throughput
The rate at which computational work is processed by the system.
Scalability
The ability of a system or application to maintain or improve performance as resources are added or as the problem size grows.
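Scalability is commonly quantified with speedup (serial runtime divided by parallel runtime) and parallel efficiency (speedup divided by the number of workers). A minimal Python sketch follows; the function names are hypothetical and the timings are invented purely for illustration, not measured on SeaWulf.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is than the serial run."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(serial_seconds, parallel_seconds, workers):
    """Fraction of ideal (linear) speedup achieved; 1.0 means perfect scaling."""
    return speedup(serial_seconds, parallel_seconds) / workers


# Invented example timings in seconds, keyed by worker count (illustration only).
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 170.0}

for workers, seconds in timings.items():
    s = speedup(timings[1], seconds)
    e = parallel_efficiency(timings[1], seconds, workers)
    print(f"{workers} workers: speedup {s:.2f}, efficiency {e:.2f}")
```

Efficiency that falls as workers are added, as in these made-up numbers, indicates that overheads such as communication are growing with scale.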
Load Balancing
Distribution of computational work across nodes to optimize resource utilization.
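One simple way to balance uneven work is a greedy heuristic: take tasks from largest to smallest and give each to whichever node currently has the least work. The Python sketch below shows this at the application level. It is an illustrative assumption, not a description of how SLURM or SeaWulf distributes work, and the function name is hypothetical.

```python
import heapq


def assign_tasks(task_costs, node_count):
    """Greedy balancing: largest task first, always to the least-loaded node.

    Returns (assignment, loads), both keyed by node index.
    """
    # Min-heap of (current_load, node) so the least-loaded node pops first.
    heap = [(0.0, node) for node in range(node_count)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(node_count)}
    for cost in sorted(task_costs, reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(cost)
        heapq.heappush(heap, (load + cost, node))
    loads = {node: sum(costs) for node, costs in assignment.items()}
    return assignment, loads
```

The heuristic does not guarantee an optimal split, but it usually keeps the most heavily loaded node close to the average.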
Benchmarking
Testing and measuring system performance using standardized metrics.
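At its simplest, a benchmark runs the same workload several times and reports the timings. The Python sketch below is a minimal example of that idea, with a hypothetical helper name. Standardized HPC benchmarks are far more elaborate.

```python
import time


def benchmark(func, repeats=5):
    """Run `func` several times; return (best, mean) wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)


# Example workload: summing the first million integers.
best, mean = benchmark(lambda: sum(range(1_000_000)), repeats=3)
print(f"best {best:.4f} s, mean {mean:.4f} s")
```

Reporting the best time alongside the mean helps separate the cost of the workload itself from interference by other activity on the machine.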
Fault Tolerance
The system's ability to continue operation despite component failures.

