HPC and SeaWulf Glossary

A reference guide for common terms encountered when using the SeaWulf cluster.

Core HPC Concepts

Cluster

A collection of interconnected computers that work together as a single system.

High-Performance Computing (HPC)

Computing that aggregates processing power to solve problems requiring significant computational resources.

Node

An individual computer within a cluster. Each node contains processors, memory, and storage.

Login Nodes

Nodes providing user access to the cluster. Used for job submission and file management, not intensive computation.

Compute Nodes

Nodes where computational jobs execute. Accessed through SLURM, not directly.

Core

An individual processing unit within a CPU. Core counts on SeaWulf nodes range from 28 to 96.

Parallel Processing

Dividing a problem into tasks that run simultaneously across multiple cores or nodes, each working on a different portion of the problem.
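
As a minimal sketch, SLURM's srun command can launch several copies of a program at once; the program name and task counts below are hypothetical:

    # Launch 8 tasks of a (hypothetical) parallel program across 2 nodes
    srun --nodes=2 --ntasks=8 ./my_parallel_program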

petaFLOPS

A unit of computational speed equal to one quadrillion floating-point operations per second. SeaWulf achieves 1.86 petaFLOPS peak performance.

Hardware Components

CPU (Central Processing Unit)

The processor that executes instructions and performs calculations.

GPU (Graphics Processing Unit)

Specialized hardware for parallel processing, used in machine learning and scientific simulations.
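
In a SLURM job script, a GPU is typically requested with a gres directive along these lines (requesting one GPU is just an example; consult the SeaWulf documentation for the GPU types and partitions actually available):

    #SBATCH --gres=gpu:1    # request one GPU on the allocated node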

High-Bandwidth Memory (HBM)

Advanced memory technology providing faster data transfer than traditional RAM. Available on select SeaWulf nodes.

InfiniBand

High-performance networking technology that interconnects nodes in the cluster.

Interconnect

The network infrastructure connecting cluster nodes for communication and data transfer.

Storage and File Systems

GPFS (General Parallel File System)

IBM's shared-disk file system used for SeaWulf's storage. Provides concurrent file access across all nodes.

Parallel File System

A distributed storage system allowing multiple nodes to access the same files simultaneously.

Scratch Space

High-performance temporary storage for data used during job execution. Provides 20 TB of space; files older than 30 days are removed automatically.
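
A common pattern, sketched below, is to stage data into scratch at the start of a job and copy results back to permanent storage before the 30-day cleanup removes them. The scratch path shown is illustrative; confirm the actual path in the SeaWulf documentation.

    # Hypothetical scratch location -- substitute the path given in the SeaWulf docs
    SCRATCH_DIR=/gpfs/scratch/$USER/$SLURM_JOB_ID
    mkdir -p "$SCRATCH_DIR"
    cp input.dat "$SCRATCH_DIR/"            # stage input data into fast scratch storage
    cd "$SCRATCH_DIR"
    ./my_program input.dat > results.out    # run the (hypothetical) program
    cp results.out "$HOME/results/"         # copy results back before cleanup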

Job Scheduling and Management

SLURM (Simple Linux Utility for Resource Management)

The cluster management and job scheduling system used on SeaWulf.

Job

A computational task submitted to the cluster. Jobs are queued until resources become available.
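
A minimal sketch of the job lifecycle from the command line (the script name and job ID are placeholders):

    sbatch myjob.slurm    # submit the job; SLURM prints the assigned job ID
    squeue -u $USER       # check whether the job is still queued or running
    scancel 123456        # cancel a job by ID if needed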

Queue (Partition)

A logical grouping of nodes with similar characteristics. Different queues have different time limits and priorities.
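
The standard SLURM command sinfo lists the available partitions; for example:

    sinfo -s    # one summary line per partition, including its time limit and node counts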

Job Script

A file containing resource requirements and commands to be executed. Submitted using sbatch.
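
A minimal sketch of a job script (the partition name, resource amounts, module, and program are placeholders; adjust them to the queues and limits documented for SeaWulf):

    #!/bin/bash
    #SBATCH --job-name=example      # a name for the job
    #SBATCH --partition=short       # hypothetical queue name -- use an actual SeaWulf partition
    #SBATCH --nodes=1               # number of nodes requested
    #SBATCH --ntasks=28             # number of tasks, e.g. one per core
    #SBATCH --time=01:00:00         # wall-clock limit (HH:MM:SS)
    #SBATCH --output=job_%j.out     # write output to a file named with the job ID

    module load my_software         # hypothetical module -- load whatever your program needs
    ./my_program input.dat

Submitting the script with sbatch places the job in the chosen queue until the requested resources become available.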

Scheduler

Software that manages job submission and execution by allocating resources based on priority and availability.

Resource Allocation

The cluster resources granted to a user or job, including compute time and storage quotas.

Performance and Optimization

Throughput

The rate at which computational work is processed by the system.

Scalability

The ability to maintain or improve performance as resources are added.

Load Balancing

Distribution of computational work across nodes to optimize resource utilization.

Benchmarking

Testing and measuring system performance using standardized metrics.

Fault Tolerance

The system's ability to continue operation despite component failures.

Important Note: This glossary provides quick definitions for common terms. For detailed usage instructions and current system specifications, consult the official SeaWulf documentation or contact the HPC support team.