SeaWulf GPU Nodes Guide

SeaWulf provides several GPU-accelerated node types across Intel Haswell, Intel Skylake, and AMD Milan CPU architectures, equipped with NVIDIA K80, P100, V100, or A100 GPUs. These nodes are intended for GPU-accelerated workloads such as AI, molecular dynamics, and image processing.

Note: Login nodes do not have GPUs. Running nvidia-smi on a login node will produce an error.

Available GPU Queues

Queue      | CPU Architecture | Vector/Matrix Extension | CPU Cores per Node | GPUs per Node | Node Memory | Default Runtime | Max Runtime | Max Nodes | Max Jobs per User | Multi-User
-----------|------------------|-------------------------|--------------------|---------------|-------------|-----------------|-------------|-----------|-------------------|-----------
gpu        | Intel Haswell    | AVX2                    | 28                 | 4             | 128 GB      | 1 hr            | 8 hrs       | 2         | 2                 | No
gpu-long   | Intel Haswell    | AVX2                    | 28                 | 4             | 128 GB      | 8 hrs           | 48 hrs      | 1         | 2                 | No
gpu-large  | Intel Haswell    | AVX2                    | 28                 | 4             | 128 GB      | 1 hr            | 8 hrs       | 4         | 1                 | No
p100       | Intel Haswell    | AVX2                    | 12                 | 2             | 64 GB       | 1 hr            | 24 hrs      | 1         | 1                 | No
v100       | Intel Haswell    | AVX2                    | 28                 | 2             | 128 GB      | 1 hr            | 24 hrs      | 1         | 1                 | No
a100       | AMD Milan        | AVX2                    | 96                 | 4             | 256 GB      | 1 hr            | 8 hrs       | 2         | 2                 | Yes
a100-long  | AMD Milan        | AVX2                    | 96                 | 4             | 256 GB      | 8 hrs           | 48 hrs      | 1         | 2                 | Yes
a100-large | AMD Milan        | AVX2                    | 96                 | 4             | 256 GB      | 1 hr            | 8 hrs       | 4         | 1                 | Yes

Accessing GPU Nodes

Submit GPU jobs using the SLURM workload manager. Load the slurm module before submitting:

module load slurm
sbatch job_script.sh
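
Before submitting, you can check the current state of a GPU queue with sinfo, a standard SLURM command (the a100 partition below is just an example):

sinfo -p a100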

Example interactive session:

srun -J myjob -N 1 -p a100 --gpus=1 --pty bash
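
Once the session starts, you are placed on a GPU node where you can load a CUDA toolkit and confirm the GPU is visible. A minimal check, assuming an A100 node, might be:

module load cuda120/toolkit/12.0
nvidia-smi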

Example batch script:

#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --output=res.txt
#SBATCH -p a100
#SBATCH --gpus=1
#SBATCH --time=02:00:00

# load the CUDA toolkit, then compile and run the code
module load cuda120/toolkit/12.0
nvcc mycode.cu -o mycode
./mycode
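
After saving the script (assumed here to be named gpu_test.sh), submit it and track its progress with standard SLURM commands:

sbatch gpu_test.sh
squeue -u $USER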

Using CUDA for GPU Acceleration

To compile and run GPU-accelerated code, load the appropriate CUDA toolkit:

# For K80, P100, and V100 nodes
module load cuda113/toolkit/11.3

# For A100 nodes
module load cuda120/toolkit/12.0

Compile with nvcc:

nvcc input.cu -o output
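
Optionally, you can target the compute capability of the GPU you will run on; the values below are standard NVIDIA designations (sm_60 for P100, sm_70 for V100, sm_80 for A100) and are shown only as an illustration:

# example: build specifically for A100 GPUs
nvcc -arch=sm_80 input.cu -o output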

Sample CUDA program available at: /gpfs/projects/samples/cuda/test.cu
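
For example, you could copy the sample into your own directory, build it, and run it from inside a GPU job (the module shown assumes an A100 node; use the CUDA 11.3 module on the K80, P100, or V100 nodes):

cp /gpfs/projects/samples/cuda/test.cu .
module load cuda120/toolkit/12.0
nvcc test.cu -o test
./test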

Monitoring GPU Usage

Monitor GPU performance during jobs to ensure efficient utilization:

nvidia-smi
module load nvtop
nvtop

  • nvidia-smi: Displays GPU utilization, memory usage, and active processes.
  • nvtop: Interactive, real-time GPU monitoring tool similar to htop.
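
For longer jobs it can help to record utilization over time rather than taking a single snapshot. One way to do this with standard nvidia-smi query options:

# print GPU utilization and memory use every 5 seconds
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 5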

Best Practices

  • Always request GPUs explicitly using #SBATCH --gpus=[number].
  • Request memory with #SBATCH --mem=[amount] to stay within node limits (see the example script after this list).
  • Avoid running compute workloads on login nodes.
  • Monitor usage regularly and release resources promptly after jobs finish.
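
The sketch below combines these practices in a single batch script; the resource values are placeholders for illustration, not recommendations for any particular workload:

#!/bin/bash
#SBATCH --job-name=gpu_best_practice
#SBATCH --output=gpu_best_practice.log
#SBATCH -p a100
#SBATCH --gpus=2
#SBATCH --mem=64G
#SBATCH --time=04:00:00

module load cuda120/toolkit/12.0
# run your GPU-enabled executable (mycode is a placeholder)
./mycode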

Note on NVwulf Access

If your workloads require additional GPU capacity or dedicated access, you may request access to the NVwulf cluster through the HPC portal. NVwulf provides additional GPU nodes for high-demand or long-running jobs.

Getting Access to NVwulf