NVHPC (NVIDIA HPC SDK)

NVIDIA HPC SDK on SeaWulf

This KB Article References: High Performance Computing
This Information is Intended for: Instructors, Researchers, Staff, Students
Created: 03/07/2025
Last Updated: 03/07/2025

Introduction to NVIDIA HPC SDK on SeaWulf

The NVIDIA HPC SDK (NVHPC) is a comprehensive suite of compilers, libraries, and tools for high-performance computing on NVIDIA GPU-accelerated systems. It provides the building blocks for developing applications that leverage the computational power of NVIDIA GPUs in the SeaWulf computing environment.

NVHPC is built on NVIDIA's compiler technology and supports the C, C++, and Fortran programming languages. It offers specialized optimizations for GPU acceleration, parallel computing, and scientific workloads, making it an essential tool for researchers and developers running computationally intensive applications on the GPU nodes of the SeaWulf cluster.

Available NVHPC Versions

SeaWulf currently provides the following NVHPC versions:

  • nvidia/nvhpc/21.5 - Compatible with CUDA 11.x
  • nvidia/nvhpc/21.7 - Compatible with CUDA 11.x
  • nvidia/nvhpc/23.7 - Compatible with CUDA 11.x and 12.x
  • nvidia/nvhpc/23.11 - Compatible with CUDA 12.x
  • nvidia/nvhpc/24.11 - Compatible with CUDA 12.x; latest version (recommended)

To load a specific NVHPC version, use the module command:

module load nvidia/nvhpc/24.11

Note: There are also specialized versions of NVHPC with different MPI implementations or without MPI (nompi). Choose the appropriate module based on your parallel programming needs.
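
To list every NVHPC module and variant installed on SeaWulf, you can query the module system directly:

module avail nvidia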

NVHPC Variants

In addition to the standard NVHPC modules, SeaWulf offers several specialized variants:

  • nvidia/nvhpc-nompi - NVHPC without MPI support
  • nvidia/nvhpc-hpcx - NVHPC with Mellanox HPC-X MPI
  • nvidia/nvhpc-openmpi3 - NVHPC with OpenMPI 3.x
  • nvidia/nvhpc-byo-compiler - "Bring Your Own Compiler" variant
  • nvidia/nvhpc-hpcx-cuda11 - Specific for CUDA 11.x compatibility
  • nvidia/nvhpc-hpcx-cuda12 - Specific for CUDA 12.x compatibility

Example usage:

module load nvidia/nvhpc-hpcx-cuda12/23.11

Important: Ensure the NVHPC variant you choose is compatible with your CUDA requirements and parallel programming model.

NVHPC Compilers

The NVIDIA HPC SDK provides several compilers optimized for NVIDIA GPUs:

  • C Compiler: nvc
  • C++ Compiler: nvc++
  • Fortran Compiler: nvfortran

Example usage:

nvc myprogram.c -o myprogram # Compile a C program
nvc++ myprogram.cpp -o myprogram # Compile a C++ program
nvfortran myprogram.f90 -o myprogram # Compile a Fortran program
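
For instance, a minimal stand-in for myprogram.c that the first command above could compile (the contents are illustrative, not from the original article):

/* myprogram.c - minimal host-only program to verify the toolchain */
#include <stdio.h>

int main(void) {
    printf("Compiled with the NVIDIA HPC SDK\n");
    return 0;
}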

GPU Acceleration with NVHPC

NVHPC provides multiple programming models for GPU acceleration:

  • OpenACC: A directive-based approach for GPU programming
  • CUDA: NVIDIA's parallel computing platform
  • OpenMP: Standard API for parallel programming with GPU target support
  • Standard Language Features: C++17 parallel algorithms, Fortran DO CONCURRENT

Example OpenACC compilation:

nvc -acc -Minfo=accel myprogram.c -o myprogram
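
For context, a minimal OpenACC sketch of what myprogram.c might contain (the file name and contents are illustrative):

#include <stdio.h>

#define N 1000000

int main(void) {
    static float x[N], y[N];

    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    /* Offload the SAXPY-style loop to the GPU via an OpenACC directive */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}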

Example CUDA compilation (CUDA C++ sources are built with nvcc, the CUDA compiler bundled with the SDK, while nvfortran with the -cuda flag handles CUDA Fortran):

nvcc mycudaprogram.cu -o mycudaprogram # Compile a CUDA C++ program
nvfortran -cuda mycudaprogram.cuf -o mycudaprogram # Compile a CUDA Fortran program

Note: The -Minfo=accel flag on the OpenACC command above provides detailed information about the accelerator code generated by the compiler.
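
NVHPC also compiles OpenMP target offload code with the -mp=gpu flag. A minimal sketch (the file name omp_example.c is illustrative):

#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];

    /* Offload the loop to the GPU with OpenMP target directives */
    #pragma omp target teams distribute parallel for map(from: a[0:N])
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[1] = %f\n", a[1]);
    return 0;
}

Example OpenMP offload compilation:

nvc -mp=gpu -Minfo=mp omp_example.c -o omp_example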

MPI Integration with NVHPC

For parallel programming with MPI, NVHPC provides compiler wrappers that automatically include the necessary MPI libraries and flags:

MPI Wrappers:

  • mpicc: for C programs
  • mpicxx / mpic++: for C++ programs
  • mpifort: for Fortran programs
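
Example usage:

mpicc mpi_program.c -o mpi_program # Compile an MPI C program
mpicxx mpi_program.cpp -o mpi_program # Compile an MPI C++ program
mpifort mpi_program.f90 -o mpi_program # Compile an MPI Fortran program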

Example SLURM script with NVHPC and MPI:

#!/bin/bash
#SBATCH --job-name=nvhpc_test
#SBATCH --output=nvhpc_test.out
#SBATCH -p a100
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:2
#SBATCH --time=01:00:00

# Load necessary modules
module load nvidia/nvhpc-hpcx-cuda12/23.11

# Compile and run your code
mpicc -acc=gpu -Minfo=accel mpi_gpu_example.c -o mpi_gpu_example
mpirun -np $SLURM_NTASKS ./mpi_gpu_example
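
The script above references mpi_gpu_example.c, which the article does not show. A minimal sketch of what such a program might contain, binding each MPI rank to its own GPU through the OpenACC runtime API:

#include <mpi.h>
#include <openacc.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Bind each rank to a different GPU on the node */
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    if (ngpus > 0)
        acc_set_device_num(rank % ngpus, acc_device_nvidia);

    /* Each rank runs its own GPU reduction */
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += 1.0;

    printf("Rank %d computed sum = %.0f\n", rank, sum);

    MPI_Finalize();
    return 0;
}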

Optimization Tips for NVHPC

To maximize performance with NVHPC on GPU nodes in SeaWulf, consider the following optimization strategies:

  • Basic Optimization Flags:
    # High optimization level
    nvc -O3 myprogram.c -o myprogram
  • GPU Targeting:
    # Target specific GPU architecture
    nvc -acc=gpu -gpu=cc80 myprogram.c -o myprogram

    Note: Common compute capability values include cc70 (Volta), cc75 (Turing), cc80 (Ampere), and cc90 (Hopper). Check your GPU architecture and use the appropriate value.

  • Memory Usage Optimization (see the sketch after this list):
    # Managed memory model for simplicity
    nvc -acc=gpu -gpu=managed myprogram.c -o myprogram
  • Profiling Support:
    # Enable profiling with NVIDIA tools
    nvc -acc=gpu -gpu=lineinfo myprogram.c -o myprogram
  • Math Library Optimization:
    # Link against the SDK's bundled NVIDIA math libraries
    nvc -acc=gpu -cudalib=cublas,cufft myprogram.c -o myprogram
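
As referenced in the memory usage tip above, -gpu=managed lets ordinary heap allocations migrate between host and device automatically. A minimal sketch (the file name managed_example.c is illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 1 << 20;

    /* With -gpu=managed, this ordinary malloc'd buffer is accessible
       from both host and device without explicit data clauses */
    double *a = malloc(n * sizeof(double));

    #pragma acc parallel loop
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * i;

    printf("a[10] = %f\n", a[10]);
    free(a);
    return 0;
}

nvc -acc=gpu -gpu=managed managed_example.c -o managed_example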

NVHPC Tools

The NVIDIA HPC SDK includes several tools to help optimize and debug your GPU-accelerated applications:

  • NVIDIA Nsight Systems: System-wide performance analysis tool
  • NVIDIA Nsight Compute: Interactive kernel profiler
  • NVIDIA Debugger (cuda-gdb): For debugging GPU applications
  • NVTOP: Interactive GPU process monitor (available as separate modules: nvtop/3.0.1 and nvtop/3.1.0)

Example usage:

nsys profile ./myprogram # Profile application with Nsight Systems
module load nvtop/3.1.0
nvtop # Monitor GPU usage interactively
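
The kernel profiler and debugger have command-line entry points as well, both shipped with the NVHPC modules:

ncu ./myprogram # Profile individual kernels with Nsight Compute
cuda-gdb ./myprogram # Debug a GPU application interactively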

NVHPC Version Comparison

Feature                  | NVHPC 21.x        | NVHPC 23.x        | NVHPC 24.11
Primary CUDA Support     | CUDA 11.x         | CUDA 11.x, 12.x   | CUDA 12.x
GPU Architecture Support | Up to Ampere      | Up to Hopper      | Up to Hopper
C++ Standard Support     | C++17             | C++17/20          | C++20
Fortran Standard Support | Fortran 2003/2008 | Fortran 2008/2018 | Fortran 2018
Recommended for          | Legacy code       | General use       | New projects, best performance

Resources and Documentation

For detailed information on NVIDIA HPC SDK features, optimization techniques, and programming guides, refer to NVIDIA's official HPC SDK documentation at https://docs.nvidia.com/hpc-sdk/.

For SeaWulf-specific questions and support with NVIDIA HPC SDK, please contact the SeaWulf support team.