Getting started on NVwulf

This article is intended to help SeaWulf users who want to get started using the NVwulf GPU cluster. For those who are new to high performance computing (HPC) at Stony Brook, please see our other Getting Started articles first.

 

Logging into NVwulf via the command line

Assuming that an NVwulf account has already been requested and approved, you may access the NVwulf login node from any modern workstation via secure shell (SSH). On a Linux or Mac machine, simply open your favorite terminal program and ssh to the NVwulf login node with X11 forwarding enabled by issuing the following command (your NetID should be in all lowercase):

ssh -X NetID@login.nvwulf.stonybrook.edu

When prompted, input the password associated with your NetID.
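If you connect frequently, you may find it convenient to add an entry to your SSH client configuration so that a short alias expands to the full hostname and username. The alias "nvwulf" below is just an example, and you should substitute your own NetID:

Host nvwulf
    HostName login.nvwulf.stonybrook.edu
    User NetID
    ForwardX11 yes

With this entry saved in ~/.ssh/config on your workstation, running "ssh nvwulf" is equivalent to the longer command above.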

For additional information on accessing NVwulf from a Windows machine, DUO two-factor authentication, and use of the GlobalProtect VPN, please see the article on logging into SeaWulf.

 

Logging into NVwulf through Open OnDemand

Additionally, you can access NVwulf through Open OnDemand (OOD), a web-based interface for interacting with the cluster. To access Open OnDemand for NVwulf, open your web browser and navigate to the following URL:

https://nvwulf-ood.nvwulf.stonybrook.edu/

When prompted, input your NetID (all lowercase) and associated password into the Username and Password boxes.

For more details on how to use Open OnDemand, please refer to our main OOD article.

 

Transferring data from SeaWulf to NVwulf

Users may wish to transfer data from SeaWulf to use it on NVwulf.  While there are a variety of tools that can be used to accomplish this, we recommend using rsync.  For example, to copy a file called "test.txt" from your home directory on SeaWulf to your home directory on NVwulf, you can do the following:

 

rsync -v NetID@login.seawulf.stonybrook.edu:~/test.txt ~/test.txt

(Note that the above command is assumed to be run from NVwulf.)

Likewise, to recursively copy a directory called "test/" and all its sub-directories from SeaWulf to NVwulf, you could run:

rsync -av NetID@login.seawulf.stonybrook.edu:~/test/ ~/test/
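For larger transfers, you may want to preview what will be copied before moving any data. The options below are standard rsync flags: -n (dry run) lists the files that would be transferred without actually copying them, and -z compresses data in transit:

rsync -avzn NetID@login.seawulf.stonybrook.edu:~/test/ ~/test/

Once the dry-run output looks correct, re-run the same command without the -n flag to perform the transfer.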

Please see our main article on transferring data to SeaWulf for more information on other transfer options and important caveats regarding DUO 2-factor authentication.

 

Accessing modules on NVwulf

To see what modules are available on NVwulf, please use the following command:
 

module avail

This will print a list of available modules to the screen using the format "program/version", where the "program" is the program being loaded and "version" is the specific version. 

In some cases, the compiler and MPI versions that were used to build an application are included in the module name. For example, the "fftw/nvhpc25.3/3.3.10" module name indicates that the application is FFTW version 3.3.10, compiled with version 25.3 of the Nvidia NVHPC compilers.
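To narrow the output of "module avail" to a particular program, you can pass the program name as an argument, and "module list" shows which modules are currently loaded in your session. For example, using the FFTW module mentioned above:

module avail fftw
module list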

To access a module, please run the "module load" command. For example, to access the Conda package manager, the Python programming language, and common packages via the miniconda/3 module, please do the following:

module load miniconda/3

$ python
Python 3.13.2 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
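If you need a Python package that is not already included with the miniconda/3 module, one common approach on shared clusters (assuming user-level installs are permitted on NVwulf) is to install it into your home directory with pip's --user flag; "package_name" below is only a placeholder:

module load miniconda/3
pip install --user package_name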

If your workload requires an application that we currently do not have available, feel free to submit a ticket. Be sure to clearly indicate that your software request is for NVwulf.

 

Submitting jobs to the Slurm workload manager

To submit computational jobs to one of the GPU compute nodes, first load the slurm module:
 

module load slurm

To see a list of the available Slurm partitions and their attributes, please run the "sinfo" command:

[NetID@login1 ~]$ sinfo

PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
h200x4           up    8:00:00      1    mix h200x4-04
h200x4           up    8:00:00      3  alloc h200x4-[01-03]
h200x4-large     up    8:00:00      1    mix h200x4-04
h200x4-large     up    8:00:00      3  alloc h200x4-[01-03]
h200x4-long      up 2-00:00:00      1    mix h200x4-04
h200x4-long      up 2-00:00:00      3  alloc h200x4-[01-03]
h200x8           up    8:00:00      1   idle h200x8-02
h200x8-large     up    8:00:00      1   idle h200x8-02
h200x8-long      up 2-00:00:00      1   idle h200x8-02

 

The above output indicates that the h200x4 partitions (which provide access to 4-way H200 GPU nodes) have three nodes that are fully allocated (in use) and one node that is in a "mixed" state, with some GPUs allocated and at least one idle. Likewise, the h200x8 partitions (which provide access to 8-way H200 GPU nodes) have one fully idle node.
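If you would like a per-node view that also shows GPU (GRES), CPU, and memory details, sinfo's output can be customized with standard format specifiers. The layout below is just one example:

sinfo -p h200x4 -N -o "%N %G %C %m %t"

Here %N is the node name, %G the generic resources (GPUs), %C the CPU counts (allocated/idle/other/total), %m the memory per node in megabytes, and %t the node state.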

All NVwulf GPU compute nodes allow "multitenancy," meaning that multiple users can potentially run jobs on the same node at the same time. Because of this, it is important to be explicit about the resources (GPUs, CPUs, host memory) you need when you submit your job.

Here is an example job script that requests 1 GPU, 8 CPUs, and 25 GB of host memory and runs a short TensorFlow training example in the h200x4 partition:

#!/bin/bash

#SBATCH --job-name=test-tf       # job name shown by squeue
#SBATCH --output=res.txt         # file to capture the job's output
#SBATCH --ntasks=8               # 8 tasks
#SBATCH --cpus-per-task=1        # 1 CPU per task (8 CPUs total)
#SBATCH --nodes=1                # run on a single node
#SBATCH --time=05:00             # 5-minute time limit
#SBATCH -p h200x4                # submit to the h200x4 partition
#SBATCH --mem=25g                # 25 GB of host memory
#SBATCH --gpus=1                 # request 1 GPU

module load tensorflow/2.19.0
python tf_test_nn_training.py

Let's save this job script as test_job.slurm.  To submit it, you would run the following:
 

sbatch  test_job.slurm

Once your job has been submitted, you can check its status using the "squeue" command.

To see the status of a particular job (in this case job ID 999):

squeue -j 999

And to see the status of all your current jobs:

squeue -u NetID
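For more detailed information about a specific job, such as why it is still pending or which node it is running on, you can also use the "scontrol" command (job ID 999 is again just a placeholder):

scontrol show job 999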

For testing and debugging purposes, the same job could be run interactively via the following:

srun --job-name=test-tf --ntasks=8 --cpus-per-task=1 --nodes=1 --gpus=1 --mem=25g -p h200x4 --time=00:05:00 --pty bash

Once the job starts, you will be automatically placed on one of the h200x4 compute nodes and can run commands interactively at the command line.
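A quick way to confirm that the requested GPU is visible in your interactive session is to run nvidia-smi, which lists the GPUs available to the job along with their current utilization (this assumes the NVIDIA driver utilities are on the default PATH on the compute nodes, which is typical for GPU clusters):

nvidia-smi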

For GUI-based programs like Jupyter and Code-Server (VS Code), we recommend using the interactive apps available via Open OnDemand.

 

 
