This article is intended to help SeaWulf users who want to get started using the NVwulf GPU cluster. For those who are new to high performance computing (HPC) at Stony Brook, please see our other Getting Started articles first.
Logging into NVwulf via the command line
Assuming that an NVwulf account has already been requested and approved, you may access the NVwulf login node from any modern workstation via secure shell (SSH). On a Linux or Mac machine, simply open your favorite terminal program and ssh to the NVwulf login node with X11 forwarding enabled by issuing the following command (your NetID should be in all lowercase):
ssh -X NetID@login.nvwulf.stonybrook.edu
When prompted, input the password associated with your NetID.
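As an optional convenience (not required), you can add an entry like the following to the ~/.ssh/config file on your local Linux or Mac machine so that future logins require less typing. The host alias "nvwulf" below is just an illustrative name; replace NetID with your own NetID:
# Illustrative shortcut for the NVwulf login node
Host nvwulf
    HostName login.nvwulf.stonybrook.edu
    User NetID
    ForwardX11 yes
With this entry in place, you can connect by simply running:
ssh nvwulf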
For additional information on accessing NVwulf from a Windows machine, DUO 2-factor authentication, and use of the GlobalProtect VPN, please see the article on logging into SeaWulf.
Logging into NVwulf through Open OnDemand
Alternatively, you can access NVwulf through Open OnDemand (OOD), a web-based interface for interacting with the cluster. To access Open OnDemand for NVwulf, open your web browser and navigate to the following URL:
https://nvwulf-ood.nvwulf.stonybrook.edu/
When prompted, input your NetID (all lowercase) and associated password into the Username and Password boxes.
For more details on how to use Open OnDemand, please refer to our main OOD article.
Transferring data from SeaWulf to NVwulf
Users may wish to transfer data from SeaWulf to use it on NVwulf. While there are a variety of tools that can be used to accomplish this, we recommend using rsync. For example, to copy a file called "test.txt" from your home directory on SeaWulf to your home directory on NVwulf, you can do the following:
rsync -v NetID@login.seawulf.stonybrook.edu:~/test.txt ~/test.txt
(Note that the above command is assumed to be run from NVwulf.)
Likewise, to recursively copy a directory called "test/" and all its sub-directories from SeaWulf to NVwulf, you could run:
rsync -av NetID@login.seawulf.stonybrook.edu:~/test/ ~/test/
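If you are unsure exactly what a given rsync command will transfer, you can optionally add the -n (--dry-run) flag to preview the file list without copying anything:
rsync -avn NetID@login.seawulf.stonybrook.edu:~/test/ ~/test/
Remove the -n once you are satisfied with the preview.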
Please see our main article on transferring data to SeaWulf for more information on other transfer options and important caveats regarding DUO 2-factor authentication.
Accessing modules on NVwulf
To see what modules are available on NVwulf, please use the following command:
module avail
This will print a list of available modules to the screen in the format "program/version", where "program" is the name of the software being loaded and "version" is the specific version.
In some cases, the compiler and MPI versions that were used to build an application are included in the module name. For example, the "fftw/nvhpc25.3/3.3.10" module name indicates that the application is FFTW version 3.3.10, compiled with version 25.3 of the NVIDIA NVHPC compilers.
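If you want to see exactly what a particular module does to your environment (for example, which paths it prepends and which other modules it depends on), you can inspect it with the "module show" command:
module show fftw/nvhpc25.3/3.3.10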
To access a module, please run the "module load" command. For example, to access the conda package manager, the Python programming language, and common packages via the miniconda/3 module, please do the following:
module load miniconda/3
$ python
Python 3.13.2 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
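You can confirm which modules are currently loaded with "module list" and remove one from your environment with "module unload". For example:
module list
module unload miniconda/3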
If your workload requires an application that we currently do not have available, feel free to submit a ticket. Be sure to clearly indicate that your software request is for NVwulf.
Submitting jobs to the Slurm workload manager
To submit computational jobs to one of the GPU compute nodes, first load the slurm module:
module load slurm
To see a list of the available Slurm partitions and their attributes, please run the "sinfo" command:
[NetID@login1 ~]$ sinfo
PARTITION     AVAIL  TIMELIMIT   NODES  STATE  NODELIST
h200x4        up     8:00:00     1      mix    h200x4-04
h200x4        up     8:00:00     3      alloc  h200x4-[01-03]
h200x4-large  up     8:00:00     1      mix    h200x4-04
h200x4-large  up     8:00:00     3      alloc  h200x4-[01-03]
h200x4-long   up     2-00:00:00  1      mix    h200x4-04
h200x4-long   up     2-00:00:00  3      alloc  h200x4-[01-03]
h200x8        up     8:00:00     1      idle   h200x8-02
h200x8-large  up     8:00:00     1      idle   h200x8-02
h200x8-long   up     2-00:00:00  1      idle   h200x8-02
The above output indicates that the h200x4 partitions (which provide access to 4-way H200 GPU nodes) have three nodes that are fully allocated (in use) and one node in a "mixed" state, with some GPUs allocated and at least one idle. Likewise, the h200x8 partitions (which provide access to 8-way H200 GPU nodes) have one fully idle node.
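If you are only interested in a single partition, you can pass its name to sinfo with the -p flag. For example, to show only the h200x8 partition:
sinfo -p h200x8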
All NVwulf GPU compute nodes allow "multitenancy," meaning that multiple users can potentially run jobs on the same node at the same time. Because of this, it is important to be explicit about the resources (GPUs, CPUs, host memory) you need when you submit your job.
Here is an example job script that allocates 1 GPU, 8 CPUs, and 25 GB of host memory and runs a short TensorFlow training example in the h200x4 queue:
#!/bin/bash

#SBATCH --job-name=test-tf
#SBATCH --output=res.txt
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --time=05:00
#SBATCH -p h200x4
#SBATCH --mem=25g
#SBATCH --gpus=1

module load tensorflow/2.19.0

python tf_test_nn_training.py
Let's save this job script as test_job.slurm. To submit it, you would run the following:
sbatch test_job.slurm
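Because the script sets --output=res.txt, the job's standard output (including anything printed by the training script) will be written to a file called res.txt in the directory from which you submitted the job. After the job finishes, you can inspect it with:
cat res.txt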
Once your job has been submitted, you can check its status using the "squeue" command.
To see the status of a particular job (in this case job ID 999):
squeue -j 999
And to see the status of all your current jobs:
squeue -u NetID
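If you need to stop a job before it finishes (for example, if it was submitted with the wrong resources), you can cancel it with the "scancel" command followed by the job ID:
scancel 999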
For testing and debugging purposes, the same job could be run interactively via the following:
srun --job-name=test-tf --ntasks=8 --cpus-per-task=1 --nodes=1 --gpus=1 -p h200x4 --time=00:05:00 --pty bash
Once the job starts, you will be automatically placed on one of the h200x4 compute nodes and can run commands interactively at the command line.
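For example, once your interactive shell is running on the compute node, you could verify that the requested GPU is visible and then run the same training script by hand (assuming the standard NVIDIA tools are available on the node):
nvidia-smi
module load tensorflow/2.19.0
python tf_test_nn_training.py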
For GUI-based programs like Jupyter and Code-Server (VS Code), we recommend using the interactive apps available via Open OnDemand.