SeaWulf HPC Quick Start Guide

SeaWulf is a high-performance computing cluster built with advanced components from AMD, Dell, HPE, and other vendors. It is available to the campus community, Brookhaven National Laboratory staff, and New York State companies.

This guide will get you started with your first job submission in about 30 minutes.

Prerequisites and Account Setup

Request Access

Access to SeaWulf is controlled through a project-based system:

  • Projects can be created only by Stony Brook faculty, Brookhaven Lab staff, or New York State companies.
  • Accounts are granted under an existing project and linked to its project number.

Need help requesting an account or project? See our Account/Project Request Documentation

Connecting to the Cluster

SeaWulf uses a Linux command line environment. You have two connection options:

Option 1: Open OnDemand (Recommended for beginners)

A web-based interface with GUI support - great for file management and interactive applications.

Open OnDemand Overview

Option 2: SSH Connection

Connect directly from your terminal:

# For general computing (Rocky Linux 9.6)
ssh -X SBU_NETID@milan.seawulf.stonybrook.edu

# For legacy applications (CentOS 7)
ssh -X SBU_NETID@login.seawulf.stonybrook.edu

Replace SBU_NETID with your Stony Brook NetID (e.g., jsmith).

DUO: Both connection options require DUO authentication before you can connect.

Windows users: Consider using MobaXterm for enhanced SSH functionality.
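Optional: if you connect often, you can shorten the command with a host alias in your local ~/.ssh/config. A minimal sketch, assuming a Linux/macOS-style SSH client; the alias name "seawulf" is just an example:

# Add to ~/.ssh/config on your own machine
Host seawulf
    HostName milan.seawulf.stonybrook.edu
    User SBU_NETID
    ForwardX11 yes

# Then connect with:
ssh seawulf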

New to Linux? Check our essential Linux commands guide
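If you just need enough Linux to follow along, these everyday commands cover most of this guide (file names are placeholders):

# Show where you are and list the contents
pwd
ls -l

# Create a directory and move into it
mkdir projects
cd projects

# View a file and make a copy of it
cat results.txt
cp results.txt results_backup.txt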

Understanding the Cluster Architecture

SeaWulf is a heterogeneous computing cluster composed of nodes with different architectures, each optimized for specific workloads.

Login nodes serve as the entry point to the cluster. You connect to them first to manage files, prepare job scripts, and submit jobs to the compute nodes.

login.seawulf.stonybrook.edu (login1/login2)

Accesses older Intel Haswell nodes and GPU nodes used for legacy or specialized applications.

milan.seawulf.stonybrook.edu (milan1/milan2)

Accesses more modern compute nodes, including Intel Skylake (40-core), AMD EPYC Milan (96-core), and Intel Sapphire Rapids (96-core with HBM). These nodes support most general and high-performance workloads.

Important: Do not run intensive computations on the login nodes. They are shared resources intended for job setup, file management, and lightweight tasks only.

Learn more about node specifications and choosing the right resources

Software and Modules

Software on SeaWulf is managed through environment modules, which help ensure compatibility with each node’s architecture.

# See available modules
module avail

# Load software into your environment
module load slurm
module load gcc/12.3.0

# View currently loaded modules
module list

Why modules matter: Different node types require software builds optimized for their specific hardware. Modules make it easy to load the correct versions and maintain consistent environments across sessions.
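For example, a typical module workflow looks like this (gcc/12.3.0 matches the example above; other version numbers will depend on what module avail shows on your login node):

# Load a compiler and confirm which version is now on your PATH
module load gcc/12.3.0
gcc --version

# Unload a single module, or clear everything and start fresh
module unload gcc/12.3.0
module purge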

Module guide: Using Modules
Software Installation guide: Managing Your Own Software

Your First Job: Hello World Example

Let's walk through submitting your first job step by step.

Step 1: Prepare Your Workspace

# Create a working directory
mkdir ~/my_first_job
cd ~/my_first_job

# Load the job scheduler
module load slurm

Step 2: Create a Simple Job Script

Create a file called hello.slurm:

#!/bin/bash
#SBATCH --job-name=hello_world
#SBATCH --partition=short-40core-shared
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out
#SBATCH --error=hello_%j.err

# Print system information
echo "Hello from SeaWulf!"
echo "Job ID: $SLURM_JOB_ID"
echo "Running on node: $(hostname)"
echo "Date and time: $(date)"

# Simple computation
echo "Computing 2+2..."
echo "Result: $((2+2))"

Step 3: Submit Your Job

sbatch hello.slurm

You'll see output like: Submitted batch job 12345

Step 4: Monitor Your Job

# Check job status
squeue -u $USER

# Once complete, view the output
cat hello_12345.out

Job Types and When to Use Them

Batch Jobs (Most Common)

  • Best for: Long computations and automated workflows
  • How: Submit script with sbatch filename.slurm
  • Advantage: Runs without your supervision once resources are available

Interactive Jobs

  • Best for: Debugging, testing, and workflows that aren't fully defined yet
  • How: Request interactive session with srun
srun --partition=short-40core --nodes=1 --ntasks=4 --time=1:00:00 --pty bash

Advanced job configurations and optimizations: SLURM Overview

Choosing Resources and Queues

SeaWulf offers different partitions (queues) optimized for various job types:

The full searchable table can be found here: Queues Table
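To see which partitions you can submit to, along with their time limits and node counts, a quick check from a login node (after module load slurm) is:

# Summarize partitions: name, time limit, node count, CPUs per node
sinfo -o "%P %l %D %c"

# Show one partition in more detail (partition name from the example above)
sinfo -p short-40core-shared -N -l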

Monitoring and Managing Jobs

Essential Commands

# View your jobs
squeue -u $USER

# Get detailed job information
scontrol show job JOB_ID

# Cancel a job
scancel JOB_ID

# Check resource usage (while job is running)
/gpfs/software/hpc_tools/get_resource_usage.py

Understanding Job States

  • PD (Pending): Waiting for resources
  • R (Running): Currently executing
  • CD (Completed): Finished successfully
  • F (Failed): Encountered an error
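After a job finishes, SLURM's accounting command sacct can report its final state and resource usage, assuming job accounting is enabled on the cluster (typical for SLURM installations):

# Replace 12345 with your actual job ID
sacct -j 12345 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS,ExitCode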

Advanced monitoring and optimization: Managing Jobs

Data Management Basics

Storage Locations

  • Home directory (20GB): For personal files, scripts, and configuration settings. Space is limited, so avoid storing large datasets here.
  • Scratch space (20TB): High-capacity temporary storage for active computations and intermediate results. Files in scratch are not backed up and are deleted periodically.
  • Project storage: Shared directories for research groups. Intended for collaborative work and long-term data storage, but files are not backed up.
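To keep an eye on the 20GB home limit described above, a simple usage check is the following (du can take a while on large directories; a GPFS-specific quota tool, if provided on SeaWulf, would be more precise, but this works anywhere):

# Total size of your home directory
du -sh ~

# Largest items at the top level of your home directory
du -sh ~/* | sort -h | tail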

File Transfer

# Copy files to cluster
scp myfile.txt SBU_NETID@milan.seawulf.stonybrook.edu:~/

# Copy files from cluster
scp SBU_NETID@milan.seawulf.stonybrook.edu:~/results.txt ./
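For larger or repeated transfers, rsync can resume interrupted copies and skip files that are already up to date. A minimal sketch (directory names are placeholders):

# Copy a local directory to your SeaWulf home, showing progress
rsync -avP my_data/ SBU_NETID@milan.seawulf.stonybrook.edu:~/my_data/

# Pull results back, copying only files that changed
rsync -avP SBU_NETID@milan.seawulf.stonybrook.edu:~/my_first_job/ ./my_first_job/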

Complete data management guide: File Transfer with rsync, scp, and sftp

Common Issues and Solutions

Job Won't Start?

  • Check queue limits: Verify your job fits partition constraints
  • Resource availability: Try smaller resource requests or different partitions
  • Syntax errors: Validate your SLURM script with sbatch --test-only script.slurm
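For the last point, --test-only asks SLURM to validate the script and estimate a start time without actually queuing the job. For example, with the script from earlier:

# Validate hello.slurm without submitting it; SLURM prints an estimated start time
sbatch --test-only hello.slurm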

Performance Issues?

  • Monitor resource usage: Use the monitoring tools to check whether your job is actually using the cores and memory it was allocated
  • Right-size requests: Don't request more cores/memory than your application can use
  • Profile your code: Identify bottlenecks before scaling up

Connection Problems?

  • IP Blocked: Repeated connection attempts can lead to your IP address being blocked. Submit a ticket to HPC support to have it unblocked.
  • DUO Blocked: Repeated missed authentication requests can lead to your DUO app becoming blocked. This resolves on its own within a few hours, but can be expedited by submitting a ticket to DoIT. 
  • Key authentication: Consider setting up Passwordless SSH.
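A minimal key setup from your own machine looks like this (accept the default file location when prompted; ssh-copy-id is available on most Linux and macOS systems):

# Generate a key pair on your local machine (press Enter to accept defaults)
ssh-keygen -t ed25519

# Copy the public key to SeaWulf; you'll authenticate with your password one last time
ssh-copy-id SBU_NETID@milan.seawulf.stonybrook.edu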

GUI Application Won't Launch?

  • X11 forwarding: Add the -X flag to your SSH command to enable GUI applications. For more information see X11 Forwarding.
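A quick way to confirm X11 forwarding is working after you connect with -X is to check that the DISPLAY variable is set on the cluster:

# Run on SeaWulf after connecting with ssh -X;
# a value like localhost:10.0 means forwarding is active
echo $DISPLAY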

Getting Help

For technical issues:

  • Submit a ticket through the IACS ticketing system
  • Include your job ID, error messages, and what you were trying to accomplish

This guide gets you started quickly. For production workloads, please review our detailed documentation to optimize performance and resource usage.