SeaWulf HPC Quick Start Guide

SeaWulf is a high-performance computing cluster with advanced components from AMD, Dell, HPE, and others, located at the Stony Brook University Computing Center. It's available for the campus community and Brookhaven Lab staff.

This guide will get you started with your first job submission in about 30 minutes.

Prerequisites and Account Setup

Request Access

Access to SeaWulf is controlled through a project-based system:

  • Projects can be created only by Stony Brook faculty, Brookhaven Lab staff, or New York State companies.
  • Accounts are granted under an existing project and linked to its project number.

Need help requesting an account or project? See our Account/Project Request Documentation

Connecting to the Cluster

SeaWulf uses a Linux command line environment. You have two connection options:

Option 1: Open OnDemand (Recommended for beginners)

A web-based interface with GUI support - great for file management and interactive applications.

Open OnDemand Overview

Option 2: SSH Connection

Connect directly from your terminal:

# For general computing (Rocky Linux 9.6)
ssh -X username@milan.seawulf.stonybrook.edu

# For legacy applications (CentOS 7)
ssh -X username@login.seawulf.stonybrook.edu
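
To avoid retyping the hostname and flags, you can add an entry to the SSH configuration on your own machine. A minimal ~/.ssh/config sketch (the alias and username below are placeholders to adapt):

# ~/.ssh/config on your local machine
Host seawulf
    HostName milan.seawulf.stonybrook.edu
    User your_netid        # replace with your SeaWulf username
    ForwardX11 yes         # same effect as the -X flag

With this entry in place, ssh seawulf connects with X11 forwarding enabled.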

Windows users: Consider using MobaXterm for enhanced SSH functionality.

New to Linux? Check our essential Linux commands guide

Understanding the Cluster Architecture

SeaWulf is a heterogeneous cluster with different node types optimized for various workloads:

Login Nodes (login1/login2)

  • OS: CentOS 7
  • Use for: File management, job preparation, light compilation
  • CPU: Haswell 28-core processors

Milan Nodes (milan1/milan2)

  • OS: Rocky Linux 9.6
  • Use for: Most computational work
  • Available processors: Skylake 40-core, Milan 96-core, HBM Sapphire 96-core

Important: Never run intensive computations on login nodes - they're shared resources for job preparation only.

Learn more about node specifications and choosing the right resources

Software and Modules

Software is managed through environment modules tailored to each node architecture:

# See available modules
module avail

# Load commonly needed software
module load slurm
module load gcc/12.3.0

# View currently loaded modules
module list

Why modules matter: Different nodes require different software builds optimized for their architecture.

Module guide: Using Modules
Software Installation guide: Managing Your Own Software

Your First Job: Hello World Example

Let's walk through submitting your first job step by step.

Step 1: Prepare Your Workspace

# Create a working directory
mkdir ~/my_first_job
cd ~/my_first_job

# Load the job scheduler
module load slurm

Step 2: Create a Simple Job Script

Create a file called hello.slurm:

#!/bin/bash
#SBATCH --job-name=hello_world
#SBATCH --partition=short-40core
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out
#SBATCH --error=hello_%j.err

# Print system information
echo "Hello from SeaWulf!"
echo "Job ID: $SLURM_JOB_ID"
echo "Running on node: $(hostname)"
echo "Date and time: $(date)"

# Simple computation
echo "Computing 2+2..."
echo "Result: $((2+2))"

Step 3: Submit Your Job

sbatch hello.slurm

You'll see output like: Submitted batch job 12345

Step 4: Monitor Your Job

# Check job status
squeue -u $USER

# Once complete, view the output
cat hello_12345.out

Job Types and When to Use Them

Batch Jobs (Most Common)

  • Best for: Production runs, long computations, automated workflows
  • How: Submit script with sbatch filename.slurm
  • Advantage: Runs without your supervision once resources are available
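
As a sketch of a slightly more realistic batch script than the hello world example, the following requests several cores on one node for a threaded program; the program name (my_program), its input, and the core count are placeholders to adapt:

#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --partition=short-40core
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00
#SBATCH --output=analysis_%j.out

# Let a threaded (OpenMP) program use exactly the cores SLURM allocated
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_program input.dat   # placeholder program and input file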

Interactive Jobs

  • Best for: Debugging, exploration, GUI applications
  • How: Request an interactive session with srun, for example:
srun --partition=short-40core --nodes=1 --ntasks=4 --time=1:00:00 --pty bash

Advanced job configurations and optimization: SLURM Overview

Choosing Resources and Queues

SeaWulf offers different partitions (queues) optimized for various job types:

The full searchable table can be found here: Queues Table

Resource Planning Tips

  • Start small: Request minimal resources for testing
  • Monitor usage: Use seff job_id after jobs complete to see actual resource utilization
  • Scale appropriately: Only request what you actually need
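
Following the monitoring tip above, you can compare requested and used resources once a job finishes (12345 is a placeholder job ID; exact output depends on the site's accounting setup):

# Summary of CPU and memory efficiency for a finished job
seff 12345

# More detail from the SLURM accounting database
sacct -j 12345 --format=JobID,Elapsed,MaxRSS,TotalCPU,State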

Monitoring and Managing Jobs

Essential Commands

# View your jobs
squeue -u $USER

# Get detailed job information
scontrol show job JOB_ID

# Cancel a job
scancel JOB_ID

# Check resource usage (while job is running)
/gpfs/software/hpc_tools/get_resource_usage.py

Understanding Job States

  • PD (Pending): Waiting for resources
  • R (Running): Currently executing
  • CD (Completed): Finished successfully
  • F (Failed): Encountered an error
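
These codes appear in the ST column of squeue output. One compact way to watch them for your own jobs (the format string is just one possible layout):

# Show job ID, state code, job name, and elapsed time
squeue -u $USER -o "%.10i %.2t %.25j %.10M"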

Advanced monitoring and optimization: Monitoring Jobs

Data Management Basics

Storage Locations

  • Home directory (~): Personal files, limited space
  • Project storage: Shared team storage (path provided with account)
  • Scratch space: Temporary high-performance storage for job data
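
A common pattern is to stage heavy I/O through scratch rather than your home directory. A sketch of that pattern inside a job script, assuming a scratch directory under /gpfs/scratch/$USER (confirm the exact path for your account):

# Stage data into scratch, run there, copy results back
SCRATCH_DIR=/gpfs/scratch/$USER/$SLURM_JOB_ID   # assumed scratch location
mkdir -p "$SCRATCH_DIR"
cp ~/my_first_job/input.dat "$SCRATCH_DIR"/
cd "$SCRATCH_DIR"

./my_program input.dat                          # placeholder program

cp results.txt ~/my_first_job/                  # copy results back before the job ends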

File Transfer

# Copy files to cluster
scp myfile.txt username@milan.seawulf.stonybrook.edu:~/

# Copy files from cluster
scp username@milan.seawulf.stonybrook.edu:~/results.txt ./
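
For larger or repeated transfers, rsync (covered in the guide linked below) sends only the files that have changed and can resume interrupted transfers:

# Sync a local directory to the cluster
rsync -avz ./my_data/ username@milan.seawulf.stonybrook.edu:~/my_data/

# Pull results back to your local machine
rsync -avz username@milan.seawulf.stonybrook.edu:~/my_first_job/ ./results/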

Complete data management guide: File Transfer with rsync, scp, and sftp

Common Issues and Solutions

Job Won't Start?

  • Check queue limits: Verify your job fits partition constraints
  • Resource availability: Try smaller resource requests or different partitions
  • Syntax errors: Validate your SLURM script with sbatch --test-only script.slurm
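
A few quick checks corresponding to these points (the partition name is the one used earlier in this guide):

# Validate the script without actually submitting it
sbatch --test-only hello.slurm

# See how busy the partition is and which nodes are idle
sinfo -p short-40core

# Show SLURM's estimated start time and pending reason for your jobs
squeue -u $USER --start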

Performance Issues?

  • Monitor resource usage: Use the monitoring tools above to check whether your job is actually using the resources it was allocated
  • Right-size requests: Don't request more cores/memory than your application can use
  • Profile your code: Identify bottlenecks before scaling up
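
For the first tip, in addition to the get_resource_usage.py script shown earlier, sstat can report statistics for a running job's steps, assuming SLURM accounting is enabled (the job ID is a placeholder; for a batch job you may need to query its .batch step):

# Live CPU and memory statistics for a running job's steps
sstat -j 12345 --format=JobID,AveCPU,MaxRSS,MaxDiskRead,MaxDiskWrite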

Connection Problems?

  • IP Blocked: Repeated connection attempts can lead to your IP address being blocked. Submit a ticket to HPC support to have it unblocked.
  • DUO Blocked: Repeated missed authentication requests can lead to your DUO app becoming blocked. This resolves on its own within a few hours, but can be expedited by submitting a ticket to DoIT.
  • Key authentication: Consider setting up SSH keys for easier access
  • X11 forwarding: Add -X flag to SSH for GUI applications
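
For the key authentication suggestion above, setup is a one-time step run from your local machine (key type and filenames can be adjusted):

# Generate a key pair locally (accept the defaults or add a passphrase)
ssh-keygen -t ed25519

# Copy the public key to SeaWulf so future logins can use it
ssh-copy-id username@milan.seawulf.stonybrook.edu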

Getting Help

For technical issues:

  • Submit a ticket through the IACS ticketing system
  • Include your job ID, error messages, and what you were trying to accomplish

This guide gets you started quickly. For production workloads, please review our detailed documentation to optimize performance and resource usage.