SeaWulf is a high-performance computing cluster with advanced components from AMD, Dell, HPE, and others, located at the Stony Brook University Computing Center. It's available to the campus community and Brookhaven Lab staff.
This guide will get you started with your first job submission in about 30 minutes.
Prerequisites and Account Setup
Request Access
Access to SeaWulf is controlled through a project-based system:
- Projects can be created only by Stony Brook faculty, Brookhaven Lab staff, or New York State companies.
- Accounts are granted under an existing project and linked to its project number.
Need help requesting an account or project? See our Account/Project Request Documentation
Connecting to the Cluster
SeaWulf uses a Linux command line environment. You have two connection options:
Option 1: Open OnDemand (Recommended for beginners)
A web-based interface with GUI support - great for file management and interactive applications.
Option 2: SSH Connection
Connect directly from your terminal:
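# Milan login nodes (Rocky Linux 9, recommended for most work)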
ssh -X username@milan.seawulf.stonybrook.edu
# For legacy applications (CentOS 7)
ssh -X username@login.seawulf.stonybrook.edu
Windows users: Consider using MobaXterm for enhanced SSH functionality.
New to Linux? Check our essential Linux commands guide
Understanding the Cluster Architecture
SeaWulf is a heterogeneous cluster with different node types optimized for various workloads:
Login Nodes (login1/login2)
- OS: CentOS 7
- Use for: File management, job preparation, light compilation
- CPU: Haswell 28-core processors
Milan Nodes (milan1/milan2)
- OS: Rocky Linux 9.6
- Use for: Most computational work
- Available processors: Skylake 40-core, Milan 96-core, HBM Sapphire 96-core
Important: Never run intensive computations on login nodes - they're shared resources for job preparation only.
Learn more about node specifications and choosing the right resources
Software and Modules
Software is managed through environment modules tailored to each node architecture:
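# List available software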
module avail
# Load commonly needed software
module load slurm
module load gcc/12.3.0
# View currently loaded modules
module list
Why modules matter: Different nodes require different software builds optimized for their architecture.
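For example, a typical session might reset the environment and load a compiler before building (the module names and versions here are illustrative; check module avail on your target node):
module purge
module load slurm
module load gcc/12.3.0
gcc --version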
Module guide: Using Modules
Software Installation guide: Managing Your Own Software
Your First Job: Hello World Example
Let's walk through submitting your first job step by step.
Step 1: Prepare Your Workspace
mkdir ~/my_first_job
cd ~/my_first_job
# Load the job scheduler
module load slurm
Step 2: Create a Simple Job Script
Create a file called hello.slurm with the following contents:
#!/bin/bash
#SBATCH --job-name=hello_world
#SBATCH --partition=short-40core
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out
#SBATCH --error=hello_%j.err
# Print system information
echo "Hello from SeaWulf!"
echo "Job ID: $SLURM_JOB_ID"
echo "Running on node: $(hostname)"
echo "Date and time: $(date)"
# Simple computation
echo "Computing 2+2..."
echo "Result: $((2+2))"
Step 3: Submit Your Job
sbatch hello.slurm
You'll see output like: Submitted batch job 12345
Step 4: Monitor Your Job
squeue -u $USER
# Once complete, view the output (the number matches your job ID)
cat hello_12345.out
Job Types and When to Use Them
Batch Jobs (Most Common)
- Best for: Production runs, long computations, automated workflows
- How: Submit your script with sbatch filename.slurm
- Advantage: Runs without your supervision once resources are available
Interactive Jobs
- Best for: Debugging, exploration, GUI applications
- How: Request an interactive session with srun (see the example below)
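For example, a one-hour, single-core interactive session might be requested like this (the partition name is illustrative; choose one from the queues table):
module load slurm
srun -p short-40core -N 1 -n 1 -t 01:00:00 --pty bash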
Advanced job configurations and optimization: SLURM Overview
Choosing Resources and Queues
SeaWulf offers different partitions (queues) optimized for various job types:
The full searchable table can be found here: Queues Table
Resource Planning Tips
- Start small: Request minimal resources for testing
- Monitor usage: Run seff job_id after jobs complete to see actual resource utilization (example below)
- Scale appropriately: Only request what you actually need
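For example, once a job has finished (the job ID here is illustrative):
seff 12345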
Monitoring and Managing Jobs
Essential Commands
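# Check your queued and running jobs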
squeue -u $USER
# Get detailed job information
scontrol show job JOB_ID
# Cancel a job
scancel JOB_ID
# Check resource usage (while job is running)
/gpfs/software/hpc_tools/get_resource_usage.py
Understanding Job States
| State | Description |
|---|---|
| PD (Pending) | Waiting for resources |
| R (Running) | Currently executing |
| CD (Completed) | Finished successfully |
| F (Failed) | Encountered an error |
Advanced monitoring and optimization: Monitoring Jobs
Data Management Basics
Storage Locations
- Home directory (~): Personal files, limited space
- Project storage: Shared team storage (path provided with account)
- Scratch space: Temporary high-performance storage for job data
File Transfer
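# Copy files to the cluster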
scp myfile.txt username@milan.seawulf.stonybrook.edu:~/
# Copy files from cluster
scp username@milan.seawulf.stonybrook.edu:~/results.txt ./
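For large or interrupted transfers, rsync is often more robust than scp; a typical invocation might look like this (the paths are illustrative):
rsync -avP my_results/ username@milan.seawulf.stonybrook.edu:~/my_results/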
Complete data management guide: File Transfer with rsync, scp, and sftp
Common Issues and Solutions
Job Won't Start?
- Check queue limits: Verify your job fits partition constraints
- Resource availability: Try smaller resource requests or different partitions
- Syntax errors: Validate your SLURM script with sbatch --test-only script.slurm
Performance Issues?
- Monitor resource usage: Use the monitoring tools to check if you're using allocated resources
- Right-size requests: Don't request more cores/memory than your application can use
- Profile your code: Identify bottlenecks before scaling up
Connection Problems?
- IP Blocked: Repeated connection attempts can lead to your IP address being blocked. Submit a ticket to HPC support to have it unblocked.
- DUO Blocked: Repeated missed authentication requests can lead to your DUO app becoming blocked. This resolves on its own within a few hours, but can be expedited by submitting a ticket to DoIT.
- Key authentication: Consider setting up SSH keys for easier access (see the sketch after this list)
- X11 forwarding: Add the -X flag to your SSH command for GUI applications
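A minimal key setup from your local machine might look like this (assuming the cluster permits key-based logins; the key type and hostname are examples):
ssh-keygen -t ed25519
ssh-copy-id username@milan.seawulf.stonybrook.edu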
Getting Help
For technical issues:
- Submit a ticket through the IACS ticketing system
- Include your job ID, error messages, and what you were trying to accomplish
This guide gets you started quickly. For production workloads, please review our detailed documentation to optimize performance and resource usage.