Essential commands and workflows for working with SeaWulf and similar HPC systems.
Quick Jump: Common Commands | Job Submission | Job Management | File Transfer | Modules | Storage | Monitoring | Troubleshooting
Most Common Commands
| Command | Purpose | 
|---|---|
| sbatch job.sh | Submit a job script | 
| squeue -u $USER | Check your jobs | 
| scancel JOBID | Cancel a job | 
| sinfo | Check node/partition status | 
| module load SOFTWARE | Load software module | 
| scontrol show job JOBID | View job details | 
Job Submission Methods
sbatch (Batch Jobs)
Submits a script to run unattended in the background. Best for production jobs.
sbatch job_script.sh
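If you need the job ID programmatically (for scancel, sacct, or dependency chains), sbatch's --parsable flag prints just the ID; a minimal sketch:

```bash
# Capture the new job's ID for later use
JOBID=$(sbatch --parsable job_script.sh)
echo "Submitted job $JOBID"
```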
srun (Direct Execution)
Runs a command immediately with the specified resources; blocks until the command completes.
srun --partition=short-40core --time=01:00:00 --ntasks=1 my_program
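srun can also start an interactive shell on a compute node via its --pty flag; a minimal sketch using the same partition:

```bash
# Interactive shell on one compute node; exit to release it
srun --partition=short-40core --time=01:00:00 --ntasks=1 --pty bash
```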
salloc (Interactive Session)
Allocates resources for an interactive session. Once the allocation starts, use srun to run commands on the allocated nodes.
salloc --partition=short-40core --time=02:00:00 --ntasks=4
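A typical salloc session looks like the sketch below: commands launched with srun run on the allocated nodes, and exiting releases them.

```bash
salloc --partition=short-40core --time=02:00:00 --ntasks=4
# Inside the allocation: srun fans the command out across the 4 tasks
srun hostname
# Release the allocation when done
exit
```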
Job Script Template
Standard template with common directives. %j in the output filenames expands to the job ID.
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=results_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=02:00:00
#SBATCH --partition=short-40core
#SBATCH --ntasks=40
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.email@stonybrook.edu
# Load the toolchain, then launch the MPI program
module load intel-stack
mpirun ./executable
Partition Quick Reference
| Partition | Use Case | Time Limit | 
|---|---|---|
| short-40core | Testing, quick jobs | Short | 
| long-40core | Extended computations | Long | 
| gpu | GPU-accelerated work | Varies | 
| hbm | Memory-intensive jobs | Varies | 
| shared | Partial node use | Varies | 
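For the gpu partition, Slurm jobs typically request devices with --gres; the exact GPU count/type syntax and module names on SeaWulf may differ, so treat this as a sketch:

```bash
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=gpu
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1

# Hypothetical module name; check `module avail cuda` for what is installed
module load cuda
./gpu_program
```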
Job Management
| Command | Function | Example | 
|---|---|---|
| squeue | View job queue | squeue -u $USER | 
| scancel | Cancel jobs | scancel 123456 | 
| scontrol show job | Job details | scontrol show job 123456 | 
| sacct | Job history | sacct -j 123456 | 
| sstat | Running job stats | sstat -j 123456 | 
Useful Queue Commands
Custom formatted view of your jobs
squeue -u $USER --format="%.10i %.12j %.8T %.10M %.6D %R"
Cancel all pending jobs
scancel -u $USER --state=pending
Job history since date
sacct --starttime=2024-01-01 --format=JobID,JobName,State,ExitCode
System Information
| Command | Information | Example | 
|---|---|---|
| sinfo | Partition/node status | sinfo -p short-40core | 
| scontrol show nodes | Node details | scontrol show nodes compute-1-1 | 
| scontrol show partition | Partition details | scontrol show partition short-40core | 
| sshare | Fair share info | sshare -u $USER | 
Node Status
List all nodes with details
sinfo -N -l
Custom format: nodes, cores, memory, features, GPUs, state
sinfo --format="%.15N %.10c %.10m %.25f %.10G %.6t"
File Transfer
SCP (Simple Copy)
Upload file to home
scp local_file.txt username@milan.seawulf.stonybrook.edu:~/
Upload directory
scp -r local_directory/ username@milan.seawulf.stonybrook.edu:~/
Download file
scp username@milan.seawulf.stonybrook.edu:~/remote_file.txt ./
Rsync (Recommended for Large Transfers)
Sync directories with compression
rsync -avz local_directory/ username@milan.seawulf.stonybrook.edu:~/remote_directory/
Download with progress
rsync -avz --progress username@milan.seawulf.stonybrook.edu:~/data/ ./local_data/
Rsync flags: -a (archive), -v (verbose), -z (compress), --progress (show progress), --dry-run (test first)
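Since a mistyped path can sync far more than intended, preview large transfers with --dry-run before running them for real:

```bash
# Show what would be transferred without copying anything
rsync -avz --dry-run local_directory/ username@milan.seawulf.stonybrook.edu:~/remote_directory/
```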
Environment Modules
| Command | Purpose | Example | 
|---|---|---|
| module avail | List available modules | module avail python | 
| module load | Load module | module load python/3.9 | 
| module unload | Unload module | module unload python | 
| module list | Show loaded modules | module list | 
| module purge | Unload all modules | module purge | 
| module show | Module information | module show python/3.9.7 | 
Common Workflows
Load compiler and MPI
module load gcc/9.3.0 openmpi/4.1.0
Load an application module
module load python/3.9.7
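In job scripts it is good practice to start from a clean module environment so runs are reproducible; a sketch combining the commands above:

```bash
# Start clean, load exactly what the job needs, and log the result
module purge
module load gcc/9.3.0 openmpi/4.1.0
module list
```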
Storage and Disk Usage
Check Usage
Filesystem usage and free space for your home directory
df -h $HOME
Filesystem usage and free space for scratch
df -h /gpfs/scratch/$USER
User quota information
myquota
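When a quota is close to full, standard du shows which directories are responsible (the scratch path matches the one above):

```bash
# Per-directory usage under scratch, sorted smallest to largest
du -sh /gpfs/scratch/$USER/* | sort -h
```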
File Management
Delete logs older than 30 days (drop -delete to preview the matches first)
find $HOME -name "*.log" -mtime +30 -delete
Create compressed archive
tar -czf archive.tar.gz directory/
Extract archive
tar -xzf archive.tar.gz
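A common end-of-run pattern combines these with the transfer commands above; the paths here are illustrative:

```bash
# On SeaWulf: bundle results from scratch into a single compressed archive
tar -czf ~/results.tar.gz /gpfs/scratch/$USER/results/
# From your local machine: pull the archive down
rsync -avz --progress username@milan.seawulf.stonybrook.edu:~/results.tar.gz ./
```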
Performance Monitoring
During Execution
Monitor a running job's resource use (for batch jobs, query the batch step as JOBID.batch)
sstat -j $SLURM_JOB_ID --format=AveCPU,AvePages,AveRSS,MaxRSS
View your processes
top -u $USER
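To keep an eye on the queue without retyping, watch reruns squeue on an interval:

```bash
# Refresh your queue view every 30 seconds (Ctrl-C to stop)
watch -n 30 "squeue -u $USER"
```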
After Completion
Detailed resource usage
sacct -j 123456 --format=JobID,MaxRSS,AveRSS,MaxVMSize,AveCPU,Elapsed,State
Job efficiency report
seff 123456
Common Troubleshooting
Job Pending (PD)?
- Check resource availability: sinfo -p your_partition
- Review job requirements: scontrol show job JOBID
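squeue's %r format field prints the scheduler's reason for holding a job (e.g., Resources, Priority):

```bash
# Show job ID, name, state, and the pending reason
squeue -j 123456 --format="%.10i %.12j %.8T %r"
```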
Error Diagnostics
Check last 20 lines of output
tail -20 slurm-123456.out
Search for errors in output
grep -i error slurm-123456.out
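For jobs still running, you can follow the output file as it is written:

```bash
# Stream new output live (Ctrl-C to stop)
tail -f slurm-123456.out
```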
Connection Issues
Verbose SSH for debugging
ssh -v username@milan.seawulf.stonybrook.edu
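An entry in ~/.ssh/config saves retyping the full hostname; a sketch, with an arbitrary alias name:

```bash
# Append a host alias so `ssh seawulf` (and scp/rsync to seawulf:) work
cat >> ~/.ssh/config <<'EOF'
Host seawulf
    HostName milan.seawulf.stonybrook.edu
    User username
EOF
```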