Essential commands and workflows for working with SeaWulf and HPC systems. Keep this handy for quick reference during your computational work.
Job Submission Commands
Basic Job Submission
sbatch job_script.sh
Submit a batch job script to the queue
srun --partition=short-40core --time=01:00:00 --ntasks=1 my_program
Run a single command through the scheduler without a batch script (blocks until the job runs)
salloc --partition=short-40core --time=02:00:00 --ntasks=4
Request an interactive allocation
Common SBATCH Directives
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=results_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=02:00:00
#SBATCH --partition=short-40core
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.email@stonybrook.edu
Standard job script template with common options
Job Management
Command | Function | Example |
---|---|---|
squeue | View job queue | squeue -u $USER |
scancel | Cancel jobs | scancel 123456 |
scontrol show job | Job details | scontrol show job 123456 |
sacct | Job history | sacct -j 123456 |
sstat | Running job stats | sstat -j 123456 |
Useful Queue Commands
squeue -u $USER --format="%.10i %.12j %.8T %.10M %.6D %R"
View your jobs with custom formatting
scancel -u $USER --state=pending
Cancel all your pending jobs
sacct --starttime=2024-01-01 --format=JobID,JobName,State,ExitCode
View job history since a specific date
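Because sacct prints plain text, its listings pipe cleanly into standard tools. A sketch that pulls failed jobs out of a history listing; the sample rows below are fabricated for illustration, only the field layout (JobID, JobName, State, ExitCode) mirrors the `--format` option above:

```shell
# Sample sacct-style output, fabricated for illustration.
sample="123456 align FAILED 1:0
123457 align COMPLETED 0:0
123458 sort FAILED 2:0"

# Print the JobID and ExitCode of every failed job.
echo "$sample" | awk '$3 == "FAILED" { print $1, $4 }'
```

In practice you would pipe `sacct --starttime=... --format=JobID,JobName,State,ExitCode --noheader` straight into the same awk filter.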
System Information
Command | Information Displayed | Usage |
---|---|---|
sinfo | Partition and node status | sinfo -p short-40core |
scontrol show nodes | Detailed node information | scontrol show nodes compute-1-1 |
scontrol show partition | Partition details | scontrol show partition short-40core |
sshare | Fair share information | sshare -u $USER |
Node Status Shortcuts
sinfo -N -l
List all nodes with detailed information
sinfo --format="%.15N %.10c %.10m %.25f %.10G %.6t"
Custom node format showing cores, memory, features, GPUs, and state
File Transfer
Secure Copy (scp)
scp local_file.txt username@seawulf.stonybrook.edu:~/
Upload a file to your home directory
scp -r local_directory/ username@seawulf.stonybrook.edu:~/
Upload a directory recursively
scp username@seawulf.stonybrook.edu:~/remote_file.txt ./
Download a file from the cluster
Rsync (Recommended for Large Transfers)
rsync -avz local_directory/ username@seawulf.stonybrook.edu:~/remote_directory/
Sync directories with archive mode and compression
rsync -avz --progress username@seawulf.stonybrook.edu:~/data/ ./local_data/
Download with progress display
Rsync Flags: -a (archive mode), -v (verbose), -z (compress), --progress (show progress), --dry-run (test without transferring)
Environment Modules
Command | Purpose | Example |
---|---|---|
module avail | List available modules | module avail python |
module load | Load a module | module load python/3.9 |
module unload | Unload a module | module unload python |
module list | Show loaded modules | module list |
module purge | Unload all modules | module purge |
module show | Display module information | module show python/3.9 |
Common Module Workflows
module load gcc/9.3.0 openmpi/4.1.0
Load compiler and MPI library
module load python/3.9 scipy-stack/2021a
Load Python with scientific libraries
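Module loads belong inside batch scripts too, so the job environment is reproducible. A minimal sketch of a batch script combining the two (the module versions are examples from above; check `module avail` on SeaWulf for what is actually installed, and `mpi_program` is a placeholder for your executable):

```shell
#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --partition=short-40core
#SBATCH --time=01:00:00
#SBATCH --ntasks=4

# Start from a clean environment so stale modules cannot leak in.
module purge
module load gcc/9.3.0 openmpi/4.1.0

# Launch the MPI executable across the allocated tasks.
srun ./mpi_program
```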
Storage and Disk Usage
Checking Quotas and Usage
df -h $HOME
Check home directory usage
df -h /gpfs/scratch/$USER
Check scratch space usage
myquota
Display user quota information
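When a quota check shows you are near the limit, a quick way to find the offenders is to sort directory sizes; a sketch (run it from the directory you want to inspect):

```shell
# List the ten largest items at the top level of the current directory,
# human-readable, largest first.
du -sh ./* 2>/dev/null | sort -rh | head -10
```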
File Management
find $HOME -name "*.log" -mtime +30 -delete
Delete log files older than 30 days
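Because `-delete` is irreversible, it is safer to preview the match list first by running the same `find` with `-print` in its place; a sketch:

```shell
# Preview which .log files are older than 30 days before deleting anything.
find "$HOME" -name "*.log" -mtime +30 -print

# Once the list looks right, re-run with -delete in place of -print.
```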
tar -czf archive.tar.gz directory/
Create compressed archive
tar -xzf archive.tar.gz
Extract compressed archive
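Before transferring or extracting, an archive's contents can be checked with `tar -tzf`, which lists without unpacking; a sketch of the full round trip (`directory/` and the destination path are placeholders):

```shell
# Create, inspect, and extract a compressed archive.
tar -czf archive.tar.gz directory/        # create
tar -tzf archive.tar.gz                   # list contents without extracting
tar -xzf archive.tar.gz -C restore_dir/   # extract into a chosen destination (-C)
```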
Performance Monitoring
During Job Execution
sstat -j $SLURM_JOB_ID --format=AveCPU,AvePages,AveRSS,MaxRSS
Monitor running job resource usage
top -u $USER
View your running processes (on login nodes only)
After Job Completion
sacct -j 123456 --format=JobID,MaxRSS,AveRSS,MaxVMSize,AveCPU,Elapsed,State
Detailed resource usage for completed job
seff 123456
Job efficiency report (if available)
Common Troubleshooting
Job Issues
Job Pending (PD)?
- Check resource availability: sinfo -p your_partition
- Review job requirements: scontrol show job JOBID
- Check account limits: sshare -u $USER
Error Diagnostics
tail -20 slurm-123456.out
Check last 20 lines of job output
grep -i error slurm-123456.out
Search for errors in job output
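The two commands above combine into a quick first-pass triage of a job log. A sketch against a sample output file (the log lines are fabricated for illustration; real logs arrive from Slurm as slurm-&lt;jobid&gt;.out):

```shell
# Build a sample job log for illustration.
cat > slurm-sample.out <<'EOF'
Loading modules...
Step 1 complete
ERROR: input file not found
Step 2 skipped
EOF

# Case-insensitive search with line numbers (-n) for context.
grep -in error slurm-sample.out
# → 3:ERROR: input file not found
```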
Login Issues
ssh -v username@seawulf.stonybrook.edu
Verbose SSH connection for debugging
Quick Workflows
Interactive Development Session
salloc --partition=short-40core --time=02:00:00 --cpus-per-task=4 --mem=16G
Request interactive resources, then use srun for commands
Array Job Submission
#SBATCH --array=1-10
#SBATCH --output=job_%A_%a.out
echo "Processing task $SLURM_ARRAY_TASK_ID"
Submit multiple similar jobs with different parameters
GPU Job Example
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
module load cuda/11.2
nvidia-smi
Pro Tips:
- Always test jobs with short time limits first
- Use --dry-run with rsync to preview file transfers
- Set appropriate memory limits to avoid job failures
- Monitor job efficiency to optimize resource usage
- Use scratch space for temporary files and intensive I/O