HPC Cheat Sheet

Essential commands and workflows for working with SeaWulf and similar HPC systems. Keep this sheet handy for quick reference during your computational work.

Job Submission Commands

Basic Job Submission

sbatch job_script.sh
Submit a batch job script to the queue
srun --partition=short-40core --time=01:00:00 --ntasks=1 my_program
Run an interactive job directly
salloc --partition=short-40core --time=02:00:00 --ntasks=4
Request an interactive allocation

Common SBATCH Directives

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=results_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=02:00:00
#SBATCH --partition=short-40core
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.email@stonybrook.edu
Standard job script template with common options
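Once a script like this is saved (e.g. as job_script.sh), jobs can be chained so a second step starts only after the first succeeds. A minimal sketch using sbatch's --parsable and --dependency flags; the wrapper function and script names are hypothetical:

```shell
# Chain two jobs: the second runs only if the first exits successfully.
# submit_chain and the script names are illustrative, not SeaWulf-specific.
submit_chain() {
    local first="$1" second="$2"
    local id
    id=$(sbatch --parsable "$first") || return 1   # --parsable prints only the job ID
    sbatch --dependency=afterok:"$id" "$second"    # afterok: start only on success
}
# Usage: submit_chain preprocess.sh analyze.sh
```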

Job Management

Command            Function           Example
squeue             View job queue     squeue -u $USER
scancel            Cancel jobs        scancel 123456
scontrol show job  Job details        scontrol show job 123456
sacct              Job history        sacct -j 123456
sstat              Running job stats  sstat -j 123456

Useful Queue Commands

squeue -u $USER --format="%.10i %.12j %.8T %.10M %.6D %R"
View your jobs with custom formatting
scancel -u $USER --state=pending
Cancel all your pending jobs
sacct --starttime=2024-01-01 --format=JobID,JobName,State,ExitCode
View job history since a specific date
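The --starttime filter pairs well with a small helper for "my jobs from the last N days". A sketch assuming GNU date (as on typical Linux login nodes); the function name is made up:

```shell
# List your jobs from the last N days (default 7).
recent_jobs() {
    local days="${1:-7}"
    local since
    since=$(date -d "-${days} days" +%F)   # GNU date; prints e.g. 2024-01-01
    sacct -u "$USER" --starttime="$since" \
          --format=JobID,JobName,State,ExitCode
}
# Usage: recent_jobs 30
```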

System Information

Command                  Information Displayed      Usage
sinfo                    Partition and node status  sinfo -p short-40core
scontrol show nodes      Detailed node information  scontrol show nodes compute-1-1
scontrol show partition  Partition details          scontrol show partition short-40core
sshare                   Fair share information     sshare -u $USER

Node Status Shortcuts

sinfo -N -l
List all nodes with detailed information
sinfo --format="%.15N %.10c %.10m %.25f %.10G %.6t"
Custom node format showing cores, memory, features, GPUs, and state
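sinfo's state filter (-t) also makes quick availability checks easy. A sketch using standard sinfo flags; the helper name is made up:

```shell
# Count idle nodes in a partition.
idle_nodes() {
    # -t idle: only idle nodes, -h: no header, -o "%n": one node name per line
    sinfo -p "$1" -t idle -h -o "%n" | wc -l
}
# Usage: idle_nodes short-40core
```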

File Transfer

Secure Copy (scp)

scp local_file.txt username@seawulf.stonybrook.edu:~/
Upload a file to your home directory
scp -r local_directory/ username@seawulf.stonybrook.edu:~/
Upload a directory recursively
scp username@seawulf.stonybrook.edu:~/remote_file.txt ./
Download a file from the cluster

Rsync (Recommended for Large Transfers)

rsync -avz local_directory/ username@seawulf.stonybrook.edu:~/remote_directory/
Sync a directory with archive mode and compression
rsync -avz --progress username@seawulf.stonybrook.edu:~/data/ ./local_data/
Download with progress display
Rsync Flags: -a (archive mode), -v (verbose), -z (compress), --progress (show progress), --dry-run (test without transferring)

Environment Modules

Command        Purpose                     Example
module avail   List available modules      module avail python
module load    Load a module               module load python/3.9
module unload  Unload a module             module unload python
module list    Show loaded modules         module list
module purge   Unload all modules          module purge
module show    Display module information  module show python/3.9

Common Module Workflows

module load gcc/9.3.0 openmpi/4.1.0
Load compiler and MPI library
module load python/3.9 scipy-stack/2021a
Load Python with scientific libraries
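In batch scripts it is safer to start from a clean slate so modules inherited from the login shell cannot conflict with the job's own loads. A job-script sketch (module versions follow the examples above; the binary name is hypothetical):

```shell
#!/bin/bash
#SBATCH --job-name=mpi_run
#SBATCH --partition=short-40core
#SBATCH --ntasks=4
#SBATCH --time=00:30:00
module purge                           # drop anything inherited from the login shell
module load gcc/9.3.0 openmpi/4.1.0    # versions are examples; check module avail
srun ./my_mpi_program                  # hypothetical MPI binary
```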

Storage and Disk Usage

Checking Quotas and Usage

df -h $HOME
Check free space on the home filesystem
df -h /gpfs/scratch/$USER
Check scratch space usage
myquota
Display user quota information

File Management

find $HOME -name "*.log" -mtime +30 -delete
Delete log files older than 30 days
tar -czf archive.tar.gz directory/
Create compressed archive
tar -xzf archive.tar.gz
Extract compressed archive
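Rather than deleting old files outright, they can be bundled into a dated archive first. A sketch combining find with GNU tar; the function name and 30-day cutoff are illustrative:

```shell
# Archive *.log files older than 30 days into a compressed tarball.
archive_old_logs() {
    local src="$1" out="$2"
    # -print0 / --null handle filenames with spaces safely
    find "$src" -name "*.log" -mtime +30 -print0 |
        tar --null -czf "$out" --files-from=-
}
# Usage: archive_old_logs "$HOME/logs" "logs_$(date +%F).tar.gz"
```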

Performance Monitoring

During Job Execution

sstat -j $SLURM_JOB_ID --format=AveCPU,AvePages,AveRSS,MaxRSS
Monitor running job resource usage
top -u $USER
View your running processes (on login nodes only)

After Job Completion

sacct -j 123456 --format=JobID,MaxRSS,AveRSS,MaxVMSize,AveCPU,Elapsed,State
Detailed resource usage for completed job
seff 123456
Job efficiency report (if available)

Common Troubleshooting

Job Issues

Job Pending (PD)?
  • Check resource availability: sinfo -p your_partition
  • Review job requirements: scontrol show job JOBID
  • Check account limits: sshare -u $USER

Error Diagnostics

tail -20 slurm-123456.out
Check last 20 lines of job output
grep -i error slurm-123456.out
Search for errors in job output
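Adding a couple of context lines around each match often reveals the cause. A small wrapper sketch; the function name is made up:

```shell
# Show errors in a job's output file with two lines of context.
show_errors() {
    # -i: ignore case, -n: show line numbers, -C 2: 2 lines of context
    grep -in -C 2 "error" "$1"
}
# Usage: show_errors slurm-123456.out
```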

Login Issues

ssh -v username@seawulf.stonybrook.edu
Verbose SSH connection for debugging

Quick Workflows

Interactive Development Session

salloc --partition=short-40core --time=02:00:00 --cpus-per-task=4 --mem=16G
Request interactive resources, then use srun for commands

Array Job Submission

#SBATCH --array=1-10
#SBATCH --output=job_%A_%a.out
echo "Processing task $SLURM_ARRAY_TASK_ID"
Submit multiple similar jobs with different parameters

GPU Job Example

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
module load cuda/11.2
nvidia-smi
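Put together, a complete GPU job script might look like the sketch below (partition name and CUDA version follow the example above; the application binary is hypothetical):

```shell
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
#SBATCH --output=gpu_%j.out
module load cuda/11.2        # version as in the example above; check module avail cuda
nvidia-smi                   # confirm the GPU is visible before the real work
./my_cuda_program            # hypothetical CUDA application
```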
 
Pro Tips:
  • Always test jobs with short time limits first
  • Use --dry-run with rsync to preview file transfers
  • Set appropriate memory limits to avoid job failures
  • Monitor job efficiency to optimize resource usage
  • Use scratch space for temporary files and intensive I/O