SLURM Overview
Understanding SeaWulf's workload manager and job scheduler
What is SLURM? SLURM (Simple Linux Utility for Resource Management) is an open-source workload manager and job scheduler used on SeaWulf to manage compute resources and run jobs efficiently across the cluster.
How SLURM Works
SLURM serves three primary functions on SeaWulf:
Resource Allocation
Allocates exclusive or shared access to compute nodes for specified durations.
Job Execution
Provides a framework for starting, executing, and monitoring jobs on allocated nodes.
Queue Management
Manages job queues and arbitrates resource contention between competing jobs.
Getting Started
Before using SLURM commands on SeaWulf, load the SLURM module:
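The module name below is the one commonly used for this purpose; run `module avail slurm` to confirm the exact name available on your system:

```shell
# Load the SLURM module so sbatch, squeue, etc. are on your PATH
module load slurm
```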
Essential SLURM Commands
Here are the core commands you'll use to submit and manage jobs:
| Function | SLURM Command | Description |
|---|---|---|
| Submit batch job | `sbatch [script]` | Submit a job script to the queue |
| Submit interactive job | `srun --pty bash` | Start an interactive session on a compute node |
| Check job status | `squeue` | View the current job queue and status |
| Cancel job | `scancel [job_id]` | Cancel a running or queued job |
| Job details | `scontrol show job [job_id]` | Show detailed job information |
| Node information | `sinfo` | Display node and partition information |
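As a sketch of the interactive case, the following requests a short interactive shell; the partition name and limits are illustrative, so adjust them to your allocation:

```shell
# One task for 30 minutes on the short-40core partition, with a pseudo-terminal
srun -p short-40core -N 1 -n 1 --time=00:30:00 --pty bash
```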
Job Script Basics
SLURM jobs are defined using job scripts that specify resource requirements and commands to run.
Essential SLURM Directives
| Resource | SLURM Directive | Example |
|---|---|---|
| Job name | `#SBATCH --job-name=` | `--job-name=my_job` |
| Number of nodes | `#SBATCH --nodes=` | `--nodes=2` |
| Tasks per node | `#SBATCH --ntasks-per-node=` | `--ntasks-per-node=40` |
| Memory per node | `#SBATCH --mem=` | `--mem=64GB` |
| Wall time | `#SBATCH --time=` | `--time=02:30:00` |
| Partition/Queue | `#SBATCH -p` | `-p short-40core` |
| Output file | `#SBATCH --output=` | `--output=job.%j.out` |
| Error file | `#SBATCH --error=` | `--error=job.%j.err` |
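Putting the directives together, a minimal job script might look like the sketch below; the module and program names are placeholders, not SeaWulf-specific requirements:

```shell
#!/usr/bin/env bash
#SBATCH --job-name=my_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --time=02:30:00
#SBATCH -p short-40core
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err

# Start from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"

# Load whatever modules your program needs (placeholder names)
module load slurm

# Launch the program across the allocated tasks (placeholder command)
srun ./my_program
```

Submit it with `sbatch`, passing the script's filename.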
Monitoring Jobs
Check Your Jobs
Shows all your current and queued jobs.
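Using `squeue` from the commands table, the `-u` flag filters the queue to a single user:

```shell
squeue -u $USER
```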
Check Specific Job
Shows status of a specific job by ID.
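For example, with the ID printed by sbatch at submission time:

```shell
squeue -j [job_id]
```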
Detailed Job Information
Displays comprehensive job details including resource allocation and status.
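For example:

```shell
scontrol show job [job_id]
```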
Node Information
Shows partition and node status information.
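For example:

```shell
sinfo
```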
Useful Environment Variables
SLURM automatically sets several environment variables that your jobs can use:
| Variable | Description |
|---|---|
| `$SLURM_JOBID` | Unique job identifier |
| `$SLURM_SUBMIT_DIR` | Directory from which the job was submitted |
| `$SLURM_JOB_NODELIST` | List of nodes allocated to the job |
| `$SLURM_NTASKS` | Total number of tasks for the job |
| `$SLURM_CPUS_PER_TASK` | Number of CPUs allocated per task |
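A small sketch of using these inside a job script; the `${VAR:-fallback}` form supplies a fallback so the snippet also runs outside a SLURM allocation:

```shell
#!/usr/bin/env bash
# Report where and how this job is running (fallbacks for non-SLURM runs)
echo "Job ID:     ${SLURM_JOBID:-none}"
echo "Submit dir: ${SLURM_SUBMIT_DIR:-$PWD}"
echo "Nodes:      ${SLURM_JOB_NODELIST:-localhost}"
echo "Tasks:      ${SLURM_NTASKS:-1}"

# Run from the submission directory
cd "${SLURM_SUBMIT_DIR:-$PWD}"
```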
Best Practices
Resource Estimation
Request only the resources you actually need. Over-requesting resources can lead to longer queue times and reduced system efficiency.
Output Files
Always specify output and error files to capture job information. Use %j to include the job ID in filenames:
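For example, `%j` expands to the job ID, keeping output from different runs separate:

```shell
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err
```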
Module Loading
Load all required modules within your job script to ensure consistent environments across compute nodes.
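For instance, near the top of the job script (the module names here are illustrative):

```shell
# Loaded in-script so every run, on every node, uses the same environment
module load slurm
module load gcc
```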
Need Help? For detailed SLURM documentation, visit the official SLURM documentation. For SeaWulf-specific questions, submit a ticket to the IACS support system.