SLURM Overview

Understanding SeaWulf's workload manager and job scheduler

What is SLURM?

SLURM (Simple Linux Utility for Resource Management) is an open-source workload manager and job scheduler used on SeaWulf to manage compute resources and run jobs efficiently across the cluster.

How SLURM Works

SLURM serves three primary functions on SeaWulf:

1. Resource Allocation: allocates exclusive or shared access to compute nodes for specified durations.

2. Job Execution: provides a framework for starting, executing, and monitoring jobs on the allocated nodes.

3. Queue Management: manages job queues and arbitrates resource contention between competing jobs.

Getting Started

Before using SLURM commands on SeaWulf, load the SLURM module:

module load slurm

Essential SLURM Commands

Here are the core commands you'll use to submit and manage jobs:

Function               | SLURM Command              | Description
Submit batch job       | sbatch [script]            | Submit a job script to the queue
Submit interactive job | srun --pty bash            | Start an interactive session on a compute node
Check job status       | squeue                     | View the current job queue and status
Cancel job             | scancel [job_id]           | Cancel a running or queued job
Job details            | scontrol show job [job_id] | Show detailed job information
Node information       | sinfo                      | Display node and partition information
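Taken together, a typical submit-and-monitor session might look like the following sketch. It assumes you are on a SeaWulf login node with the slurm module loaded; my_job.slurm is a placeholder script name:

```shell
# Submit the script and capture the job ID.
# --parsable makes sbatch print only the ID, which is convenient in scripts.
jobid=$(sbatch --parsable my_job.slurm)

squeue --job "$jobid"          # check its place in the queue
scontrol show job "$jobid"     # full details on the allocation
# scancel "$jobid"             # uncomment to cancel the job
```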

Job Script Basics

SLURM batch jobs are defined by job scripts: shell scripts that specify resource requirements via #SBATCH directives, followed by the commands to run.

Essential SLURM Directives

Resource        | SLURM Directive            | Example
Job name        | #SBATCH --job-name=        | --job-name=my_job
Number of nodes | #SBATCH --nodes=           | --nodes=2
Tasks per node  | #SBATCH --ntasks-per-node= | --ntasks-per-node=40
Memory per node | #SBATCH --mem=             | --mem=64GB
Wall time       | #SBATCH --time=            | --time=02:30:00
Partition/Queue | #SBATCH -p                 | -p short-40core
Output file     | #SBATCH --output=          | --output=job.%j.out
Error file      | #SBATCH --error=           | --error=job.%j.err
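Putting these directives together, a minimal job script might look like the sketch below. The partition name and counts are taken from the examples above and are illustrative; my_program stands in for your application:

```shell
#!/bin/bash
# Sketch of a complete SLURM job script; adjust resources for your workload.
#SBATCH --job-name=my_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --time=02:30:00
#SBATCH -p short-40core
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err

# Load any modules your application needs, then run it
# (my_program is a placeholder).
./my_program
```

Submit it from a login node with `sbatch my_job.slurm`.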

Monitoring Jobs

Check Your Jobs

squeue --user=$USER

Shows all your current and queued jobs.

Check Specific Job

squeue --job [job_id]

Shows status of a specific job by ID.

Detailed Job Information

scontrol show job [job_id]

Displays comprehensive job details including resource allocation and status.

Node Information

sinfo

Shows partition and node status information.
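squeue's output can also be tailored with --format. The sketch below uses standard SLURM format codes (job ID, name, state, elapsed time, node count, and reason/node list); the column widths are illustrative:

```shell
# Compact, customized view of your own jobs
# (%i = job ID, %j = name, %T = state, %M = time used, %D = nodes, %R = reason/node list).
squeue --user=$USER --format="%.10i %.20j %.8T %.10M %.6D %R"
```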

Useful Environment Variables

SLURM automatically sets several environment variables that your jobs can use:

Variable             | Description
$SLURM_JOBID         | Unique job identifier
$SLURM_SUBMIT_DIR    | Directory the job was submitted from
$SLURM_JOB_NODELIST  | List of nodes allocated to the job
$SLURM_NTASKS        | Total number of tasks for the job
$SLURM_CPUS_PER_TASK | Number of CPUs allocated per task
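For example, a job script can use these variables to return to the submission directory and report its allocation. The ${VAR:-fallback} defaults below are an addition so the snippet also runs outside a SLURM job:

```shell
# Sketch: using SLURM environment variables inside a job script.
# The ${VAR:-fallback} form supplies defaults when the variables are unset.
cd "${SLURM_SUBMIT_DIR:-$PWD}"   # start in the directory the job was submitted from
echo "Job ${SLURM_JOBID:-<none>} on nodes: ${SLURM_JOB_NODELIST:-<none>}"
echo "Running ${SLURM_NTASKS:-1} task(s) with ${SLURM_CPUS_PER_TASK:-1} CPU(s) each"
```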

Best Practices

Resource Estimation

Request only the resources you actually need. Over-requesting resources can lead to longer queue times and reduced system efficiency.

Output Files

Always specify output and error files to capture job information. Use %j to include the job ID in filenames:

#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err

Module Loading

Load all required modules within your job script to ensure consistent environments across compute nodes.

Need Help? For detailed SLURM documentation, visit the official SLURM documentation. For SeaWulf-specific questions, submit a ticket to the IACS support system.