SBATCH Script Layout

How to structure job submission scripts for optimal results

A well-structured SBATCH script follows a consistent layout that makes it readable, maintainable, and less prone to errors. This guide shows you the recommended structure and best practices.

Standard Script Structure

Every SBATCH script should follow this basic structure:

1. Shebang Line: always start with the interpreter directive.
2. SBATCH Directives: resource requirements and job configuration.
3. Environment Setup: module loading and variable definitions.
4. Job Execution: the actual commands to run.
Complete Script Template

1. Shebang Line

#!/bin/bash

2. SBATCH Directives (Resource Requirements)

# Job identification
#SBATCH --job-name=my_job
#SBATCH --output=results_%j.out
#SBATCH --error=results_%j.err

# Resource allocation
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=64GB
#SBATCH --time=02:00:00

# Queue Selection
#SBATCH -p short-40core

3. Environment Setup

# Load required modules
module purge
module load intel/oneAPI/2022.2
module load compiler/latest
module load mpi/latest

# Set environment variables
export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

# Display job info
echo "Job ID: $SLURM_JOBID"
echo "Running on nodes: $SLURM_JOB_NODELIST"
echo "Number of tasks: $SLURM_NTASKS"
echo "Start time: $(date)"

4. Job Execution

# Change to working directory
cd $SLURM_SUBMIT_DIR

# Compile application (if needed)
mpiicc -O3 -o my_app my_source.c

# Run the application
echo "Starting computation..."
mpirun ./my_app input.dat > computation.log

# Post-processing (if needed)
echo "Job completed at: $(date)"
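
Once the four parts are assembled into a file, the script is submitted with sbatch and monitored with squeue (the filename below is illustrative):

```shell
# Submit the script (the filename is an example)
sbatch run_job.sh

# Check the job's status in the queue
squeue -u $USER
```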

SBATCH Directive Best Practices

Group Related Directives

Organize directives logically with comments:

# Job identification
#SBATCH --job-name=protein_folding
#SBATCH --output=folding_%j.out
#SBATCH --error=folding_%j.err

# Resource requirements
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=40
#SBATCH --mem-per-cpu=2GB
#SBATCH --time=12:00:00

# Job placement
#SBATCH -p long-40core

Use Descriptive Names

Make output files easy to identify:

# Good: descriptive filenames
#SBATCH --output=simulation_%x_%j.out
#SBATCH --error=simulation_%x_%j.err

# %x = job name, %j = job ID
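
As a rough illustration of the expansion, the following sketch mimics with sed what Slurm does at submission time; the job name and job ID are made up:

```shell
# Mimic Slurm's filename expansion (illustration only; Slurm performs
# this substitution itself, using the real job name and job ID)
pattern='simulation_%x_%j.out'
out=$(printf '%s' "$pattern" | sed -e 's/%x/protein_folding/' -e 's/%j/123456/')
echo "$out"
```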

Include All Required Resources

Be explicit about your needs:

# Specify all resources clearly
# (keep comments on their own lines; text appended after an
# #SBATCH option may not be parsed as a comment)
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=28
#SBATCH --cpus-per-task=1
# For shared partitions:
#SBATCH --mem=120GB
#SBATCH --time=06:30:00
# For GPU jobs:
#SBATCH --gres=gpu:2

Environment Setup Guidelines

Always Purge Modules First

module purge
module load specific/version/needed

This ensures a clean environment and prevents module conflicts.

Set Important Variables

# Threading control
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# MPI settings
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_PROCESSOR_LIST=allcores

# Application-specific variables
export TMPDIR=/tmp/$USER/$SLURM_JOBID
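
If you point TMPDIR at a per-job scratch path as above, the directory still has to be created, and it is good practice to remove it when the job ends. A minimal sketch; the fallback values only exist so the snippet runs outside of a Slurm job:

```shell
# Create the per-job scratch directory and clean it up on exit.
# Inside a real job, Slurm sets SLURM_JOBID and USER is already defined.
USER=${USER:-$(id -un)}
SLURM_JOBID=${SLURM_JOBID:-test}
export TMPDIR=/tmp/$USER/$SLURM_JOBID

mkdir -p "$TMPDIR"

# Remove the scratch directory when the script exits, even on failure
trap 'rm -rf "$TMPDIR"' EXIT

echo "Scratch directory: $TMPDIR"
```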

Add Job Information

# Print useful job information
echo "========================================="
echo "Job ID: $SLURM_JOBID"
echo "Job Name: $SLURM_JOB_NAME"
echo "Nodes: $SLURM_JOB_NODELIST"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
echo "Tasks per node: $SLURM_NTASKS_PER_NODE"
echo "Working directory: $(pwd)"
echo "Start time: $(date)"
echo "========================================="

Job Execution Best Practices

Change to Submit Directory

cd $SLURM_SUBMIT_DIR

Ensures your job runs from the directory where you submitted it.
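
A slightly more defensive variant aborts if the directory cannot be entered (sketch; the fallback to $PWD only lets the snippet run outside of a job):

```shell
# Abort early if the submit directory cannot be entered
SLURM_SUBMIT_DIR=${SLURM_SUBMIT_DIR:-$PWD}
cd "$SLURM_SUBMIT_DIR" || { echo "ERROR: cannot cd to $SLURM_SUBMIT_DIR"; exit 1; }
echo "Working directory: $PWD"
```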

Add Error Checking

# Exit immediately on any error
set -e

# Check that input files exist
if [ ! -f "input.dat" ]; then
    echo "ERROR: input.dat not found"
    exit 1
fi

# Run the application and report failure explicitly
# (the ! test works even with set -e enabled)
if ! mpirun ./my_app input.dat; then
    echo "ERROR: Application failed"
    exit 1
fi

Include Timing and Logging

# Time the main computation
echo "Starting main computation at: $(date)"
start_time=$(date +%s)

mpirun ./my_app input.dat

end_time=$(date +%s)
runtime=$((end_time - start_time))
echo "Computation completed in $runtime seconds"

Common Script Types

CPU-Only Job

#!/bin/bash
#SBATCH --job-name=cpu_analysis
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --time=04:00:00
#SBATCH -p short-40core

module load intel/oneAPI/2022.2
cd $SLURM_SUBMIT_DIR

mpirun ./analysis_tool

GPU Job

#!/bin/bash
#SBATCH --job-name=gpu_training
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=28
#SBATCH --gres=gpu:1
#SBATCH --time=08:00:00
#SBATCH -p gpu

module load anaconda/3 cuda/11.8
source activate my_env
cd $SLURM_SUBMIT_DIR

python train_model.py

Array Job

#!/bin/bash
#SBATCH --job-name=parameter_sweep
#SBATCH --array=1-100
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=02:00:00
#SBATCH -p short-40core

module load python/3.9
cd $SLURM_SUBMIT_DIR

python simulation.py --param-set $SLURM_ARRAY_TASK_ID
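
A common pattern in array jobs is to map each array index to one line of a parameter file instead of passing the raw index. A sketch; params.txt and its contents are hypothetical, and the file is generated inline only so the example is self-contained:

```shell
# Normally params.txt is prepared before submission,
# one parameter set per line
printf 'alpha=0.1\nalpha=0.2\nalpha=0.3\n' > params.txt

# Inside a job, Slurm sets SLURM_ARRAY_TASK_ID;
# the fallback value is for illustration only
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-2}

# Pick the line matching this task's index
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
echo "Task $SLURM_ARRAY_TASK_ID parameters: $PARAMS"
```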

Remember: Always test your scripts with small resource requests first. Use interactive sessions to debug before submitting large batch jobs.
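
For the debugging step, an interactive session can be requested with srun; the partition and limits below are examples following the queue names used above:

```shell
# Request a short interactive shell on one core for testing
srun -p short-40core --ntasks=1 --time=00:30:00 --pty bash
```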

Pro Tip: Keep a template script for each type of job you commonly run. This saves time and reduces errors when submitting new jobs.