Script Structure
A well-structured SBATCH script follows a consistent layout that makes it readable, maintainable, and less prone to errors. Every SBATCH script should follow this basic structure:
1. Shebang Line: Always start with the interpreter directive
2. SBATCH Directives: Resource requirements and job configuration
3. Environment Setup: Module loading and variable definitions
4. Job Execution: The actual commands to run
Complete Script Template
1. Shebang Line
```bash
#!/bin/bash
```
2. SBATCH Directives (Resource Requirements)
```bash
# Job identification
#SBATCH --job-name=my_job
#SBATCH --output=results_%j.out
#SBATCH --error=results_%j.err

# Resource allocation
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=64GB
#SBATCH --time=02:00:00

# Queue selection
#SBATCH -p short-40core
```
3. Environment Setup
```bash
# Load required modules
module purge
module load intel/oneAPI/2022.2
module load compiler/latest
module load mpi/latest

# Set environment variables
export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

# Display job info
echo "Job ID: $SLURM_JOBID"
echo "Running on nodes: $SLURM_JOB_NODELIST"
echo "Number of tasks: $SLURM_NTASKS"
echo "Start time: $(date)"
```
4. Job Execution
```bash
# Change to working directory
cd $SLURM_SUBMIT_DIR

# Compile application (if needed)
mpiicc -O3 -o my_app my_source.c

# Run the application
echo "Starting computation..."
mpirun ./my_app input.dat > computation.log

# Post-processing (if needed)
echo "Job completed at: $(date)"
```
SBATCH Directive Best Practices
Group Related Directives
Organize directives logically with comments:
```bash
# Job identification
#SBATCH --job-name=protein_folding
#SBATCH --output=folding_%j.out
#SBATCH --error=folding_%j.err

# Resource requirements
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=40
#SBATCH --mem-per-cpu=2GB
#SBATCH --time=12:00:00

# Job placement
#SBATCH -p long-40core
```
Use Descriptive Names
Make output files easy to identify:
```bash
# Good: descriptive filenames
#SBATCH --output=simulation_%x_%j.out
#SBATCH --error=simulation_%x_%j.err
# %x = job name, %j = job ID
```
Include All Required Resources
Be explicit about your needs:
```bash
# Specify all resources clearly
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=28
#SBATCH --cpus-per-task=1
#SBATCH --mem=120GB
#SBATCH --time=06:30:00

# For GPU jobs
#SBATCH --gres=gpu:2
```
Environment Setup Guidelines
Always Purge Modules First
```bash
module purge
module load specific/version/needed
```
This ensures a clean environment and prevents module conflicts.
Set Important Variables
```bash
# Threading control
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# MPI settings
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_PROCESSOR_LIST=allcores

# Application-specific variables
# Create the scratch directory before pointing TMPDIR at it
mkdir -p /tmp/$USER/$SLURM_JOBID
export TMPDIR=/tmp/$USER/$SLURM_JOBID
```
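These variables matter most for hybrid MPI/OpenMP jobs, where $SLURM_CPUS_PER_TASK ties the thread count directly to the resource request. A minimal sketch of how the pieces fit together (the node, rank, and thread counts here are illustrative assumptions, not recommendations):

```bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4    # 4 MPI ranks per node
#SBATCH --cpus-per-task=10     # 10 cores reserved per rank for OpenMP threads

# One OpenMP thread per reserved core
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun ./hybrid_app            # 8 ranks total, 10 threads each
```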
Add Job Information
```bash
# Print useful job information
echo "========================================="
echo "Job ID: $SLURM_JOBID"
echo "Job Name: $SLURM_JOB_NAME"
echo "Nodes: $SLURM_JOB_NODELIST"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
echo "Tasks per node: $SLURM_NTASKS_PER_NODE"
echo "Working directory: $(pwd)"
echo "Start time: $(date)"
echo "========================================="
```
Job Execution Best Practices
Change to Submit Directory
```bash
cd $SLURM_SUBMIT_DIR
```
This ensures your job runs from the directory where you submitted it. Slurm starts batch jobs there by default, but making it explicit documents the assumption and protects against site-specific overrides.
Add Error Checking
```bash
# Exit on any error
set -e

# Check that input files exist
if [ ! -f "input.dat" ]; then
    echo "ERROR: input.dat not found"
    exit 1
fi

# Run the application with explicit error checking.
# The if-guard is needed because, with set -e, a bare failing
# mpirun would exit the script before we could log the error.
if ! mpirun ./my_app input.dat; then
    echo "ERROR: Application failed"
    exit 1
fi
```
Include Timing and Logging
```bash
# Time the main computation
echo "Starting main computation at: $(date)"
start_time=$(date +%s)

mpirun ./my_app input.dat

end_time=$(date +%s)
runtime=$((end_time - start_time))
echo "Computation completed in $runtime seconds"
```
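If GNU time is installed on the compute nodes (an assumption; check your cluster), it can report peak memory use alongside wall time without any extra bookkeeping:

```bash
# -v prints elapsed time, peak resident set size, and other
# resource statistics to stderr when the command finishes
/usr/bin/time -v mpirun ./my_app input.dat
```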
Common Directive Reference
| Directive | Purpose | Example |
|---|---|---|
| -p, --partition | Specify queue/partition | #SBATCH -p short-40core |
| -N, --nodes | Number of nodes | #SBATCH -N 2 |
| -n, --ntasks | Total number of tasks | #SBATCH -n 80 |
| --ntasks-per-node | Tasks per node | #SBATCH --ntasks-per-node=40 |
| -c, --cpus-per-task | CPUs per task (for threading) | #SBATCH -c 40 |
| --mem | Memory per node | #SBATCH --mem=64GB |
| --mem-per-cpu | Memory per CPU | #SBATCH --mem-per-cpu=2GB |
| -t, --time | Wall time limit | #SBATCH -t 02:30:00 |
| -J, --job-name | Job name | #SBATCH -J my_job |
| -o, --output | Standard output file | #SBATCH -o job_%j.out |
| -e, --error | Standard error file | #SBATCH -e job_%j.err |
| --gres | Generic resources (GPUs) | #SBATCH --gres=gpu:2 |
| --array | Job array indices | #SBATCH --array=1-100 |
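Job arrays are the one entry above not illustrated elsewhere in this guide: each array task runs the same script with a distinct $SLURM_ARRAY_TASK_ID, typically used to select an input file. A minimal sketch (the filename scheme is an assumption for illustration):

```bash
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --array=1-100
#SBATCH --output=sweep_%A_%a.out   # %A = array job ID, %a = task index
#SBATCH --time=01:00:00

# Each task processes one numbered input file
./my_app input_${SLURM_ARRAY_TASK_ID}.dat
```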
Pro Tip: Keep a template script for each type of job you commonly run. This saves time and reduces errors when submitting new jobs.
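For example, a stripped-down serial-job template you might keep on hand (the partition name and placeholders are assumptions to adapt to your site):

```bash
#!/bin/bash
#SBATCH --job-name=CHANGE_ME
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH -p short-40core        # placeholder partition

module purge
# module load CHANGE_ME

cd $SLURM_SUBMIT_DIR
echo "Job $SLURM_JOBID started at $(date)"

./CHANGE_ME                    # replace with your command

echo "Job $SLURM_JOBID finished at $(date)"
```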
Remember: Always test your scripts with small resource requests first. Use interactive sessions to debug before submitting large batch jobs.
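One way to get such an interactive session is with srun (a sketch; the partition name and limits are assumptions based on the examples above):

```bash
# Request a 30-minute, single-core interactive shell for debugging
srun -p short-40core -N 1 -n 1 -t 00:30:00 --pty bash
```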