Monitoring Jobs on SeaWulf

Note: Monitoring helps you track how efficiently your jobs use resources. Use it to optimize performance and avoid wasting compute time.

Common Monitoring Commands

Command                Purpose                                                Example
squeue -u <netid>      List your running and pending jobs                     squeue -u sam123
sacct -j <jobid> -l    Show detailed stats for a completed or running job     sacct -j 123456 -l
seff <jobid>           See the CPU and memory efficiency of completed jobs    seff 123456
ssh <node>             Log in to an allocated node to run real-time tools     ssh dn045
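
For example, a quick check on a single job might look like the following. The job ID 123456 and netid sam123 are the placeholder values from the table above, and the sacct field list is just one reasonable selection; run sacct --helpformat to see which fields your Slurm installation supports:

squeue -u sam123                                                 # is the job still queued or running?
sacct -j 123456 --format=JobID,Elapsed,MaxRSS,TotalCPU,State    # elapsed time, peak memory, CPU time
seff 123456                                                      # efficiency summary once the job has finished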

Real-Time Resource Tracking

To see live CPU and memory usage, identify your job’s node with squeue and SSH into it:

squeue -u sam123
ssh dn045
top    # or htop, glances, etc.

This lets you monitor usage in real time; for example, the RES column in top shows the resident (physical) memory each of your processes is using.
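
If the node is busy with other users' jobs, it can help to restrict the view to your own processes and sort by memory. The flags below assume the common procps-ng version of top (and htop, if installed); sam123 is again just the placeholder netid:

top -u sam123 -o %MEM     # show only sam123's processes, sorted by resident memory
htop -u sam123            # the same filter in htop's interactive view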

Built-In Script: get_resource_usage.py

SeaWulf provides a script to summarize CPU and memory usage per job:

/gpfs/software/hpc_tools/get_resource_usage.py

Run it directly for your usage stats, or use these options to filter:

--user <username>
--job <jobid>
--node <nodename>
--low <percent>
--high <percent>

This gives you a quick snapshot of how your jobs are performing on the compute nodes.
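
As an illustration, the options above can be combined roughly as follows. These invocations are a sketch based on the option names listed; check the monitoring guide linked below for the script's authoritative usage:

/gpfs/software/hpc_tools/get_resource_usage.py --user sam123    # summarize all of sam123's jobs
/gpfs/software/hpc_tools/get_resource_usage.py --job 123456     # summarize a single job
/gpfs/software/hpc_tools/get_resource_usage.py --node dn045     # summarize usage on one compute node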

For full details and more examples, see SeaWulf's monitoring guide: "How can I monitor resource usage on SeaWulf?"

Tips

  • Use squeue and sacct to check status and historical usage.
  • Try seff to quickly see if jobs are under- or over-using memory/CPU.
  • For live monitoring, SSH to the node and use top, htop, or glances.
  • Use get_resource_usage.py for a concise, script-based summary of usage.
  • Adjust your job’s resource requests based on observed usage to improve cluster efficiency.
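
As a sketch of that last tip: if seff reports that a job used far less memory or CPU than it requested, trim the corresponding #SBATCH directives in your batch script. The directives below are standard Slurm options; the specific values are purely illustrative:

#SBATCH --ntasks=24          # match the number of CPUs the job actually kept busy
#SBATCH --mem=16G            # e.g. seff showed ~12 GB used against a 64 GB request
#SBATCH --time=04:00:00      # previous runs finished well within this limit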