Note: Monitoring helps you track how efficiently your jobs use resources. Use it to optimize performance and avoid wasting compute time.
Common Monitoring Commands
| Command | Purpose | Example |
|---|---|---|
| `squeue -u <netid>` | List your running and pending jobs | `squeue -u sam123` |
| `sacct -j <jobid> -l` | Show detailed stats about a completed or running job | `sacct -j 123456 -l` |
| `seff <jobid>` | See efficiency (CPU, memory) of completed jobs | `seff 123456` |
| `ssh <node>` | Log into an allocated node to run real-time tools | `ssh dn045` |
Real-Time Resource Tracking
To see live CPU and memory usage, identify your job's node with `squeue`, then SSH into it and launch a live monitor:

```
squeue -u sam123
ssh dn045
top   # or htop, glances, etc.
```

This lets you monitor usage dynamically, for example by watching the RES (resident memory) column in `top`.
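If you prefer a non-interactive snapshot over a live screen, `top` also has a batch mode (`top -b -n 1 -u $USER`) whose output you can filter with standard tools. The sketch below uses a made-up sample of such output, since no compute node is assumed here, and flags processes whose RES column is reported in gigabytes.

```shell
# Illustrative sample of `top -b -n 1 -u sam123` process lines; the PIDs,
# sizes, and commands are invented for the example.
top_sample='  PID USER  PR NI  VIRT   RES  SHR S %CPU %MEM  TIME+ COMMAND
 4321 sam123 20  0 12.3g  9.8g 1024 R 99.7 15.6 10:42.1 python
 4322 sam123 20  0  1.1g  300m  512 S  0.3  0.5  0:01.2 sshd'

# Skip the header (NR > 1) and print PID, command, and RES for any process
# whose resident memory is shown with a "g" (gigabyte) suffix.
big_res=$(echo "$top_sample" | awk 'NR > 1 && $6 ~ /g$/ { print $1, $12, $6 }')
echo "$big_res"
```

The same idea works with `ps` or `htop`'s sort keys; the point is that RES is the number to watch when checking whether a job fits in its memory request.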
Built-In Script: get_resource_usage.py
SeaWulf provides a script to summarize CPU and memory usage per job:
/gpfs/software/hpc_tools/get_resource_usage.py
Run it directly for your usage stats, or filter the output with these options:

```
--user <username>  --job <jobid>  --node <nodename>  --low <percent>  --high <percent>
```
This gives you a quick snapshot of how your jobs are performing on the compute nodes.
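As a usage sketch, the options can be combined; the exact flag semantics are inferred from the option names above, and `sam123` is a placeholder user, so check the script's own help output before relying on this.

```shell
# Assumed usage: report jobs for user sam123 whose utilization falls
# below 25% (--low is inferred to mean a lower threshold; run this on
# a SeaWulf login node where the script is installed).
/gpfs/software/hpc_tools/get_resource_usage.py --user sam123 --low 25
```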
For full details and more examples, see SeaWulf's monitoring guide: "How can I monitor resource usage on SeaWulf?"
Tips
- Use `squeue` and `sacct` to check status and historical usage.
- Try `seff` to quickly see if jobs are under- or over-using memory/CPU.
- For live monitoring, SSH to the node and use `top`, `htop`, or `glances`.
- Use `get_resource_usage.py` for a concise, script-based summary of usage.
- Adjust your job's resource requests based on observed usage to improve cluster efficiency.