This guide assumes you have already received access to SeaWulf and are able to log in. It aims to familiarize you with the system's environment, including file system navigation, job management, and software utilization. By understanding these aspects, you can maximize your productivity and effectively leverage SeaWulf's computing capabilities for your projects.
Basic Linux Commands
SeaWulf uses CentOS as its operating system, one of the many variants of Linux. Unlike on a desktop, you interact with this operating system through the terminal, sometimes referred to as the command line. Windows and macOS both have their own versions of the terminal, even though most users choose not to use them. Here, using the terminal is mandatory, so it is important that you know your way around it.
mkdir
When you first log in you will arrive in your home directory. This is your own private folder to store things related to your work. You can make subdirectories, files, and even install software here. Making a subdirectory is simple. Use the mkdir command:
mkdir <directory_name>
Here, <directory_name> is the name you want to give the folder.
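For example, to create a folder for a project (the name my_project here is just a placeholder):
mkdir my_project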
ls and pwd
After you have done this, you can use the ls command to verify that the directory has been created without issue. Typing in ls will result in a list of files and subdirectories being printed back to you, all of which are located in your present working directory. Your working directory is the command line equivalent of your current folder in Windows Explorer or Finder—it's the directory that you're currently looking at. When you type the pwd command, your working directory will be printed out to you:
/gpfs/home/<your_username>
Nearly everything on SeaWulf lives under /gpfs, a directory just below the filesystem root / (the closest Linux equivalent of C: on Windows). The home subdirectory of /gpfs contains all users' home directories.
cd
To change your present working directory, you can use the cd command, which stands for change directory.
cd <path>
You can change your directory using either an absolute or a relative path. An absolute path begins with a forward slash and specifies every level of the directory tree, starting from the root directory (which contains gpfs). A relative path does not start with a forward slash and is interpreted starting from your present working directory. For example, if you are in /gpfs/home/<your_username> and want to move to a subdirectory of that folder, just give cd the subdirectory's name.
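For example, both of the following commands land in the same place if you start from your home directory (my_project is again a placeholder):
cd /gpfs/home/<your_username>/my_project
cd my_project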
touch
If instead of a folder you would rather create a blank text file, you can use the touch command:
touch <new_filename>
You can then edit this file with a text editor of your choice (e.g., nano, vim, or emacs).
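For example, to create an empty file and then open it in nano (notes.txt is a placeholder name):
touch notes.txt
nano notes.txt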
cat
To display the contents of a file, you can use the cat command:
cat <filename>
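For example, to print the placeholder file created earlier:
cat notes.txt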
rm
If you want to delete a file or folder, you can use the rm command (short for remove). This command will permanently delete anything you tell it to (no trash bin!). You will pass this command different options, depending on what it is you want to remove. For a regular file, you can choose not to pass it any options at all:
rm <file_to_remove>
However, if you want to remove an entire directory (even if it's empty), you will have to pass it the -r option (short for recursive):
rm -r <folder_to_remove>
This will remove everything in that directory, files and subdirectories included. The recursive option is called such because it recursively deletes everything it finds. A word of warning—it is very easy to accidentally delete important information. Be very careful when using this command.
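For example, to remove the placeholder file and folder from earlier (remember, there is no undo):
rm notes.txt
rm -r my_project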
Most of these commands have a --help option. If you forget how to use a command, simply type that command followed by --help to get a description of it. You can also read a command's full manual page with man <command>.
history
You can see the full list of the commands you have executed with the history command:
history
The list is numbered, and you can re-execute a command from the history by typing ! followed by the number of the command you want to run:
!<command_number>
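For example, if entry 42 in your history were ls -l (the number is hypothetical), you could rerun it with:
!42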
grep
The grep (global regular expression print) command searches a file and displays all lines that contain a specified pattern (the pattern can be a single word or a whole phrase):
grep "<pattern>" <filename>
grep can be combined with other commands. For example, you can print only the executed commands that contain a specific pattern by piping the output of history into grep:
history | grep "pattern"
You can also pass flags to the grep command that act as filters for the search. For example, the -i flag makes grep case-insensitive, the -v flag prints the lines that do not match the specified pattern, the -c flag prints the number of lines that contain the pattern, and the -n flag prints the matched lines along with their line numbers. The following syntax is used to specify flags:
grep <flag> "<pattern>" <filename>
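For example, to find every line containing "error" in a placeholder file output.log, ignoring case and printing line numbers:
grep -in "error" output.log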
Modules
Some of the commands described above, such as cd and history, are not separate programs but functionality built into the shell. The shell is the program you're interacting with whenever you type something into the terminal, and it is always running. In addition to these commands, the shell has a few helpful features, one of which is environment variables. These are small pieces of data that all programs can access, but which go away any time you log out. Typically they are used for storing paths to directories so that programs know where to look for the files they need.
Another helpful command is env, which lists all of your environment variables. When you run it, you will see something like this printed to your screen:
...
COLLECTION_DATA=/data/collection
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
rvm_path=/home/austin/.rvm
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
SSH_AUTH_SOCK=/run/user/1000/keyring/ssh
DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/usr/share/upstart/xdg:/etc/xdg
rvm_prefix=/home/austin
...
Each line is an individual environment variable. The name of the variable is usually in all caps (e.g., COLLECTION_DATA), and its value appears to the right of the equals sign.
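You can also print a single variable by prefixing its name with a dollar sign and passing it to echo (using COLLECTION_DATA from the sample output above):
echo $COLLECTION_DATA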
Defining these by hand every time you log in would be cumbersome, which is why we have installed a software package to simplify the process. Using the module command (an actual program, this time), you can load and unload the environment variables you commonly need, depending on the software you use. The module command has several subcommands that perform different functions. The most common subcommands are:
module avail
module load <some_module>
module list
module unload <some_module>
The load subcommand will load a module, making a certain software package callable from the terminal. If, for example, you load the matlab/2018a module, you will be able to start MATLAB 2018a by typing in the command matlab.
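In practice, that looks like:
module load matlab/2018a
matlab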
The list subcommand will show you a list of all the modules you have loaded since logging in.
The avail subcommand will list all of the modules that are available to be loaded.
Special requests can be made to install software globally (outside of a home directory) through the ticketing system and are reviewed for suitability of the software in question.
If you accidentally load the wrong software package or want to switch to a different version of the same software, use the unload subcommand to remove the environment variables associated with that software. If, for example, you decide that MATLAB 2018a is insufficient and want to switch to the 2019a release, you would first unload the matlab/2018a module, then load the matlab/2019a module.
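The switch looks like this:
module unload matlab/2018a
module load matlab/2019a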
Other subcommands exist. To see a list of these subcommands and how to use them, type module help.
Slurm
Now that you know the basic ways of interacting with the cluster, the next step is to understand how to use it to run computational software. Each node on SeaWulf is an individual computer that is networked to all the other nodes, forming a computing cluster. One of these, the login node, is the entry point to the cluster and exists only as an interface to the other nodes. Since the beginning of this guide, you have been interacting with this node. Because everybody shares the login node, it shouldn't be used for heavy computation; otherwise, the system would slow down and become unusable. To actually run heavy computation, you will have to run your software on the compute nodes.
To manage demand, we use a scheduling system called Slurm, which grants you access to the compute nodes and runs your job when nodes become available. Slurm commands can only be used after loading the slurm module:
module load slurm
Running an interactive job
Loading the Slurm module gives you access to several commands, one of which is srun. There are several different ways to use this command. To start off, we will begin an interactive job which asks for one compute node with 28 cores:
srun -N 1 -n 28 -p short-28core --pty bash
The -N flag specifies the number of nodes the job needs, and the -n flag specifies the number of tasks.
The -p flag specifies which queue you want to wait in.
The --pty bash option indicates that we want to manually control a node through the terminal.
A list of queues and their resource limits can be found here. Slurm documentation uses the word "partition" instead of "queue"; our FAQ pages will use these terms interchangeably.
After running this command, you will either wait in the short-28core queue or be given a node immediately, depending on demand at the time. You can use the squeue command to show a list of jobs and their statuses to estimate how long you may wait in the queue, if at all.
Once granted access, your terminal will be interacting with the compute node instead of the login node. Here you can test software you have installed, as you are the only user on this node and have access to all its resources.
To end the interactive job session and return to the login node, type exit.
Running an automated job with Slurm
Interactive jobs are good for testing your code or installed software, but they should not be used for long-running computational jobs, since your job will end once you log off. An automated (batch) job will run until it finishes, and with it you won't have to retype commands every time.
To run an automated job with Slurm, you will need to write a job script: a text file that contains all of the information needed to run your job. Your job script will contain special Slurm directives starting with #SBATCH that specify job options, like the number of nodes desired and the expected completion time. Make sure that your #SBATCH directives look exactly like the example below (no space between # and SBATCH, and SBATCH in all capitals). Your job script can also run software built with the Message Passing Interface (MPI), which lets a program coordinate across nodes. MPI is the standard method for communication across more than one node in a computer cluster. In order to utilize multiple nodes for a single job, your software must be built with MPI, and you must use an MPI launch command (e.g., mpirun) when you execute your job.
Here is an example Slurm script:
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --ntasks-per-node=40
#SBATCH --nodes=2
#SBATCH --time=05:00
#SBATCH -p short-40core
#SBATCH --mail-type=BEGIN,END
#SBATCH --mail-user=jane.smith@stonybrook.edu

module load intel/oneAPI/2022.2
module load compiler mkl mpi

cd /gpfs/projects/samples/intel_mpi_hello/
mpiicc mpi_hello.c -o intel_mpi_hello
mpirun ./intel_mpi_hello
The --job-name option gives the job a name so that it can be easily found in the list of queued and running jobs.
The next three lines specify the file where output will be written, the number of tasks to run on each node, and the number of nodes to request.
In addition, we've specified a wall time (the maximum amount of time the job can run before it is killed) in the --time option; here, 05:00 means five minutes.
The --mail-type and --mail-user options are not required but control whether the user should be notified via email when the job state changes (in this case when the job starts and finishes). Emails will only be sent to "stonybrook.edu" addresses.
The two module load lines load the modules required to find the software run by the script. The mpi module is an implementation of MPI, needed for the mpirun command.
The script then sets the present working directory to a directory containing Intel MPI samples. By default, Slurm will set the working directory to the directory where the sbatch command was run.
To start the job, use the sbatch command with the filename of the script as the only argument. Your job will be placed in the specified queue and will run without your involvement. If you want to cancel the job at any point, you can use the scancel command, providing the job ID found in the first column of the squeue printout.
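For example (the script filename and job ID below are placeholders):
sbatch my_job_script.sh
scancel 123456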
Checking Job Status
First, make sure you have loaded the slurm module:
module load slurm
After you've submitted a job, you can check its status in the queue using the squeue command. Issuing this command alone will return the status of every job currently managed by the scheduler, so we recommend narrowing the results by job number or user name:
squeue -j <your_job_number>
or
squeue -u <your_user_name>
Or, for a full list of options available to the squeue command issue:
man squeue
A two-page command summary for Slurm can be found here.
Full documentation for Slurm can be found here.
DUO Two Factor Authentication
If you have logged into SeaWulf recently, you may have noticed that you are required to authenticate with DUO. DUO provides an additional layer of security on the SeaWulf cluster by asking you to confirm each login attempt by accepting a push notification on your smartphone.
The Division of Information Technology maintains a DUO service page, which you can refer to for additional information about this service.