How do I use Ollama to run LLMs locally on SeaWulf?

Using Ollama to Run LLMs on dg-mem

Ollama is a framework for running and interacting with large language models (LLMs) locally, including popular models such as Llama, Codellama, and Gemma. It is available on dg-mem, a high-performance compute node on SeaWulf with 3 TB of RAM, so you can run these models directly without relying on cloud-based services. Ollama offers several pre-trained models and the flexibility to customize prompts or fine-tune models to meet specific requirements.

The dg-mem node is equipped with 2 AMD Instinct MI210 GPUs, which accelerate the processing of large models and make it well suited to running complex LLMs that require significant GPU capability.

 

Accessing Ollama on dg-mem

To access and use Ollama on the dg-mem node, follow these steps:

  1. SSH into dg-mem:
    Connect to dg-mem by running the following command:

    ssh your_username@dg-mem.seawulf.stonybrook.edu

    Alternatively, if you are already on milan1 or milan2, you can SSH to dg-mem with:

    ssh dg-mem
  2. Load the Ollama module:
    After logging in, load the Ollama module with the following command:

    module load ollama/0.1.44-amd

 

This prepares your environment to use any of the available models on dg-mem.
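For reference, a typical first session might look like the following (the username and model shown here are placeholders for illustration):

# From your own machine; from milan1/milan2 you can simply run "ssh dg-mem"
ssh your_username@dg-mem.seawulf.stonybrook.edu

# On dg-mem: load the Ollama module and start an interactive chat
module load ollama/0.1.44-amd
ollama run llama3.2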

 

Models currently available with Ollama on SeaWulf

  1. llama3.3: The most advanced Llama model, ideal for high-quality, complex NLP tasks, providing superior performance on challenging problems.
  2. llama3.2: A reliable, general-purpose model that balances computational efficiency with solid performance across various NLP tasks.
  3. phi3:medium: A versatile model, offering a good balance of size and capability for many NLP tasks.
  4. llama3.1:70b: A large-scale Llama model with 70 billion parameters, designed for high-performance tasks requiring significant computational power.
  5. codellama: Optimized for coding-related tasks such as code completion and generation, ideal for software development and NLP applications.
  6. gemma: A general-purpose model suitable for text generation, summarization, sentiment analysis, and other NLP applications.

 

You can list all available models by running:

ollama ls
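
If the list is long, you can combine this with standard shell tools to find a specific model; for example, to show only the Llama variants:

# Filter the installed models by name
ollama ls | grep llama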

How to Run a Model with Ollama

Once you've logged into dg-mem and loaded the Ollama module, running a model is simple. Here's how:

  1. Run a Model: 

    To start any available model, use the ollama run command followed by the model name. For example, to run Llama 3.2:

    ollama run llama3.2

    This starts the Llama 3.2 model and lets you interact with it directly in your terminal (a non-interactive example is shown after this list).

  2. Stop a Running Model: 

    Exit the interactive interface by pressing Ctrl+D or typing “/bye”. If a model is still loaded in the background, stop it with:

    ollama stop <model_name>
  3. View Active Models: 

    To check which models are currently running, use:

    ollama ps
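
Ollama can also take a prompt directly on the command line, which is convenient for one-shot, scripted use. The model name and prompt below are only examples:

# Run a single prompt non-interactively and print the model's response
ollama run llama3.2 "Summarize what a large language model is in one sentence."

# Unload the model afterwards so GPU memory is freed for other users
ollama stop llama3.2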

 

Using Ollama, you can easily run powerful language models locally on the SeaWulf cluster, utilizing dg-mem’s computational resources without relying on cloud services.

 


Shared Resources on dg-mem

Please be aware that dg-mem is a shared node on SeaWulf, and multiple users may be accessing it simultaneously.

  • Be respectful of others: Ensure that you are not monopolizing the node’s resources, particularly the two AMD Instinct MI210 GPUs, which are essential for running large models. If the node is busy or other users are also running models, be mindful of the computational power you are using (a quick way to check GPU usage is shown below).
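
Before launching a large model, it can help to check whether the GPUs are already busy. On AMD GPU nodes the rocm-smi utility is usually available for this (its availability on dg-mem may depend on your environment and loaded modules):

# Show utilization and memory usage of the MI210 GPUs
rocm-smi

# See which models other users currently have loaded
ollama ps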

 

By following these guidelines, you help maintain a collaborative and efficient computing environment for all users.

 
