Conda Environments on SeaWulf

Anaconda is a Python distribution that includes Conda, a package and environment manager. Conda creates isolated environments so different projects can use different package versions without conflicts, which is essential for reproducible research on shared HPC systems.

Loading Anaconda

Before using Conda, load the Anaconda module:

module load anaconda/3

Add this to your ~/.bashrc if you use Conda frequently, or include it in your job scripts.
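If you add it to ~/.bashrc, a small guard keeps the file portable to machines without a module system (a sketch; the conditional is optional):

```shell
# In ~/.bashrc: load Anaconda at login, but only where the
# module command actually exists (e.g. skip on non-HPC machines)
if command -v module >/dev/null 2>&1; then
    module load anaconda/3
fi
```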

Creating Environments

By Name

conda create --name myproject python=3.10

This creates the environment in /gpfs/home/NETID/.conda/envs/myproject. Always specify the Python version to ensure consistency.

By Path

conda create --prefix /gpfs/projects/PerrottetGroup/NETID/envs/myproject python=3.10

This creates the environment in a custom location. Useful for large environments that might exceed home directory quotas. You cannot use both --name and --prefix together.

With Initial Packages

conda create --name myproject python=3.10 numpy pandas matplotlib

Installing packages at environment creation time is faster and more reliable than installing them one by one later, because Conda resolves all dependencies in a single pass.

Managing Environments

List All Environments

conda env list

Shows all of your environments and their locations; the currently active environment is marked with an asterisk.

Activate an Environment

# By name
conda activate myproject

# By path
conda activate /gpfs/scratch/NETID/envs/myproject

Deactivate

conda deactivate

Returns to the previously active environment (usually base).

Remove an Environment

# By name
conda env remove --name myproject

# By path
conda env remove --prefix /gpfs/scratch/NETID/envs/myproject

Installing Packages

After activating your environment:

conda install numpy scipy matplotlib

For packages not in the default channel, use conda-forge:

conda install -c conda-forge package-name

Search for available packages:

conda search package-name

Using Environments in Jobs

Include environment activation in your SBATCH script:

#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH -p short-40core

module load anaconda/3
conda activate myproject

python my_analysis.py
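If conda activate fails inside a non-interactive batch script (a common symptom is "conda: command not found" or a message telling you to run conda init), sourcing Conda's shell hook before activating usually resolves it. A sketch, with the path derived from conda info --base rather than hard-coded:

```shell
module load anaconda/3

# Make the `conda` shell function available in this non-interactive shell,
# then activate the environment as usual.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate myproject
```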

Exporting and Sharing Environments

Export to YAML

conda env export > environment.yml
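The resulting file records the environment name, channels, and pinned package versions. A hypothetical excerpt (the versions shown are illustrative, not prescriptive):

```yaml
name: myproject
channels:
  - defaults
dependencies:
  - python=3.10
  - numpy=1.24.3
  - pandas=2.0.1
```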

Create from YAML

conda env create -f environment.yml

This lets collaborators (or your future self) recreate the exact environment.
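One caveat when recreating an environment on a different operating system: a full export pins platform-specific builds, which may fail to resolve elsewhere. Conda's history-based export records only the packages you explicitly requested, which is usually more portable:

```shell
# Export only explicitly requested packages (no platform-specific build pins)
conda env export --from-history > environment.yml
```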

Clone an Environment

conda create --name newproject --clone myproject

Useful for testing package updates without breaking your working environment.

Storage Considerations

Conda environments can be large. Consider these strategies:

Home directory (default):

  • Pros: Backed up, persistent, safe from automatic deletion
  • Cons: 20 GB limit
  • Best for: Most environments

Project space:

  • Pros: Shared with group, large capacity, persistent
  • Cons: Not backed up
  • Best for: Collaborative projects, large shared environments

Warning: We advise against creating environments in scratch space. The 30-day purge policy uses file timestamps, and package files may be deleted even if the environment is actively used, breaking your installation.

Example for project space:

conda create --prefix /gpfs/projects/groupname/envs/myproject python=3.10

Performance Tips

Clean Package Cache

Conda keeps downloaded packages cached. Clear old packages to save space:

conda clean --all

Troubleshooting

Environment activation fails:
Make sure you loaded the Anaconda module first with module load anaconda/3.

Out of disk quota:
Check environment sizes with du -sh ~/.conda/envs/* and remove environments you no longer need.

Package conflicts:
Create a fresh environment and install all packages at once rather than incrementally.

Best Practices

  • Always specify Python version when creating environments
  • Use one environment per project or research area
  • Document your environment with conda env export
  • Test environments interactively before submitting large jobs
  • Never install packages in the base environment
  • Clean your package cache periodically

See also: For guidance on when to use pip vs conda, see Managing Python Packages: Pip vs Conda.