Anaconda is a Python distribution that ships with Conda, a package and environment manager. Conda creates isolated environments so different projects can use different package versions without conflicts, which is essential for reproducible research on shared HPC systems.
Loading Anaconda
Before using Conda, load the Anaconda module:
module load anaconda/3
Add this to your ~/.bashrc if you use Conda frequently, or include it in your job scripts.
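If you load the module from your shell startup file, a guarded form like the sketch below avoids errors in shells where the module system is unavailable (the `anaconda/3` module name follows this page; adjust it to whatever `module avail anaconda` reports on your cluster):

```shell
# In ~/.bashrc: load Anaconda automatically when the module system is present
if command -v module >/dev/null 2>&1; then
    module load anaconda/3
fi
```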
Creating Environments
By Name
conda create --name myproject python=3.10
This creates the environment in /gpfs/home/NETID/.conda/envs/myproject. Always specify the Python version to ensure consistency.
By Path
conda create --prefix /gpfs/projects/PerrottetGroup/NETID/envs/myproject python=3.10
This creates the environment in a custom location. Useful for large environments that might exceed home directory quotas. You cannot use both --name and --prefix together.
With Initial Packages
conda create --name myproject python=3.10 numpy pandas matplotlib
Installing packages during environment creation lets Conda resolve all dependencies in a single pass, which is faster and less conflict-prone than installing them one at a time later.
Managing Environments
List All Environments
conda env list
Shows all your environments and their locations.
Activate an Environment
# By name
conda activate myproject
# By path
conda activate /gpfs/projects/groupname/envs/myproject
Deactivate
conda deactivate
Returns to the base environment.
Remove an Environment
# By name
conda env remove --name myproject
# By path
conda env remove --prefix /gpfs/projects/groupname/envs/myproject
Installing Packages
After activating your environment:
conda install numpy scipy matplotlib
For packages not in the default channel, use conda-forge:
conda install -c conda-forge package-name
Search for available packages:
conda search package-name
Using Environments in Jobs
Include environment activation in your SBATCH script:
#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH -p short-40core
module load anaconda/3
conda activate myproject
python my_analysis.py
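On some systems, conda activate fails inside batch jobs because the non-interactive shell has not been initialized for Conda. A commonly used workaround (a sketch; the environment name myproject follows the examples above) is to initialize Conda's shell hook before activating:

```shell
module load anaconda/3
# Initialize conda for this non-interactive shell, then activate as usual
eval "$(conda shell.bash hook)"
conda activate myproject
```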
Exporting and Sharing Environments
Export to YAML
conda env export > environment.yml
Create from YAML
conda env create -f environment.yml
This ensures collaborators or future you can recreate the exact environment.
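An exported environment.yml looks roughly like the sketch below (the environment name and package versions are illustrative):

```yaml
name: myproject
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.10
  - numpy=1.24.3
  - pandas=2.0.1
```

A full conda env export pins every package including platform-specific builds; conda env export --from-history records only the packages you explicitly requested, which is often more portable across systems.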
Clone an Environment
conda create --name newproject --clone myproject
Useful for testing package updates without breaking your working environment.
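For example, to test an upgrade safely (a sketch using a hypothetical numpy upgrade):

```shell
# Clone the working environment, then try the upgrade in the copy
conda create --name myproject-test --clone myproject
conda activate myproject-test
conda update numpy
# If everything still works, apply the same update to the real environment;
# if not, just discard the test copy:
conda env remove --name myproject-test
```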
Storage Considerations
Conda environments can be large. Consider these strategies:
Home directory (default):
- Pros: Backed up, persistent, safe from automatic deletion
- Cons: 20 GB limit
- Best for: Most environments
Project space:
- Pros: Shared with group, large capacity, persistent
- Cons: Not backed up
- Best for: Collaborative projects, large shared environments
Warning: We advise against creating environments in scratch space. The 30-day purge policy uses file timestamps, and package files may be deleted even if the environment is actively used, breaking your installation.
Example for project space:
conda create --prefix /gpfs/projects/groupname/envs/myproject python=3.10
Performance Tips
Clean Package Cache
Conda keeps downloaded packages cached. Clear old packages to save space:
conda clean --all
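To see what would be removed before deleting anything, conda clean supports a dry run:

```shell
# Preview what would be deleted, then actually clean
conda clean --all --dry-run
conda clean --all
```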
Troubleshooting
Environment activation fails:
Make sure you loaded the Anaconda module first with module load anaconda/3.
Out of disk quota:
Check environment sizes with du -sh ~/.conda/envs/*.
Package conflicts:
Create a fresh environment and install all packages at once rather than incrementally.
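Listing everything in a single create command lets the solver see all constraints together (package names illustrative):

```shell
conda create --name fresh python=3.10 numpy scipy matplotlib pandas
```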
Best Practices
- Always specify Python version when creating environments
- Use one environment per project or research area
- Document your environment with conda env export
- Test environments interactively before submitting large jobs
- Never install packages in the base environment
- Clean your package cache periodically
See also: For guidance on when to use pip vs conda, see Managing Python Packages: Pip vs Conda.