Anaconda is a platform for managing Python packages and environments. Conda creates isolated environments so different projects can use different package versions without conflicts. This is essential for reproducible research on shared HPC systems.
Loading Anaconda
Before using Conda, load the Anaconda module:
module load anaconda/3
Creating Environments
By Name
conda create --name myproject python=3.10
This creates the environment in /gpfs/home/<Your_NetID>/.conda/envs/myproject. Always specify the Python version to ensure consistency.
By Path
conda create --prefix /gpfs/projects/<Group_Name>/envs/myproject python=3.10
This creates the environment in a custom location, which is useful for large environments that might exceed your home directory quota. The --name and --prefix options are mutually exclusive, so use one or the other.
With Initial Packages
conda create --name myproject python=3.10 numpy pandas matplotlib
Installing packages during environment creation is faster than installing them separately later, because conda resolves all dependencies in a single pass.
Managing Environments
List All Environments
conda env list
Shows all your environments and their locations.
Activate an Environment
# By name
conda activate myproject
# By path
conda activate /gpfs/projects/<Group_Name>/envs/myproject
Deactivate
conda deactivate
Returns to the previously active environment (usually base).
Remove an Environment
# By name
conda env remove --name myproject
# By path
conda env remove --prefix /gpfs/projects/<Group_Name>/envs/myproject
Installing Packages
After activating your environment:
conda install numpy scipy matplotlib
For packages not in the default channel, use conda-forge:
conda install -c conda-forge package-name
Search for available packages:
conda search package-name
Combining Conda and Pip
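You can use pip within a conda environment, but follow this order: install conda packages first, then use pip for anything not available through conda. Avoid switching back and forth, because conda is not aware of changes made by pip and a later conda install can break pip-installed packages.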
# First install with conda
conda install numpy pandas matplotlib
# Then add pip packages if needed
pip install package-not-in-conda
See Managing Python Packages: Pip vs Conda for detailed guidance.
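Before running pip, it can help to confirm that the pip on your PATH belongs to the active environment rather than to a system or user-level Python. The following is a minimal sketch, assuming conda has set CONDA_PREFIX for the active environment; the function name pip_in_active_env is illustrative and not part of conda or pip.

```shell
# Return success only if the first `pip` on PATH lives inside the
# active conda environment. CONDA_PREFIX is set by `conda activate`.
pip_in_active_env() {
    local pip_path
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "No conda environment is active" >&2
        return 1
    fi
    pip_path=$(command -v pip) || {
        echo "pip not found on PATH" >&2
        return 1
    }
    case "$pip_path" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *)
            echo "pip resolves to $pip_path, outside $CONDA_PREFIX" >&2
            return 1
            ;;
    esac
}
```

A possible use is pip_in_active_env && pip install package-not-in-conda, so the install is skipped if pip would write somewhere unexpected.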
Using Environments in Jobs
Include environment activation in your SBATCH script:
#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH -p short-40core
module load anaconda/3
conda activate myproject
python my_analysis.py
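If activation fails inside a batch job, the script may carry on and run with the wrong Python. One way to guard against that is to check the active environment before starting the real work. This is a sketch under the assumption that conda sets CONDA_DEFAULT_ENV on activation (the environment name, or the full path for environments created with --prefix); require_conda_env is a hypothetical helper name.

```shell
# Fail unless the expected conda environment is active.
# `conda activate` sets CONDA_DEFAULT_ENV to the environment name
# (or to the full path for environments created with --prefix).
require_conda_env() {
    local expected=$1
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected conda environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}

# In the job script, after `module load anaconda/3` and
# `conda activate myproject`:
#   require_conda_env myproject || exit 1
#   python my_analysis.py
```

With this check in place, a failed activation stops the job immediately instead of producing results from an unintended environment.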
Exporting and Sharing Environments
Export to YAML
conda env export > environment.yml
This exports the currently active environment, so activate the one you want to capture first.
Create from YAML
conda env create -f environment.yml
This lets collaborators, or future you, recreate the same environment on a compatible system.
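A full export pins exact versions and builds, which is good for reproducing results on the same system but can be hard to recreate elsewhere. The sketch below writes both a full export and a shorter one using the --from-history flag, which lists only the packages you explicitly requested. The function name and the two output file names are assumptions chosen for illustration.

```shell
# Write two descriptions of the active environment into a directory:
#   environment.full.yml          - every package with exact versions and builds
#   environment.from-history.yml  - only the packages you explicitly requested
export_env_files() {
    local outdir=${1:-.}
    command -v conda >/dev/null 2>&1 || {
        echo "conda not found; run 'module load anaconda/3' first" >&2
        return 1
    }
    conda env export > "$outdir/environment.full.yml" || return 1
    conda env export --from-history > "$outdir/environment.from-history.yml" || return 1
}
```

Either file can be passed to conda env create -f. The from-history file is usually the more portable of the two, at the cost of not pinning every dependency.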
Clone an Environment
conda create --name newproject --clone myproject
Useful for testing package updates without breaking your working environment.
Storage Considerations
Conda environments can be large. Consider these strategies:
Home directory (default):
- Pros: Backed up, persistent, safe from automatic deletion
- Cons: 20 GB limit
- Best for: Most environments
Project space:
- Pros: Shared with group, large capacity, persistent
- Cons: Not backed up
- Best for: Collaborative projects, large shared environments
Warning: We advise against creating environments in scratch space. The 30-day purge policy uses file timestamps, and package files may be deleted even if the environment is actively used, breaking your installation.
Example for project space:
conda create --prefix /gpfs/projects/<Group_Name>/envs/myproject python=3.10
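To avoid creating an environment in scratch by accident, the create command can be wrapped in a small check. This is only a sketch: it assumes scratch is mounted at /gpfs/scratch, compares the path as a plain string (so it does not resolve symlinks or relative paths), and uses the hypothetical helper names is_scratch_path and safe_conda_create.

```shell
# Return success if the given absolute path is inside scratch space.
# Assumption: scratch is mounted at /gpfs/scratch.
is_scratch_path() {
    case "$1" in
        /gpfs/scratch|/gpfs/scratch/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Create an environment at a prefix, refusing scratch locations.
# Extra arguments (for example python=3.10) are passed to conda create.
safe_conda_create() {
    local prefix=$1
    shift
    if is_scratch_path "$prefix"; then
        echo "Refusing to create an environment in scratch: $prefix" >&2
        return 1
    fi
    conda create --prefix "$prefix" "$@"
}
```

For example, safe_conda_create /gpfs/projects/<Group_Name>/envs/myproject python=3.10 behaves like the command above, while a path under /gpfs/scratch is rejected before conda runs.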
Monitoring Environment Size
Check the size of your environments to avoid quota issues:
# Check all conda environments
du -sh ~/.conda/envs/*
# Check total conda usage (including package cache)
du -sh ~/.conda
If approaching your quota, consider cleaning the package cache or moving large environments to project space.
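When there are many environments, sorting them by size makes it easier to see which ones to clean up or move. A minimal sketch, assuming environments live under ~/.conda/envs unless another directory is given; env_sizes is an illustrative name.

```shell
# Print the size in KiB of each environment under a directory,
# largest first. Defaults to ~/.conda/envs.
env_sizes() {
    local envs_dir=${1:-"$HOME/.conda/envs"}
    local env
    [ -d "$envs_dir" ] || {
        echo "No such directory: $envs_dir" >&2
        return 1
    }
    for env in "$envs_dir"/*/; do
        [ -d "$env" ] || continue
        du -sk "$env"
    done | sort -rn
}
```

Running env_sizes with no arguments lists your named environments. Pass a path such as /gpfs/projects/<Group_Name>/envs to inspect environments stored elsewhere.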
Configuring Conda
You can customize conda's behavior with a ~/.condarc file. Useful settings include:
# Search conda-forge first, then the defaults channel
channels:
- conda-forge
- defaults
# Show channel URLs when listing packages
show_channel_urls: true
# Always use strict channel priority
channel_priority: strict
This configuration prioritizes conda-forge packages and helps avoid mixing packages from incompatible sources.
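The same settings can be applied from the command line with conda config, which edits ~/.condarc for you. The sketch below wraps the commands in a function with the hypothetical name configure_conda_channels. It relies on conda config --add placing a channel at the top of the list, so the lowest-priority channel is added first.

```shell
# Apply the channel settings with `conda config`, which writes to ~/.condarc.
configure_conda_channels() {
    # --add prepends, so add the lowest-priority channel first.
    conda config --add channels defaults || return 1
    conda config --add channels conda-forge || return 1
    conda config --set show_channel_urls true || return 1
    conda config --set channel_priority strict || return 1
}
```

Afterwards, conda config --show channels should list conda-forge ahead of defaults.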
Performance Tips
Clean Package Cache
Conda keeps downloaded packages cached. Clear old packages to save space:
conda clean --all
This removes unused packages, tarballs, and caches but preserves your environments.
Troubleshooting
Environment activation fails:
Make sure you loaded the Anaconda module first with module load anaconda/3.
Out of disk quota:
Use myquota to see how close you are to your limits. Check environment sizes with du -sh ~/.conda/envs/*. Clean the package cache with conda clean --all or move large environments to project space.
Package conflicts:
Create a fresh environment and install all packages at once rather than incrementally. Specify the conda-forge channel explicitly if needed.
"PackagesNotFoundError":
The package may not exist in your configured channels. Try searching with conda search -c conda-forge package-name or check if it's available via pip instead.
Slow environment creation:
This often happens with complex dependency chains. Try installing fewer packages at once, or create the environment with just Python first, then add packages in groups.
Best Practices
- Always specify Python version when creating environments
- Use one environment per project or research area
- Document your environment with conda env export before major production runs
- Test environments interactively before submitting large jobs
- Never install packages in the base environment
- Clean your package cache periodically with conda clean --all
- Monitor your disk usage regularly, especially if using home directory
- Install all related packages together rather than one at a time
