Python packages can be installed with either pip or conda. Each has strengths and weaknesses. Using them correctly prevents conflicts and ensures your research environment works consistently across job submissions.
When to Use Each Tool
Use Conda for:
- Creating isolated environments
- Packages with complex dependencies (numpy, pandas, scipy, scikit-learn)
- GPU-enabled packages (tensorflow, pytorch)
- Packages that need compiled libraries
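Conda can install non-Python dependencies, such as compiled libraries, alongside Python packages, and it resolves the whole environment together, so it generally avoids conflicts better than pip.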
Use Pip for:
- Pure Python packages
- Packages not available through Conda
- Installing from GitHub or local files
Mixing Pip and Conda:
You can safely use pip inside a Conda environment, but follow this order: install core packages with Conda first, then use pip for anything else. Avoid alternating between the two afterward: Conda does not track packages installed by pip, so a later conda install can overwrite or break them.
Creating and Using Environments
Create a new environment for each project:
conda create -n myproject python=3.10
conda activate myproject
Install packages with Conda first:
conda install numpy pandas matplotlib
Then add pip packages if needed:
pip install package-not-in-conda
Note: For more details on using Conda on SeaWulf, see the Conda documentation.
Using Environments in Job Scripts
Activate your environment in your SBATCH script:
#!/bin/bash
#SBATCH --job-name=python_analysis
#SBATCH -p short-40core
module load anaconda/3
source activate myproject
python analysis.py
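If activation does not take effect, the job may run with a different Python than intended. As a minimal sketch, the function below stops early when the active environment is not the expected one. It relies on the CONDA_DEFAULT_ENV variable, which Conda sets on activation (to the environment name for environments created with -n). The function name require_env is hypothetical and is not part of Conda or SeaWulf.

```shell
# require_env NAME: succeed only if the active Conda environment is NAME.
# Hypothetical helper; CONDA_DEFAULT_ENV is set by Conda when an environment is activated.
require_env() {
    expected="$1"
    active="${CONDA_DEFAULT_ENV:-}"
    if [ "$active" != "$expected" ]; then
        # Report the mismatch on stderr so it shows up in the job's error log.
        echo "expected Conda environment '$expected', found '${active:-none}'" >&2
        return 1
    fi
}
```

In a job script, this could follow the activation line as require_env myproject || exit 1, so that python analysis.py never runs in the wrong environment.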
Documenting Your Environment
Save your environment so you or collaborators can recreate it:
# For the whole Conda environment (pip-installed packages appear under a pip: section)
conda env export > environment.yml
# For pip packages
pip freeze > requirements.txt
Recreate an environment from these files:
# From Conda
conda env create -f environment.yml
# From pip (in an active environment)
pip install -r requirements.txt
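An exported file pins every package in the environment. As an alternative, a short hand-written file can list only the packages you asked for, with the pip packages nested under the Conda ones. The sketch below writes such a file. The file name environment-minimal.yml is a hypothetical choice, and package-not-in-conda is the same placeholder used earlier, not a real package.

```shell
# Write a minimal, hand-maintained environment file.
# The file name environment-minimal.yml is a hypothetical choice;
# the package names are the ones used earlier in this guide.
cat > environment-minimal.yml <<'EOF'
name: myproject
dependencies:
  - python=3.10
  - numpy
  - pandas
  - matplotlib
  - pip
  - pip:
      - package-not-in-conda
EOF
```

Running conda env create -f environment-minimal.yml would then install the Conda packages first and the pip packages after them, in one step.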
Common Issues
"Package not found" with pip:
Make sure your Conda environment is activated. Check with which pip. It should point to your environment, not the system pip.
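The same check can be scripted. The sketch below compares the location of pip against CONDA_PREFIX, which Conda sets to the active environment's directory on activation. The function name check_pip is hypothetical.

```shell
# check_pip: report whether pip resolves inside the active Conda environment.
# Hypothetical helper; assumes activation has set CONDA_PREFIX (standard Conda behavior).
check_pip() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no Conda environment is active"
        return 1
    fi
    # command -v prints the path of the pip that would actually run.
    pip_path="$(command -v pip || true)"
    case "$pip_path" in
        "$CONDA_PREFIX"/*) echo "ok: $pip_path" ;;
        "") echo "pip not found on PATH"; return 1 ;;
        *) echo "warning: pip is $pip_path, outside $CONDA_PREFIX"; return 1 ;;
    esac
}
```

After conda activate myproject, calling check_pip should print a path inside the environment. A warning means pip install would place packages somewhere else.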
Dependency conflicts:
If packages conflict, create a fresh environment and install everything in a single command rather than one package at a time, so the solver can choose versions that are compatible with each other.
Slow conda solve:
Use mamba as a faster alternative: module load mamba, then use mamba install instead of conda install.
Best Practices
- Create a new environment for each project or research area
- Document your environment before running large jobs
- Test your environment in an interactive session before submitting batch jobs