Pip vs Conda Tips

Managing Python Packages: Pip vs Conda

Python packages can be installed with either pip or Conda, and each has strengths and weaknesses. Using them correctly helps prevent conflicts and keeps your research environment consistent across job submissions.

When to Use Each Tool

Use Conda for:

  • Creating isolated environments
  • Packages with complex dependencies (numpy, pandas, scipy, scikit-learn)
  • GPU-enabled packages (tensorflow, pytorch)
  • Packages that need compiled libraries

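Conda installs non-Python dependencies (compiled libraries, for example) together with the Python packages that need them, which pip does not do. For the packages listed above, this usually means fewer conflicts than installing with pip.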

Use Pip for:

  • Pure Python packages
  • Packages not available through Conda
  • Installing from GitHub or local files

Mixing Pip and Conda:

You can safely use pip inside a Conda environment, but follow this order: install core packages with Conda first, then use pip for anything else. Don't switch back and forth. Conda does not track packages installed by pip, so a later conda install can overwrite or break them and cause dependency conflicts.

Creating and Using Environments

Create a new environment for each project:

conda create -n myproject python=3.10
conda activate myproject

Install packages with Conda first:

conda install numpy pandas matplotlib

Then add pip packages if needed:

pip install package-not-in-conda

Note: For more details on using Conda on SeaWulf, see the Conda documentation.

Using Environments in Job Scripts

Activate your environment in your SBATCH script:

#!/bin/bash
#SBATCH --job-name=python_analysis
#SBATCH -p short-40core

module load anaconda/3
source activate myproject

python analysis.py
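
In a batch job it can help to stop early if activation did not take effect, rather than letting the script run against the wrong Python. The sketch below is one possible guard, not part of the SeaWulf setup: require_env is a hypothetical helper name, and it assumes that activation sets the CONDA_DEFAULT_ENV variable to the environment's name, which is standard Conda behavior for named environments.

```shell
#!/bin/bash
# Hypothetical helper: fail if the expected Conda environment is not active.
require_env() {
    expected="$1"
    # Conda sets CONDA_DEFAULT_ENV to the active environment's name on activation.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Error: expected Conda environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}

# In the job script, right after the activation line:
#   require_env myproject || exit 1
#   python analysis.py
```

With this guard in place, a failed activation ends the job with a clear message in the job's error output.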

Documenting Your Environment

Save your environment so you or collaborators can recreate it:

# For Conda packages
conda env export > environment.yml

# For pip packages
pip freeze > requirements.txt

Recreate an environment from these files:

# From Conda
conda env create -f environment.yml

# From pip (in an active environment)
pip install -r requirements.txt
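
A single environment.yml can also list pip-installed packages under a pip: key, so one file describes both sets of packages. The sketch below writes a minimal hand-written file of that shape. The package list simply mirrors the examples used earlier on this page and is an assumption, not an export of a real environment. Note that conda env export normally records pip-installed packages in the same way.

```shell
# Write a minimal environment.yml by hand.
# Caution: this overwrites any environment.yml in the current directory.
cat > environment.yml <<'EOF'
name: myproject
dependencies:
  - python=3.10
  - numpy
  - pandas
  - matplotlib
  - pip
  - pip:
      - package-not-in-conda
EOF

# Recreate the environment from it with:
#   conda env create -f environment.yml
```

Listing pip itself as a Conda dependency before the pip: section keeps the install order described above: Conda packages first, then pip packages.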

Common Issues

"Package not found" with pip:
Make sure your Conda environment is activated. To check, run which pip. The path it prints should be inside your environment, not the system pip.
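
The which pip check can also be scripted. The sketch below assumes that Conda sets CONDA_PREFIX to the active environment's directory on activation, which is standard behavior. pip_in_active_env is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the first pip on PATH lives inside
# the active Conda environment.
pip_in_active_env() {
    [ -n "${CONDA_PREFIX:-}" ] || return 1      # no environment is active
    pip_path="$(command -v pip)" || return 1    # no pip on PATH at all
    case "$pip_path" in
        "$CONDA_PREFIX"/*) return 0 ;;          # pip is under the environment
        *) return 1 ;;                          # pip comes from somewhere else
    esac
}

if pip_in_active_env; then
    echo "pip belongs to the active environment"
else
    echo "pip is outside the active environment (or none is active)"
fi
```

Another common safeguard is to run python -m pip install instead of pip install, so that pip always runs under the Python interpreter that is first on your PATH.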

Dependency conflicts:
If packages conflict, try creating a fresh environment and installing everything in a single command rather than one package at a time, so the solver can consider all requirements together.
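
As a rough illustration of a single-command install, the sketch below wraps conda create in a hypothetical helper, create_fresh_env, that takes the environment name followed by every package. The DRY_RUN switch is an assumption added here so the command can be previewed before it is run. The environment name and package list are examples only.

```shell
# Hypothetical helper: create a fresh environment with all packages in one solve.
# Set DRY_RUN=1 to print the command instead of running it.
create_fresh_env() {
    env_name="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "conda create -y -n $env_name $*"
    else
        conda create -y -n "$env_name" "$@"
    fi
}

# Example preview (prints the command, installs nothing):
#   DRY_RUN=1 create_fresh_env myproject-fresh python=3.10 numpy pandas matplotlib
```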

Slow conda solve:
Use mamba as a faster alternative: run module load mamba, then use mamba install in place of conda install.
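
If a script should work whether or not mamba is loaded, one option is to pick the installer at run time. This is a minimal sketch. The variable name solver is an assumption, and it relies on mamba install accepting the same arguments as conda install.

```shell
# Pick mamba when it is on PATH, otherwise fall back to conda.
if command -v mamba >/dev/null 2>&1; then
    solver=mamba
else
    solver=conda
fi
echo "Using $solver for installs"

# Example (same arguments either way):
#   "$solver" install numpy pandas matplotlib
```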

Best Practices

  • Create a new environment for each project or research area
  • Document your environment before running large jobs
  • Test your environment in an interactive session before submitting batch jobs