Installing software packages locally with Anaconda
Anaconda is a popular open-source platform designed for managing and deploying software and environments, particularly for data science and machine learning applications. It simplifies package management and deployment by providing a convenient way to install, update, and manage software packages and their dependencies. Anaconda is used to create isolated environments, which helps avoid conflicts between different software versions and ensures that projects have the specific libraries they need.
Installing Software on SeaWulf
When you need software that isn't currently available on SeaWulf, you have two main options:
- Request Installation by HPC Support: If the software is widely used or could benefit multiple users, you can submit a ticket to the HPC support staff to request its installation. This approach is ideal for programs that might be of general interest.
- Install Locally in Your Home or Project Directory: Alternatively, you can install the program yourself in your home or project directory. For many software packages, the easiest way to do this is with the Anaconda package manager.
Installing Software Locally with Anaconda
The steps below cover loading the Anaconda module, creating and activating a custom environment, installing packages into it, and deactivating it when you are done.
Loading Anaconda
Before installing software, load the Anaconda module:
module load anaconda/3
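You may want a login shell or job script to confirm that the module actually put conda on your PATH. The bash sketch below does this. `load_anaconda` and `check_conda` are hypothetical helper names, and the sketch assumes the `module` command is available, as it is on SeaWulf. On systems without `module`, the load step is skipped.

```shell
# Bash sketch: load Anaconda (if `module` exists) and confirm conda is reachable.

# load_anaconda: run `module load anaconda/3` when the module system is present,
# then report whether conda is on PATH
load_anaconda() {
    if command -v module >/dev/null 2>&1; then
        module load anaconda/3
    fi
    check_conda
}

# check_conda: succeed (and print its location) if conda is on PATH, fail otherwise
check_conda() {
    if command -v conda >/dev/null 2>&1; then
        echo "conda found: $(command -v conda)"
        return 0
    fi
    echo "conda not found; try 'module load anaconda/3' first" >&2
    return 1
}
```

After sourcing these functions, running `load_anaconda` prints where conda lives, or a hint if it is still missing.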
Creating a Custom Anaconda Environment
To prevent conflicts with existing software, it's best to create a custom environment. You can create an environment either by specifying a name or a directory:
By Name:
conda create --name env-name
This creates the environment in:
/gpfs/home/NETID/.conda/envs/env-name
By Directory:
conda create --prefix /path-to-env/env-name
This creates the environment in:
/path-to-env/env-name
Note: You can't combine the --prefix and --name flags; use one or the other.
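Because --name and --prefix are mutually exclusive, a small wrapper can choose the right flag for you. The bash sketch below uses a hypothetical `create_env` helper. It treats any argument containing a `/` as a directory (--prefix) and anything else as a name (--name), and it passes any extra arguments through as packages to install. The `--yes` flag skips conda's confirmation prompt.

```shell
# Bash sketch: create a conda environment by name OR by directory, never both.

# create_env NAME|PATH [PACKAGE...]
create_env() {
    if [ "$#" -lt 1 ]; then
        echo "usage: create_env NAME|PATH [PACKAGE...]" >&2
        return 2
    fi
    local target=$1
    shift
    case "$target" in
        */*) conda create --prefix "$target" --yes "$@" ;;  # looks like a path
        *)   conda create --name "$target" --yes "$@" ;;    # plain name
    esac
}
```

For example, `create_env env-name` creates a named environment, and `create_env /path-to-env/env-name scipy` creates one in that directory with scipy installed.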
Activating the Environment
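Activate your newly created environment by its path (an environment created with --name can also be activated by name, e.g., conda activate env-name):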
conda activate /path-to-env/env-name
Activating the environment sets the environment variables associated with your custom Anaconda environment, including adding its directory of executable files to your PATH.
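To confirm that activation worked, you can inspect the variables conda sets. `conda activate` sets `CONDA_PREFIX` to the active environment's directory and prepends that environment's `bin` directory to `PATH`. The bash helpers below are a minimal sketch of such a check. The names `active_env_is` and `env_bin_on_path` are hypothetical.

```shell
# Bash sketch: verify what `conda activate` changed in the current shell.

# active_env_is PATH: succeed if the currently active environment is PATH
active_env_is() {
    [ -n "${CONDA_PREFIX:-}" ] && [ "$CONDA_PREFIX" = "$1" ]
}

# env_bin_on_path: succeed if $CONDA_PREFIX/bin is one of the entries in PATH
env_bin_on_path() {
    [ -n "${CONDA_PREFIX:-}" ] || return 1
    case ":$PATH:" in
        *":$CONDA_PREFIX/bin:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `active_env_is /path-to-env/env-name && env_bin_on_path && echo "environment is active"`.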
Installing Software Packages
With your environment active, you can install packages using conda install. For example, to install the scipy package, use:
conda install scipy
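You can also install packages without activating the environment first, because `conda install` accepts `--prefix` to target an environment by its directory. The bash sketch below shows this approach using a hypothetical `install_into_env` helper. The `--yes` flag skips the confirmation prompt.

```shell
# Bash sketch: install packages into an environment identified by its directory.

# install_into_env PREFIX PACKAGE...
install_into_env() {
    if [ "$#" -lt 2 ]; then
        echo "usage: install_into_env PREFIX PACKAGE..." >&2
        return 2
    fi
    local prefix=$1
    shift
    conda install --prefix "$prefix" --yes "$@"
}
```

For example, `install_into_env /path-to-env/env-name scipy` installs scipy into that environment regardless of which environment is currently active.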
After installation, any executable files the package provides are placed in the bin directory within your environment. While the environment is active, this directory is on your PATH, so you can run those executables directly from the command line.
Additionally, any libraries installed with Anaconda will be located in the lib directory of your environment. You can find these directories as follows:
- Executable files: .../env-name/bin/
- Libraries: .../env-name/lib/
These directories ensure that your environment remains self-contained and manageable, avoiding conflicts with other software on the system.
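To confirm that a command comes from your environment's bin directory and not from elsewhere on the system, check where the shell resolves it. The bash sketch below does this with a hypothetical helper named `exe_from_env`. It prints the resolved path and succeeds only if that path lies under `$CONDA_PREFIX/bin`.

```shell
# Bash sketch: check that a command resolves to the active environment's bin/.

# exe_from_env NAME
exe_from_env() {
    local found
    [ -n "${CONDA_PREFIX:-}" ] || { echo "no conda environment is active" >&2; return 1; }
    found=$(command -v "$1") || { echo "$1: not found on PATH" >&2; return 1; }
    echo "$found"
    case "$found" in
        "$CONDA_PREFIX"/bin/*) return 0 ;;
        *) echo "$1 does not come from $CONDA_PREFIX/bin" >&2; return 1 ;;
    esac
}
```

For example, if python is installed in the active environment, `exe_from_env python` should print a path inside that environment's bin directory.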
Deactivating the Environment
Once you’re finished, return to the default environment by typing:
conda deactivate
Managing Storage with Anaconda
Managing storage is crucial when working with Anaconda, especially if you encounter file system quota issues. Here’s how to handle and optimize storage within your Anaconda environment:
Understanding File System Quotas
SeaWulf enforces storage quotas to ensure equitable resource allocation among all users. Exceeding your quota may result in errors when attempting to install new packages or create environments. Therefore, it is crucial to regularly monitor your storage usage and manage your files accordingly.
For detailed information about the file system, refer to the SeaWulf File System Overview.
To keep track of your available disk space, use the following commands:
df -h /gpfs/home/$USER    # Check disk storage usage in your home directory
df -hi /gpfs/home/$USER   # Check inode (file count) usage in your home directory
du -ah /gpfs/home/$USER | sort -rh | head -n 20   # List the 20 largest files and directories in your home directory
Additionally, you can use the following command to check both your disk usage and your file count against your quota:
/usr/lpp/mmfs/bin/mmlsquota --block-size auto -j ${USER}-home -v mmfs1
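Inode quotas count files and directories rather than bytes, so it can help to see which directories hold the most entries. Conda environments and package caches often contain many small files. The bash sketch below uses a hypothetical helper named `count_entries`. It counts the entries under each subdirectory, including hidden ones such as .conda, and sorts the results from most to fewest.

```shell
# Bash sketch: rank subdirectories by how many files and directories they contain.

# count_entries DIR: print "COUNT<TAB>SUBDIR" for each subdirectory of DIR,
# largest count first (the count includes the subdirectory itself)
count_entries() {
    local d
    for d in "$1"/*/ "$1"/.[!.]*/; do
        [ -d "$d" ] || continue   # skip unmatched glob patterns
        printf '%s\t%s\n' "$(find "$d" | wc -l | tr -d ' ')" "$d"
    done | sort -rn
}
```

For example, `count_entries /gpfs/home/$USER` shows which top-level directories use the most inodes, and `count_entries /gpfs/home/$USER/.conda/envs` compares your named environments.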
Cleaning Up Unused Packages and Environments
To free up space, you can remove unused packages and environments. Here’s how:
To remove unused packages:
conda activate env-name # Activate the environment you wish to clean
conda list # Lists all installed packages
conda remove package-name # Replace 'package-name' with the package you want to remove
To remove unused environments:
conda env list # Lists all environments
conda env remove --name env-name # Replace 'env-name' with the environment you want to remove
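Environments created with --prefix are usually identified by their path rather than a name, so you remove them with `conda env remove --prefix /path-to-env/env-name`. The bash sketch below uses a hypothetical helper named `remove_env`. It applies the same name-versus-directory choice as at creation time: an argument containing a `/` is treated as a path.

```shell
# Bash sketch: remove a conda environment created with either --name or --prefix.

# remove_env NAME|PATH
remove_env() {
    if [ "$#" -ne 1 ]; then
        echo "usage: remove_env NAME|PATH" >&2
        return 2
    fi
    case "$1" in
        */*) conda env remove --prefix "$1" ;;  # environment created with --prefix
        *)   conda env remove --name "$1" ;;    # environment created with --name
    esac
}
```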
Managing Anaconda Cache
Anaconda keeps caches of downloaded packages and channel index data to speed up future installations. However, these caches can consume significant storage over time. To clean them up, use:
conda clean --all
This command removes unused cached packages, package tarballs, and index caches, helping to free up space.
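To see how much space a cleanup actually recovers, you can measure your package cache before and after running it. The bash sketch below uses a hypothetical helper named `clean_with_report`. You pass in the cache directory because its location depends on your configuration; `conda info` lists the package cache directories in use. The `--yes` flag skips the confirmation prompt.

```shell
# Bash sketch: run `conda clean --all` and report how much a cache directory shrank.

# clean_with_report CACHE_DIR
clean_with_report() {
    local cache_dir=$1 before after
    [ -d "$cache_dir" ] || { echo "$cache_dir: not a directory" >&2; return 1; }
    before=$(du -sk "$cache_dir" | cut -f1)   # size in KiB before cleaning
    conda clean --all --yes || return 1
    after=$(du -sk "$cache_dir" | cut -f1)    # size in KiB after cleaning
    echo "freed $((before - after)) KiB in $cache_dir"
}
```

For example, if your package cache is in /gpfs/home/$USER/.conda/pkgs (an assumption; check `conda info`), run `clean_with_report /gpfs/home/$USER/.conda/pkgs`.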
Managing pip Cache
If you use pip for package management alongside conda, pip also maintains a cache that can consume disk space. To clear the pip cache, use:
pip cache purge
This command removes all items from pip's cache of downloaded packages, freeing up additional space.
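Before purging, you may want to know how large pip's cache is. `pip cache dir` prints the cache location, and the bash sketch below reports its size with `du`. The helper name `pip_cache_size` is hypothetical. To avoid populating the cache in the first place, `pip install --no-cache-dir` skips it for a single install.

```shell
# Bash sketch: show where pip's cache lives and how much space it uses.

# pip_cache_size
pip_cache_size() {
    local dir
    dir=$(pip cache dir) || return 1
    if [ -d "$dir" ]; then
        du -sh "$dir"
    else
        echo "pip cache directory $dir does not exist yet"
    fi
}
```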
For more detailed guidance on managing storage with Anaconda and pip, refer to the conda documentation and pip documentation.