This KB Article References: High Performance Computing
This Information is Intended for: Instructors, Researchers, Staff, Students
Created: 03/07/2025
Last Updated: 03/07/2025
Introduction to AOCC on SeaWulf2 and SeaWulf3
The AMD Optimizing C/C++ Compiler (AOCC) is a high-performance compiler suite designed specifically for AMD processors on the SeaWulf2 and SeaWulf3 computing environments. AOCC provides advanced optimization techniques tailored for AMD EPYC processors, delivering enhanced performance for scientific and technical applications running on AMD hardware.
This compiler suite is built on the LLVM/Clang infrastructure and incorporates AMD-specific optimizations that leverage the unique architecture of AMD CPUs. For researchers and developers working on AMD-based nodes in the SeaWulf cluster, AOCC offers significant performance advantages over general-purpose compilers, particularly for computationally intensive workloads.
Available AOCC Versions
SeaWulf currently provides the following AOCC versions:
- aocc/3.2.0 - Based on LLVM 13
- aocc/4.0.0 - Based on LLVM 14
- aocc/4.2.0 - Based on LLVM 15 (recommended)
To load a specific AOCC version, use the module command:
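For example, to see which AOCC modules are installed and load the recommended version (the module name matches the list above):
module avail aocc
module load aocc/4.2.0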
Note: After loading an AOCC module, you will need to separately load a compatible MPI implementation to enable parallel programming capabilities.
MPI Integration with AOCC
For parallel programming capabilities, you need to load an MPI implementation alongside AOCC. AOCC works well with both OpenMPI and MVAPICH2:
# With OpenMPI:
module load aocc/4.2.0
module load openmpi/aocc4.2/4.1.5
# With MVAPICH2:
module load aocc/4.2.0
module load mvapich2/aocc4.2/2.3.7
Important: Ensure that the MPI implementation you choose is compatible with your selected AOCC version. The module naming convention typically indicates compatibility (e.g., openmpi/aocc4.2/4.1.5 for AOCC 4.2.0).
Important Notes for Users
- AMD-Specific Nodes: AOCC is most beneficial when used on AMD EPYC processor-based nodes. Ensure your SLURM job is allocated to the appropriate partition.
- Module Loading: When utilizing MPI with AOCC compilers, you must load both the AOCC module and a compatible MPI module in your SLURM job scripts.
- Verifying Paths: After loading modules, use the which clang command to confirm correct path configurations (see the example after this list).
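For example, a quick check along the following lines confirms that the compiler on your PATH comes from the loaded AOCC module:
module load aocc/4.2.0
# The reported path should point into the installation directory for the loaded AOCC version
which clang
# The reported version should match the AOCC release you loaded
clang --version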
Example SLURM script with AOCC and OpenMPI:
#!/bin/bash
#SBATCH --job-name=aocc_test
#SBATCH --output=aocc_test.out
#SBATCH -p short-96core
#SBATCH --ntasks=96
#SBATCH --time=01:00:00
# Load necessary modules
module load aocc/4.2.0
module load openmpi/aocc4.2/4.1.5
# Compile and run your code
mpicc -O3 -march=znver3 mpi_example.c -o mpi_example
mpirun -np $SLURM_NTASKS ./mpi_example
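Assuming the script above is saved as aocc_test.slurm (the filename is illustrative), submit it with:
sbatch aocc_test.slurm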
AOCC Compilers
The AMD Optimizing C/C++ Compiler suite provides a collection of compilers optimized for AMD architecture and parallel programming tasks:
- C Compiler: clang
- C++ Compiler: clang++
- Fortran Compiler: flang
Example usage:
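A minimal sketch, assuming serial source files named example.c, example.cpp, and example.f90:
# C
clang -O3 example.c -o example_c
# C++
clang++ -O3 example.cpp -o example_cpp
# Fortran
flang -O3 example.f90 -o example_f90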
MPI Wrappers for AOCC
MPI implementations provide AOCC compiler wrappers that streamline the compilation process for C, C++, and Fortran codes:
Note: These wrappers automatically include the necessary MPI libraries and compiler flags, simplifying the compilation of MPI-enabled applications.
OpenMPI Wrappers:
- mpicc: for AOCC C compiler (clang)
- mpicxx: for AOCC C++ compiler (clang++)
- mpifort: for AOCC Fortran compiler (flang)
MVAPICH2 Wrappers:
- mpicc: for AOCC C compiler (clang)
- mpicxx: for AOCC C++ compiler (clang++)
- mpif90: for AOCC Fortran compiler (flang)
Example usage:
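A minimal sketch using the OpenMPI wrappers (source file names are illustrative); the MVAPICH2 wrappers are used the same way, substituting mpif90 for Fortran:
# C
mpicc -O3 -march=znver3 mpi_example.c -o mpi_example
# C++
mpicxx -O3 -march=znver3 mpi_example.cpp -o mpi_example_cpp
# Fortran
mpifort -O3 -march=znver3 mpi_example.f90 -o mpi_example_f90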
AOCC Version Comparison
| Feature | AOCC 3.2.0 | AOCC 4.0.0 | AOCC 4.2.0 |
|---|---|---|---|
| Base LLVM Version | LLVM 13 | LLVM 14 | LLVM 15 |
| Primary Zen Architecture Support | Zen2, Zen3 | Zen2, Zen3 | Zen2, Zen3, Zen4 |
| Auto-Vectorization | Basic | Enhanced | Advanced |
| Math Libraries | AOCL 3.1 | AOCL 3.2 | AOCL 4.0 |
| Recommended for | Legacy code | General use | New projects, best performance |
Important Version Information
When AOCC is updated, new releases can change how you should write your code and may alter the output of your programs. For detailed information on revisions and deprecated features, refer to the release notes for the relevant AOCC version.
Additionally, subtle changes that are not always documented publicly can affect software behavior. Of particular concern in high-performance computing applications is floating-point precision, as different AOCC versions may yield varying results. Tracking these details is crucial for maintaining compatibility and optimizing performance across software updates.
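One practical safeguard (a suggestion, not a SeaWulf requirement) is to record the compiler and module environment used for each build, so results can be traced back to a specific AOCC version:
# Record loaded modules and the compiler version alongside your build artifacts
module list 2>&1 | tee build_environment.txt
clang --version >> build_environment.txt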
Optimization Tips for AOCC
To maximize performance with AOCC on AMD nodes in SeaWulf, consider the following optimization strategies:
- Basic Optimization Flags:
# High optimization level
clang -O3 myprogram.c -o myprogram
# Maximum optimization (may affect precision)
clang -Ofast myprogram.c -o myprogram
- AMD EPYC-Specific Optimization:
# Optimize for EPYC Zen3 architecture
clang -march=znver3 myprogram.c -o myprogram
Note: -march=znver3 optimizes for the AMD EPYC Zen3 architecture. Use -march=znver2 for Zen2 processors or -march=znver4 for Zen4 processors. Ensure you're targeting the correct architecture for your allocated nodes.
- Vectorization:
# Enable and report vectorization
clang -O3 -fvectorize -Rpass=loop-vectorize myprogram.c -o myprogram
- Profile-Guided Optimization:
# Step 1: Compile with instrumentation
clang -O3 -fprofile-instr-generate myprogram.c -o myprogram
# Step 2: Run the program to collect profile data
./myprogram
# Step 3: Create profile data
llvm-profdata merge -output=myprogram.profdata default.profraw
# Step 4: Recompile using the profile data
clang -O3 -fprofile-instr-use=myprogram.profdata myprogram.c -o myprogram
- AMD-Specific Libraries:
# Link with AMD libraries for optimized math operations
clang -O3 -march=znver3 myprogram.c -o myprogram -lamdlibm
- Function Inlining:
# Aggressive function inlining
clang -O3 -finline-functions -mllvm -inline-threshold=1000 myprogram.c -o myprogram
Note: Always test your code's correctness after applying aggressive optimizations, as some optimizations might affect numerical precision or algorithm behavior.
AOCC vs GCC Performance Considerations
When choosing between AOCC and GCC for your SeaWulf workloads, consider the following performance aspects:
- AMD-Specific Optimizations: AOCC typically delivers superior performance for computation-intensive workloads on AMD EPYC processors due to architecture-specific optimizations.
- Library Compatibility: Some scientific libraries may be better optimized for GCC. Test both compilers with your specific application to determine the best performance.
- Vectorization Efficiency: AOCC often provides better automatic vectorization for AMD hardware, particularly for floating-point intensive operations.
- Compilation Time: GCC may compile code faster in some cases, which can be beneficial during development iterations.
Recommendation: For production runs on AMD nodes, compile your code with both AOCC and GCC, and benchmark to determine which compiler delivers better performance for your specific application.
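A simple way to do this comparison (module names and flags are illustrative; substitute the GCC module available on your allocated nodes):
# Build with AOCC
module load aocc/4.2.0
clang -O3 -march=znver3 myprogram.c -o myprogram_aocc
# Build with GCC
module load gcc
gcc -O3 -march=znver3 myprogram.c -o myprogram_gcc
# Time both builds on identical input
time ./myprogram_aocc
time ./myprogram_gcc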