This KB Article References: High Performance Computing
This Information is Intended for: Instructors, Researchers, Staff, Students
Created: 03/07/2025
Last Updated: 03/07/2025
Introduction to AOCC on SeaWulf2 and SeaWulf3
The AMD Optimizing C/C++ Compiler (AOCC) is a high-performance compiler suite designed specifically for AMD processors on the SeaWulf2 and SeaWulf3 computing environments. AOCC provides advanced optimization techniques tailored for AMD EPYC processors, delivering enhanced performance for scientific and technical applications running on AMD hardware.
This compiler suite is built on the LLVM/Clang infrastructure and incorporates AMD-specific optimizations that leverage the unique architecture of AMD CPUs. For researchers and developers working on AMD-based nodes in the SeaWulf cluster, AOCC offers significant performance advantages over general-purpose compilers, particularly for computationally intensive workloads.
Available AOCC Versions
SeaWulf currently provides the following AOCC versions:
- aocc/3.2.0 - Based on LLVM 13
- aocc/4.0.0 - Based on LLVM 14
- aocc/4.2.0 - Based on LLVM 15 (recommended)
To load a specific AOCC version, use the module command:
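For example, to see which AOCC modules are installed and load the recommended version (the module name matches the list above):
module avail aocc
module load aocc/4.2.0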
Note: After loading an AOCC module, you will need to separately load a compatible MPI implementation to enable parallel programming capabilities.
MPI Integration with AOCC
For parallel programming capabilities, you need to load an MPI implementation alongside AOCC. AOCC works well with both OpenMPI and MVAPICH2:
# With OpenMPI:
module load aocc/4.2.0
module load openmpi/aocc4.2/4.1.5
# With MVAPICH2:
module load aocc/4.2.0
module load mvapich2/aocc4.2/2.3.7
Important: Ensure that the MPI implementation you choose is compatible with your selected AOCC version. The module naming convention typically indicates compatibility (e.g., openmpi/aocc4.2/4.1.5 for AOCC 4.2.0).
Important Notes for Users
- AMD-Specific Nodes: AOCC is most beneficial when used on AMD EPYC processor-based nodes. Ensure your SLURM job is allocated to the appropriate partition.
- Module Loading: When utilizing MPI with AOCC compilers, you must load both the AOCC module and a compatible MPI module in your SLURM job scripts.
- Verifying Paths: After loading modules, use the which clang command to confirm correct path configurations (see the example after this list).
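For example, a quick check along the following lines confirms that the compiler on your PATH comes from the loaded AOCC module:
module load aocc/4.2.0
# The reported path should point into the installation directory for the loaded AOCC version
which clang
# The reported version should match the AOCC release you loaded
clang --version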
Example SLURM script with AOCC and OpenMPI:
#!/bin/bash
#SBATCH --job-name=aocc_test
#SBATCH --output=aocc_test.out
#SBATCH -p short-96core
#SBATCH --ntasks=96
#SBATCH --time=01:00:00
# Load necessary modules
module load aocc/4.2.0
module load openmpi/aocc4.2/4.1.5
# Compile and run your code
mpicc -O3 -march=znver3 mpi_example.c -o mpi_example
mpirun -np $SLURM_NTASKS ./mpi_example
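Assuming the script above is saved as aocc_test.slurm (the filename is illustrative), submit it with:
sbatch aocc_test.slurm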
AOCC Compilers
The AMD Optimizing C/C++ Compiler suite provides a collection of compilers optimized for AMD architecture and parallel programming tasks:
- C Compiler: clang
- C++ Compiler: clang++
- Fortran Compiler: flang
Example usage:
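A minimal sketch, assuming serial source files named example.c, example.cpp, and example.f90:
# C
clang -O3 example.c -o example_c
# C++
clang++ -O3 example.cpp -o example_cpp
# Fortran
flang -O3 example.f90 -o example_f90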
MPI Wrappers for AOCC
MPI implementations provide AOCC compiler wrappers that streamline the compilation process for C, C++, and Fortran codes:
Note: These wrappers automatically include the necessary MPI libraries and compiler flags, simplifying the compilation of MPI-enabled applications.
OpenMPI Wrappers:
- mpicc: for AOCC C compiler (clang)
- mpicxx: for AOCC C++ compiler (clang++)
- mpifort: for AOCC Fortran compiler (flang)
MVAPICH2 Wrappers:
- mpicc: for AOCC C compiler (clang)
- mpicxx: for AOCC C++ compiler (clang++)
- mpif90: for AOCC Fortran compiler (flang)
Example usage:
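A minimal sketch using the OpenMPI wrappers (source file names are illustrative); the MVAPICH2 wrappers are used the same way, substituting mpif90 for Fortran:
# C
mpicc -O3 -march=znver3 mpi_example.c -o mpi_example
# C++
mpicxx -O3 -march=znver3 mpi_example.cpp -o mpi_example_cpp
# Fortran
mpifort -O3 -march=znver3 mpi_example.f90 -o mpi_example_f90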
AOCC Version Comparison
| Feature | AOCC 3.2.0 | AOCC 4.0.0 | AOCC 4.2.0 |
|---|---|---|---|
| Base LLVM Version | LLVM 13 | LLVM 14 | LLVM 15 |
| Primary Zen Architecture Support | Zen2, Zen3 | Zen2, Zen3 | Zen2, Zen3, Zen4 |
| Auto-Vectorization | Basic | Enhanced | Advanced |
| Math Libraries | AOCL 3.1 | AOCL 3.2 | AOCL 4.0 |
| Recommended for | Legacy code | General use | New projects, best performance |
Important Version Information
When AOCC is updated, new releases can change how you should write your code and may alter the output of your programs. For detailed information on revisions and deprecated features, refer to the release notes for the relevant AOCC version.
Additionally, subtle changes that are not always documented publicly can affect software behavior. Of particular concern in high-performance computing applications is floating-point precision, as different AOCC versions may yield varying results. Tracking these details is crucial for maintaining compatibility and optimizing performance across software updates.
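One practical safeguard (a suggestion, not a SeaWulf requirement) is to record the compiler and module environment used for each build, so results can be traced back to a specific AOCC version:
# Record loaded modules and the compiler version alongside your build artifacts
module list 2>&1 | tee build_environment.txt
clang --version >> build_environment.txt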
Optimization Tips for AOCC
To maximize performance with AOCC on AMD nodes in SeaWulf, consider the following optimization strategies:
- Basic Optimization Flags:
# High optimization level
clang -O3 myprogram.c -o myprogram
# Maximum optimization (may affect precision)
clang -Ofast myprogram.c -o myprogram
- AMD EPYC-Specific Optimization:
# Optimize for EPYC Zen3 architecture
clang -march=znver3 myprogram.c -o myprogram
Note: -march=znver3 optimizes for the AMD EPYC Zen3 architecture. Use -march=znver2 for Zen2 processors or -march=znver4 for Zen4 processors. Ensure you're targeting the correct architecture for your allocated nodes.
- Vectorization:
# Enable and report vectorization
clang -O3 -fvectorize -Rpass=loop-vectorize myprogram.c -o myprogram
- Profile-Guided Optimization:
# Step 1: Compile with instrumentation
clang -O3 -fprofile-instr-generate myprogram.c -o myprogram
# Step 2: Run the program to collect profile data
./myprogram
# Step 3: Create profile data
llvm-profdata merge -output=myprogram.profdata default.profraw
# Step 4: Recompile using the profile data
clang -O3 -fprofile-instr-use=myprogram.profdata myprogram.c -o myprogram
- AMD-Specific Libraries:
# Link with AMD libraries for optimized math operations
clang -O3 -march=znver3 myprogram.c -o myprogram -lamdlibm
- Function Inlining:
# Aggressive function inlining
clang -O3 -finline-functions -mllvm -inline-threshold=1000 myprogram.c -o myprogram
Note: Always test your code's correctness after applying aggressive optimizations, as some optimizations might affect numerical precision or algorithm behavior.
AOCC vs GCC Performance Considerations
When choosing between AOCC and GCC for your SeaWulf workloads, consider the following performance aspects:
- AMD-Specific Optimizations: AOCC typically delivers superior performance for computation-intensive workloads on AMD EPYC processors due to architecture-specific optimizations.
- Library Compatibility: Some scientific libraries may be better optimized for GCC. Test both compilers with your specific application to determine the best performance.
- Vectorization Efficiency: AOCC often provides better automatic vectorization for AMD hardware, particularly for floating-point intensive operations.
- Compilation Time: GCC may compile code faster in some cases, which can be beneficial during development iterations.
Recommendation: For production runs on AMD nodes, compile your code with both AOCC and GCC, and benchmark to determine which compiler delivers better performance for your specific application.
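A simple way to do this comparison (module names and flags are illustrative; substitute the GCC module available on your allocated nodes):
# Build with AOCC
module load aocc/4.2.0
clang -O3 -march=znver3 myprogram.c -o myprogram_aocc
# Build with GCC
module load gcc
gcc -O3 -march=znver3 myprogram.c -o myprogram_gcc
# Time both builds on identical input
time ./myprogram_aocc
time ./myprogram_gcc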