AOCC (AMD Optimizing C Compiler)

AOCC on SeaWulf

This KB Article References: High Performance Computing
This Information is Intended for: Instructors, Researchers, Staff, Students
Created: 03/07/2025
Last Updated: 03/07/2025

Introduction to AOCC on SeaWulf2 and SeaWulf3

The AMD Optimizing C/C++ Compiler (AOCC) is a high-performance compiler suite designed specifically for AMD processors and available in the SeaWulf2 and SeaWulf3 computing environments. AOCC applies advanced optimization techniques tailored to AMD EPYC processors, delivering enhanced performance for scientific and technical applications on AMD hardware.

This compiler suite is built on the LLVM/Clang infrastructure and incorporates AMD-specific optimizations that leverage the unique architecture of AMD CPUs. For researchers and developers working on AMD-based nodes in the SeaWulf cluster, AOCC offers significant performance advantages over general-purpose compilers, particularly for computationally intensive workloads.

Available AOCC Versions

SeaWulf currently provides the following AOCC versions:

  • aocc/3.2.0 - Based on LLVM 13
  • aocc/4.0.0 - Based on LLVM 14
  • aocc/4.2.0 - Based on LLVM 15 (recommended)

To load a specific AOCC version, use the module command:

module load aocc/4.2.0

Note: After loading an AOCC module, you will need to separately load a compatible MPI implementation to enable parallel programming capabilities.

MPI Integration with AOCC

For parallel programming capabilities, you need to load an MPI implementation alongside AOCC. AOCC works well with both OpenMPI and MVAPICH2:

# Load AOCC and OpenMPI modules
module load aocc/4.2.0
module load openmpi/aocc4.2/4.1.5

# Alternative: Load AOCC and MVAPICH2 modules
module load aocc/4.2.0
module load mvapich2/aocc4.2/2.3.7

Important: Ensure that the MPI implementation you choose is compatible with your selected AOCC version. The module naming convention typically indicates compatibility (e.g., openmpi/aocc4.2/4.1.5 for AOCC 4.2.0).

Important Notes for Users

  • AMD-Specific Nodes: AOCC is most beneficial when used on AMD EPYC processor-based nodes. Ensure your SLURM job is allocated to the appropriate partition.
  • Module Loading: When utilizing MPI with AOCC compilers, you must load both the AOCC module and a compatible MPI module in your SLURM job scripts.
  • Verifying Paths: After loading modules, run which clang to confirm that the AOCC compiler is first in your path.

Example SLURM script with AOCC and OpenMPI:

#!/bin/bash
#SBATCH --job-name=aocc_test
#SBATCH --output=aocc_test.out
#SBATCH -p short-96core
#SBATCH --ntasks=96
#SBATCH --time=01:00:00

# Load necessary modules
module load aocc/4.2.0
module load openmpi/aocc4.2/4.1.5

# Compile and run your code
mpicc -O3 -march=znver3 mpi_example.c -o mpi_example
mpirun -np $SLURM_NTASKS ./mpi_example

AOCC Compilers

The AMD Optimizing C/C++ Compiler suite provides a collection of compilers optimized for AMD architecture and parallel programming tasks:

  • C Compiler: clang
  • C++ Compiler: clang++
  • Fortran Compiler: flang

Example usage:

clang myprogram.c -o myprogram # Compile a C program with the AOCC C compiler
clang++ myprogram.cpp -o myprogram # Compile a C++ program with the AOCC C++ compiler
flang myprogram.f90 -o myprogram # Compile a Fortran program with the AOCC Fortran compiler

MPI Wrappers for AOCC

MPI implementations provide AOCC compiler wrappers that streamline the compilation process for C, C++, and Fortran codes:

Note: These wrappers automatically include the necessary MPI libraries and compiler flags, simplifying the compilation of MPI-enabled applications.

OpenMPI Wrappers:

  • mpicc: for AOCC C compiler (clang)
  • mpicxx: for AOCC C++ compiler (clang++)
  • mpifort: for AOCC Fortran compiler (flang)

MVAPICH2 Wrappers:

  • mpicc: for AOCC C compiler (clang)
  • mpicxx: for AOCC C++ compiler (clang++)
  • mpif90: for AOCC Fortran compiler (flang)

Example usage:

mpicc mpi_program.c -o mpi_program # Compile an MPI C program
mpicxx mpi_program.cpp -o mpi_program # Compile an MPI C++ program
mpifort mpi_program.f90 -o mpi_program # Compile an MPI Fortran program with OpenMPI
mpif90 mpi_program.f90 -o mpi_program # Compile an MPI Fortran program with MVAPICH2

AOCC Version Comparison

Feature                            AOCC 3.2.0    AOCC 4.0.0    AOCC 4.2.0
Base LLVM Version                  LLVM 13       LLVM 14       LLVM 15
Primary Zen Architecture Support   Zen2, Zen3    Zen2, Zen3    Zen2, Zen3, Zen4
Auto-Vectorization                 Basic         Enhanced      Advanced
Math Libraries                     AOCL 3.1      AOCL 3.2      AOCL 4.0
Recommended for                    Legacy code   General use   New projects, best performance

Important Version Information

When this software is updated, revisions can significantly change how you should write your code and may alter the output of your programs. For detailed information on update revisions and deprecated features, refer to AMD's official AOCC release notes.

Additionally, subtle changes that are not always documented publicly can affect software behavior. Of particular concern in high-performance computing is floating-point precision: different AOCC versions may yield slightly different numerical results for the same code. Reviewing the release notes before upgrading is crucial for maintaining compatibility and performance across software updates.

Optimization Tips for AOCC

To maximize performance with AOCC on AMD nodes in SeaWulf, consider the following optimization strategies:

  • Basic Optimization Flags:
    # High optimization level
    clang -O3 myprogram.c -o myprogram
    # Maximum optimization (may affect precision)
    clang -Ofast myprogram.c -o myprogram
  • AMD EPYC-Specific Optimization:
    # Optimize for EPYC Zen3 architecture
    clang -march=znver3 myprogram.c -o myprogram

    Note: -march=znver3 optimizes for AMD EPYC Zen3 architecture. Use -march=znver2 for Zen2 processors or -march=znver4 for Zen4 processors. Ensure you're targeting the correct architecture for your allocated nodes.

  • Vectorization:
    # Enable and report vectorization
    clang -O3 -fvectorize -Rpass=loop-vectorize myprogram.c -o myprogram
  • Profile-Guided Optimization:
    # Step 1: Compile with instrumentation
    clang -O3 -fprofile-instr-generate myprogram.c -o myprogram

    # Step 2: Run the program to collect profile data
    ./myprogram

    # Step 3: Create profile data
    llvm-profdata merge -output=myprogram.profdata default.profraw

    # Step 4: Recompile using the profile data
    clang -O3 -fprofile-instr-use=myprogram.profdata myprogram.c -o myprogram
  • AMD-Specific Libraries:
    # Link with AMD libraries for optimized math operations
    clang -O3 -march=znver3 myprogram.c -o myprogram -lamdlibm
  • Function Inlining:
    # Aggressive function inlining
    clang -O3 -finline-functions -mllvm -inline-threshold=1000 myprogram.c -o myprogram

Note: Always test your code's correctness after applying aggressive optimizations, as some optimizations might affect numerical precision or algorithm behavior.

AOCC vs GCC Performance Considerations

When choosing between AOCC and GCC for your SeaWulf workloads, consider the following performance aspects:

  • AMD-Specific Optimizations: AOCC typically delivers superior performance for computation-intensive workloads on AMD EPYC processors due to architecture-specific optimizations.
  • Library Compatibility: Some scientific libraries may be better optimized for GCC. Test both compilers with your specific application to determine the best performance.
  • Vectorization Efficiency: AOCC often provides better automatic vectorization for AMD hardware, particularly for floating-point intensive operations.
  • Compilation Time: GCC may compile code faster in some cases, which can be beneficial during development iterations.

Recommendation: For production runs on AMD nodes, compile your code with both AOCC and GCC, and benchmark to determine which compiler delivers better performance for your specific application.