Why do I get a Segmentation fault using Intel MPI?

Intel MPI Segmentation faults

If you are using Intel MPI version 19.0.3 or 19.0.4, you may receive an error like the following when trying to run anything with mpirun:

/gpfs/software/intel/parallel-studio-xe/2019_4/compilers_and_libraries/linux/mpi/intel64/bin/mpirun: line 103: 115153 Segmentation fault      (core dumped) mpiexec.hydra "$@" 0<&0

This error occurs when you use these versions of Intel MPI on the login nodes or the large memory node; it should go away once you run your MPI program on any compute node. If you need to use Intel MPI on the large memory node, use version 19.0.0 or lower. Alternatively, you can run your program with the mvapich2 MPI implementation (module load mvapich2), as in the sketch below.
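As a concrete example, the commands below show how you might switch MPI implementations before launching a job on the large memory node. The mvapich2 module name comes from this article; the Intel MPI module name shown is only a placeholder, so check module avail for the exact 19.0.0 (or older) build installed on SeaWulf.

# See which MPI modules are installed (names and versions below are placeholders)
module avail intel mvapich2

# Option 1: fall back to an Intel MPI build at version 19.0.0 or lower
module load intel-mpi/19.0.0    # placeholder name; use the exact name shown by module avail

# Option 2: switch to mvapich2 instead
module load mvapich2

# Launch as usual; my_mpi_program stands in for your own executable
mpirun -np 4 ./my_mpi_program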

But why?

To prevent important system processes on the login nodes and large memory node from crashing, we use cgroups to restrict the set of CPUs that SeaWulf users can access, reserving some exclusively for system processes. We allow full use of all CPUs on the compute nodes, since a crash there affects only a single job from a single user. In the two newest updates of Intel MPI, Intel changed the way the library pins threads to specific CPUs, and the new code does not respect the CPU restrictions imposed by cgroups. When MPI tries to schedule threads on the restricted CPUs, an error occurs and the program crashes with a Segmentation fault. This will hopefully be fixed in a future update, but for now please use the workarounds described above.
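If you want to see the cgroup restriction for yourself, you can compare the number of CPUs physically present on a login node with the set of CPUs your own processes are actually allowed to use. These are generic Linux commands, not SeaWulf-specific output:

# Total CPUs physically present on the node
grep -c ^processor /proc/cpuinfo

# CPUs the current shell is actually allowed to run on (limited by cgroups/affinity)
grep Cpus_allowed_list /proc/self/status

# nproc likewise reports only the CPUs available to the calling process
nproc

On a login node the allowed list will be a subset of the physical CPUs; Intel MPI 19.0.3 and 19.0.4 trip over exactly this mismatch when pinning threads.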

