MPI
An MPI job is a set of processes that take advantage of multiple compute nodes by communicating with each other. OpenMPI and Intel MPI (IMPI) are implementations of the MPI standard.
This section shows how to submit one of the MPI programs compiled in the section Compiling MPI Programs.
Use module load to set up the paths to the MPI library your program was compiled with. Use module avail to see all MPI packages installed on Gilbreth.
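For example, the commands below list the installed MPI modules and load one of them. The module names openmpi and impi are assumptions for illustration; use the names and versions that module avail actually reports on Gilbreth.

```bash
# List installed modules matching these (assumed) names
module avail openmpi
module avail impi

# Load the MPI library the program was compiled with (assumed module name)
module load openmpi

# Confirm which modules are now loaded
module list
```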
Example MPI Job Submission File
Create a job submission file named mpi_hello.sub:
SLURM can run an MPI program with the srun command. The number of processes is requested with the -n option.
If you omit the -n option, srun starts one process per task allocated to the job, which is normally the total number of processor cores you requested from SLURM.
If the code is built with OpenMPI, it can be run with a simple srun -n command.
If it is built with Intel MPI (IMPI), you also need to add the --mpi=pmi2 option:
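A minimal sketch of mpi_hello.sub, assuming the program from Compiling MPI Programs was built as ./mpi_hello in the submission directory and that the job runs 2 nodes with 16 ranks each in the standby queue. The module names, the one-minute time limit, and the program name are assumptions; add any other options your queue requires.

```bash
#!/bin/bash
# FILENAME: mpi_hello.sub
# 2 nodes x 16 MPI ranks per node = 32 ranks in total.
# The time limit is an assumed value; -A selects the queue (see slist).
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:01:00
#SBATCH -A standby

# Load the same MPI library the program was compiled with
# ("openmpi" is an assumed module name; check module avail).
module load openmpi

# OpenMPI build: start 32 ranks. Without -n, srun would start one
# rank per allocated task, which is also 32 for this request.
srun -n 32 ./mpi_hello

# Intel MPI build: load the IMPI module instead (assumed name "impi")
# and add --mpi=pmi2:
#   module load impi
#   srun --mpi=pmi2 -n 32 ./mpi_hello
```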
Submit the MPI Job
Submit the MPI job:
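Assuming mpi_hello.sub is in the current directory, sbatch submits it and prints the ID of the new job; squeue then shows whether the job is pending or running.

```bash
# Submit the job script; sbatch prints the new job ID
sbatch mpi_hello.sub

# Show your jobs in the queue and their states
squeue -u "$USER"
```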
View Results
View results in the output file:
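Unless the job script sets --output, SLURM writes standard output and standard error together to slurm-<jobid>.out in the directory where sbatch was run. In the sketch below, 12345 is a hypothetical job ID; substitute the ID that sbatch printed.

```bash
# 12345 is a hypothetical job ID; use the one sbatch reported
cat slurm-12345.out
```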
Example output:
If the job failed to run, view error messages in the output file.
Reducing MPI Ranks Per Node for Memory-Heavy Jobs
If an MPI job needs so much memory that 16 MPI ranks on one compute node exhaust that node's memory, spread the ranks over more compute nodes while keeping the total number of MPI ranks unchanged.
For example, double the number of compute nodes in the resource request and halve the number of MPI ranks per compute node.
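The arithmetic can be sketched as follows, assuming (for illustration only) that the original job ran 32 ranks as 2 nodes with 16 ranks per node. The script prints the --nodes and --ntasks-per-node values before and after the change and checks that the total still divides evenly across the nodes.

```bash
#!/bin/bash
# Illustrative numbers (assumed): 32 ranks as 2 nodes x 16 ranks per node.
total_ranks=32
nodes_before=2
ranks_per_node_before=$(( total_ranks / nodes_before ))

# Double the node count; the total number of ranks stays the same.
nodes_after=$(( nodes_before * 2 ))

# The total must divide evenly across the new node count.
if (( total_ranks % nodes_after != 0 )); then
  echo "error: ${total_ranks} ranks do not divide evenly over ${nodes_after} nodes" >&2
  exit 1
fi
ranks_per_node_after=$(( total_ranks / nodes_after ))

echo "before: --nodes=${nodes_before} --ntasks-per-node=${ranks_per_node_before}"
echo "after:  --nodes=${nodes_after} --ntasks-per-node=${ranks_per_node_after}"
```

Each node now hosts half as many ranks, so each rank can use roughly twice as much of a node's memory.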
Create or modify mpi_hello.sub:
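A sketch of the modified file under the same assumptions as the 2-node example (program ./mpi_hello, standby queue, assumed module name and time limit): the node count doubles from 2 to 4 and the ranks per node drop from 16 to 8, so srun still starts 32 ranks.

```bash
#!/bin/bash
# FILENAME: mpi_hello.sub
# 4 nodes x 8 MPI ranks per node = 32 ranks, the same total as before,
# but each rank has about twice as much of a node's memory available.
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --time=00:01:00
#SBATCH -A standby

# Assumed module name; load the MPI the program was built with
module load openmpi

# OpenMPI build (for Intel MPI, use: srun --mpi=pmi2 -n 32 ./mpi_hello)
srun -n 32 ./mpi_hello
```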
Submit the job:
View results in the output file:
Example output:
Notes
- Use slist to determine which queues, specified by the --account or -A option, are available to you.
- The queue available to everyone on Gilbreth is standby.
- Invoking an MPI program on Gilbreth with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI.
- Unless using only one MPI process is what you want, which is rarely the case, use srun or mpiexec to invoke an MPI program.
- In general, the exact order in which MPI ranks write similar output to an output file is random.
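To illustrate the invocation notes, the lines below contrast a direct call with launcher-based calls inside a job script. The program name ./mpi_hello and the rank count of 32 are assumptions carried over from the examples in this section.

```bash
# Usually wrong: runs a single process, i.e. only one MPI rank
# ./mpi_hello

# Let SLURM start the ranks (32 here)
srun -n 32 ./mpi_hello

# Or use the MPI library's own launcher
mpiexec -n 32 ./mpi_hello
```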