MPI Tips

Open MPI can be a little tricky to use. This article offers some tips for using Open MPI on the GHC (Gates) machines.

Getting set up

To run Open MPI programs on Gates machines, you need to set two environment variables:

  • LD_LIBRARY_PATH must include /usr/lib64/openmpi/lib
  • PATH must include /usr/lib64/openmpi/bin, and /usr/lib64/openmpi/bin must appear before /usr/local/bin in your PATH.
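For example, you might append something like the following to your shell startup file (e.g. ~/.bashrc — the exact file depends on your shell). Prepending to PATH ensures the Open MPI bin directory is found before /usr/local/bin:

```shell
# Prepend so /usr/lib64/openmpi/bin wins over /usr/local/bin.
export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
```

You can check that it worked with `which mpirun`, which should report a path under /usr/lib64/openmpi/bin.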

Compiling

We're going to play around with this simple Open MPI program:

#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
  // Initialize the MPI environment
  MPI_Init(NULL, NULL);

  // Get the number of processes.
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process.
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Get the name of the processor.
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // Print off a hello world message.
  printf("Hello world from processor %s, rank %d"
         " out of %d processors\n",
         processor_name, world_rank, world_size);

  // Finalize the MPI environment.
  MPI_Finalize();
  return 0;
}

We compile this program with¹:

areece@ghc68$ mpic++ mpi.c -o hello_mpi

And now we're ready to run!

Running on one machine

To run on one machine, we use mpirun. We can control the number of processes launched using the parameter -np as follows:

areece@ghc68$ mpirun -np 6 hello_mpi
Hello world from processor ghc68.ghc.andrew.cmu.edu, rank 1 out of 6 processors
Hello world from processor ghc68.ghc.andrew.cmu.edu, rank 2 out of 6 processors
Hello world from processor ghc68.ghc.andrew.cmu.edu, rank 4 out of 6 processors
Hello world from processor ghc68.ghc.andrew.cmu.edu, rank 5 out of 6 processors
Hello world from processor ghc68.ghc.andrew.cmu.edu, rank 0 out of 6 processors
Hello world from processor ghc68.ghc.andrew.cmu.edu, rank 3 out of 6 processors

You'll note that the output is not in a deterministic order. This is expected: each process calls printf independently, and nothing synchronizes them to enforce a consistent ordering.

Running on multiple machines

If we want to test how our program scales, we can use mpirun to run it on multiple machines:

areece@ghc68$ mpirun -np 6 --host ghc18,ghc19,ghc20 hello_mpi 
Hello world from processor ghc19.ghc.andrew.cmu.edu, rank 1 out of 6 processors
Hello world from processor ghc19.ghc.andrew.cmu.edu, rank 4 out of 6 processors
Hello world from processor ghc18.ghc.andrew.cmu.edu, rank 0 out of 6 processors
Hello world from processor ghc18.ghc.andrew.cmu.edu, rank 3 out of 6 processors
Hello world from processor ghc20.ghc.andrew.cmu.edu, rank 2 out of 6 processors
Hello world from processor ghc20.ghc.andrew.cmu.edu, rank 5 out of 6 processors

You should note a few things here.

  • The program runs on each of the machines specified (and not necessarily on the machine you launched it from).
  • mpirun launches 6 processes total across all machines, not 6 processes per machine.
  • mpirun assigns processes to machines in round-robin fashion. It's possible to control this assignment if you're interested.

If you find it tedious to specify the hosts on the command line every time, you can put them in a file instead. (The -bind-to-core flag in the example below pins each MPI process to a dedicated core.)

areece@ghc68$ cat hostfile 
ghc18
ghc19
ghc20
areece@ghc68$ mpirun -np 6 --hostfile hostfile -bind-to-core hello_mpi
Hello world from processor ghc19.ghc.andrew.cmu.edu, rank 1 out of 6 processors
Hello world from processor ghc18.ghc.andrew.cmu.edu, rank 0 out of 6 processors
Hello world from processor ghc19.ghc.andrew.cmu.edu, rank 4 out of 6 processors
Hello world from processor ghc18.ghc.andrew.cmu.edu, rank 3 out of 6 processors
Hello world from processor ghc20.ghc.andrew.cmu.edu, rank 2 out of 6 processors
Hello world from processor ghc20.ghc.andrew.cmu.edu, rank 5 out of 6 processors

  1. Yes, I know mpicc might be more idiomatic for C programs, but I'm ignoring that for now.