13.1 MPI
The major difficulty in parallel programming is subdividing
problems so that different parts can be executed simultaneously on
different machines. MPI is a library of routines that provides the
functionality needed to allow those parts to communicate. But it will
be up to you to determine how a problem can be broken into pieces so
that it can run on different machines.
The simplest approach is to have the number of processes match the
number of machines or processors that are available. However, this is
not required. If you have a small problem that can be easily run on a
subset of your cluster, or if your problem logically decomposes in
such a way that you don't need the entire cluster,
then you can (and should) execute the program on fewer machines. It
is also possible to have multiple processes running on the same
machine. This is particularly common when developing code. In this
case, the operating system will switch between processes as needed.
You won't benefit from the parallelization of the
code, but the job will still complete correctly.
13.1.1 Core MPI
With most parallelizable problems,
programs running on multiple computers do the bulk of the work and
then communicate their individual results to a single computer that
collects these intermediate results, combines them, and reports the
final results. It is certainly possible to write a different program
for each machine in the cluster, but from a software management
perspective, it is much easier if we can write just one program. As
the program executes on each machine, it will first determine which
computer it is running on and, based on that information, tackle the
appropriate part of the original problem. When the computation is
complete, one machine will act as a receiver and all the other
machines will send their results to it.
For this approach to work, each executing program or process must be
able to differentiate itself from other processes.
Let's look at a very basic example that demonstrates
how processes, i.e., instances of the program executing on different
computers, are able to differentiate themselves. While this example
doesn't accomplish anything particularly useful, it
shows how the pieces fit together. It introduces four key functions
and one other useful function. And with a few minor changes, this
program will serve as a template for future programs.
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int processId;                  /* rank of process     */
    int noProcesses;                /* number of processes */
    int nameSize;                   /* length of name      */
    char computerName[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    MPI_Get_processor_name(computerName, &nameSize);

    fprintf(stderr, "Hello from process %d on %s\n", processId, computerName);

    MPI_Finalize();
    return 0;
}
This example introduces five MPI functions. They are declared in the
MPI library's header file, mpi.h, and their
implementations are supplied when the MPI library is linked to the
program. While this example uses C, similar bindings are available
for C++ and FORTRAN.
Four of these functions, MPI_Init,
MPI_Comm_size, MPI_Comm_rank,
and MPI_Finalize, are seen in virtually every MPI
program. We will look at each in turn. (Notice that all MPI
identifiers begin with MPI_.)
13.1.1.1 MPI_Init
MPI_Init is
used to initialize an MPI session. All MPI programs must have a call
to MPI_Init. MPI_Init is called
once, typically at the start of a program. You can have lots of other
code before this call, or you can even call
MPI_Init from a subroutine, but you should call it
before any other MPI functions are called. (There is an exception:
the function MPI_Initialized can be called before
MPI_Init. MPI_Initialized is
used to see if MPI_Init has been previously
called.)
In C, MPI_Init can be called with the
addresses for argc and
argv as shown in the example. This allows the
program to take advantage of command-line arguments. Alternatively,
these addresses can be replaced with a NULL.
13.1.1.2 MPI_Finalize
MPI_Finalize is
called to shut down MPI. MPI_Finalize should be
the last MPI call made in a program. It is used to free memory, etc.
It is the user's responsibility to ensure that all
pending communications are complete before a process calls
MPI_Finalize. You must write your code so that
every process calls MPI_Finalize. Notice that
there are no arguments.
13.1.1.3 MPI_Comm_size
This
routine is used to determine the total number of processes running in
a communicator (the communications group for the
processes being used). It takes the communicator as the first
argument and the address of an integer variable used to return the
number of processes. For example, if you are executing a program
using five processes and the default communicator, the value returned
by MPI_Comm_size will be five, the total number of
processes being used. Note that this is the number of processes, not
necessarily the number of machines being used.
In the example, both MPI_Comm_size and
MPI_Comm_rank used the default communicator,
MPI_COMM_WORLD. This communicator includes all the
processes available at initialization and is created automatically
for you. Communicators are used to distinguish and group messages. As
such, communicators provide a powerful encapsulation mechanism. While
it is possible to create and manipulate your own communicators, the
default communicator will probably satisfy most of your initial
needs.
13.1.1.4 MPI_Comm_rank
MPI_Comm_rank
is used to determine the rank of the current process within the
communicator. MPI_Comm_rank takes a communicator
as its first argument and, as its second argument, the address of an
integer variable used to return the rank.
Basically, each process is assigned a different process number or
rank within a communicator. Ranks range from 0 to one less than the
size returned by MPI_Comm_size. For example, if
you are running a set of five processes, the individual processes
will be numbered 0, 1, 2, 3, and 4. By examining its rank, a process
can distinguish itself from other processes.
The values returned by MPI_Comm_size and
MPI_Comm_rank are often used to divvy up a problem
among processes. For example, suppose that for some problem you want
to divide the work among five processes. This is a decision you make
when you run your program; your choice is not coded into the program
since it may not be known when the program is written. Once the
program is running, it can call MPI_Comm_size to
determine the number of processes attacking the problem. In this
example, it would return five. Each of the five processes now knows
that it needs to solve one fifth of the original problem (assuming
you've written the code this way).
Next, each individual process can examine its rank to determine its
role in the calculation. Continuing with the current example, each
process needs to decide which fifth of the original problem to work
on. This is where MPI_Comm_rank comes in. Since
each process has a different rank, it can use its rank to select its
role. For example, the process with rank 0 might work on the first
part of the problem; the process with rank 1 will work on the second
part of the problem, etc.
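As a concrete sketch of this idea, the following plain C function
(no MPI calls; block_range is a hypothetical helper
name, not part of MPI) shows one common way a process might combine
its rank and the process count to select its share of
n work items, spreading any remainder over the
lowest-ranked processes:

```c
/* Hypothetical helper: compute the half-open range [*start, *end) of
   n items that the process with the given rank should handle. Each
   process gets n / noProcesses items; the first n % noProcesses
   processes each take one extra item. */
void block_range(int rank, int noProcesses, int n, int *start, int *end)
{
    int base  = n / noProcesses;    /* items every process gets      */
    int extra = n % noProcesses;    /* leftover items to distribute  */

    *start = rank * base + (rank < extra ? rank : extra);
    *end   = *start + base + (rank < extra ? 1 : 0);
}
```

With n set to 100 and five processes, the process with rank 0 would
work on items 0 through 19, rank 1 on items 20 through 39, and so on.
After calling MPI_Comm_size and
MPI_Comm_rank, each process could pass its own rank
and the process count to a helper like this one.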
Of course, you can divide up the problem differently if you like. For
example, the process with rank 0 might collect all the results from
the other processes for the final report rather than participate in
the actual calculation. Or each process could use its rank as an
index to an array to discover what parameters to use in a
calculation. It is really up to you as a programmer to determine how
you want to use this information.
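The rank-as-index idea can be sketched in a few lines of plain C. The
table contents and the name step_for_rank here are
purely illustrative, not anything defined by MPI:

```c
/* Hypothetical parameter table: entry i holds the parameter that the
   process with rank i should use in its calculation. */
static const double stepSizes[] = { 0.1, 0.01, 0.001, 0.0001, 0.00001 };

/* Each process passes its own rank (from MPI_Comm_rank) to look up
   the parameter assigned to it. */
double step_for_rank(int processId)
{
    return stepSizes[processId];
}
```

Run with five processes, each process would call this with its own
rank and end up computing with a different step size, even though all
five are running the identical program.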
13.1.1.5 MPI_Get_processor_name
MPI_Get_processor_name
is used to retrieve the host name of the node on which the individual
process is running. In the sample program, we used it to display host
names. The first argument is an array to store the name and the
second is used to return the actual length of the name.
MPI_Get_processor_name is a nice function to have
around, particularly when you want to debug code, but otherwise it
isn't used all that much. The first four MPI
functions, however, are core functions and will be used in virtually
every MPI program you'll write. If you drop the
relevant declarations, the call to
MPI_Get_processor_name, and the
fprintf, you'll have a template
that you can use when writing MPI programs.
Although we haven't used it, each of the
C versions of these five
functions returns an integer error code. With a few exceptions, the
actual code is left up to the implementers. Error codes can be
translated into meaningful messages using the
MPI_Error_string function. In order to keep the
code as simple as possible, this book has adopted the (questionable)
convention of ignoring the returned error codes.
Here is an example of compiling and running the code:
[sloanjd@amy sloanjd]$ mpicc hello.c -o hello
[sloanjd@amy sloanjd]$ mpirun -np 5 hello
Hello from process 0 on amy
Hello from process 2 on oscarnode2.oscardomain
Hello from process 1 on oscarnode1.oscardomain
Hello from process 4 on oscarnode4.oscardomain
Hello from process 3 on oscarnode3.oscardomain
There are a couple of things to observe with this example. First,
notice that there is no apparent order in the output. This will
depend on the speed of the individual machines, the loads on the
machines, and the speeds of the communications links. Unless you take
explicit measures to control the order of execution among processors,
you should make no assumptions about the order of execution.
Second, the role of MPI_Comm_size should now be
clearer. When running the program, the user specifies the number of
processes on the command line. MPI_Comm_size
provided a way to get that information back into the program. Next
time, if you want to use a different number of processes, just change
the command line and your code will take care of the rest.