
13.3 An MPI Solution

Now that we've seen how to create a serial solution, let's look at a parallel solution. We'll look at the solution first in C and then in FORTRAN and C++.

13.3.1 A C Solution

The reason this area problem is both interesting and commonly used is that it is very straightforward to subdivide: we can let different computers calculate the areas for different rectangles. Along the way, we'll introduce two new functions, MPI_Send and MPI_Recv, which are used to exchange information among processes.

Basically, MPI_Comm_size and MPI_Comm_rank are used to divide the problem among processors. MPI_Send is used to send the intermediate results back to the process with rank 0, which collects the results with MPI_Recv and prints the final answer. Here is the program:

#include "mpi.h"

#include <stdio.h>

   

/* problem parameters */

#define f(x)            ((x) * (x))

#define numberRects     50

#define lowerLimit      2.0

#define upperLimit      5.0

   

int main( int argc, char * argv[  ] )

{

   /* MPI variables */

   int dest, noProcesses, processId, src, tag;

   MPI_Status status;

   

   /* problem variables */

   int         i;

   double      area, at, height, lower, width, total, range;

   

   /* MPI setup */

   MPI_Init(&argc, &argv);

   MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);

   MPI_Comm_rank(MPI_COMM_WORLD, &processId);

   

   /* adjust problem size for subproblem*/

   range = (upperLimit - lowerLimit) / noProcesses;

   width = range / numberRects;

   lower = lowerLimit + range * processId;

   

   /* calculate area for subproblem */

   area = 0.0;

   for (i = 0; i < numberRects; i++)

   {  at = lower + i * width + width / 2.0;

      height = f(at);

      area = area + width * height;

   }

   

   /* collect information and print results */

   tag = 0;

   if (processId = = 0)         /* if rank is 0, collect results */

   {  total = area;

      for (src=1; src < noProcesses; src++)

      {  MPI_Recv(&area, 1, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);

         total = total + area;

      }

      fprintf(stderr, "The area from %f to %f is: %f\n",

              lowerLimit, upperLimit, total );

   } 

   else                        /* all other processes only send */

   {  dest = 0; 

      MPI_Send(&area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);

   };

   

   /* finish */

   MPI_Finalize( );

   return 0;

}

This code is fairly straightforward, and you've already seen most of it. As before, we begin with the definition of the problem function and parameters. This is followed by the declaration of the variables we'll need, first for the MPI calls and then for the problem. There are a few MPI variables whose use will be described shortly. Next comes the section that sets up MPI, but there is nothing new here.

The first real change comes in the next section, where we adjust the problem size. In this example, we are calculating the area between 2 and 5. Since each process only needs to do part of this calculation, we need to divide the problem among the processes so that each process gets a different part (and all the parts are accounted for). MPI_Comm_size is used to determine the number of parts the problem will be broken into, noProcesses. That is, we divide the total range (2 to 5) equally among the processes and adjust the start of the range for an individual process based on its rank. For example, with four processes, one process could calculate from 2 to 2.75, one from 2.75 to 3.5, one from 3.5 to 4.25, and one from 4.25 to 5.

In the next section of code, each process calculates the area for its part of the problem. This code keeps the number of rectangles fixed in this example rather than adjusting it to the number of processes. That is, regardless of the number of processes used, each will use the same number of rectangles to solve its portion of the problem. Thus, if the number of processes increases from one run to the next, the answer won't come back any quicker but, up to a point, the answer should be more accurate. If your goal is speed, you could easily set the total number of rectangles for the problem and then, in each process, divide that number by the number of processes, or use some similar strategy, as sketched below.
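
For instance, a version that holds the total number of rectangles fixed and splits them among the processes might look something like the following sketch, which could replace the "adjust problem size" and "calculate area" sections of the listing above. The parameter totalRects is an assumption introduced here, not part of the original program, and the sketch assumes totalRects divides evenly by the number of processes.

#define totalRects      5000    /* assumed parameter, not in the original listing */

   /* each process handles totalRects / noProcesses rectangles, so adding
      processes reduces the work per process rather than refining the answer */
   range = (upperLimit - lowerLimit) / noProcesses;
   width = (upperLimit - lowerLimit) / totalRects;
   lower = lowerLimit + range * processId;

   area = 0.0;
   for (i = 0; i < totalRects / noProcesses; i++)
   {  at = lower + i * width + width / 2.0;
      height = f(at);
      area = area + width * height;
   }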

Once we have completed this section, we need to collect and combine all our individual results. This is the new stuff. One process will act as a collector to which the remaining processes will send their results. Using the process with rank 0 as the receiver is the logical choice. The remaining processes act as senders. There is nothing magical about this choice apart from the fact that there will always be a process of rank 0. If a different process is selected, you'll need to ensure that a process with that rank exists. A fair amount of MPI code development can be done on a single processor system and then moved to a multiprocessor environment, so this isn't, as it might seem, a moot point.

The test (processId == 0) determines what will be done by the collector process and what will be done by all the remaining processes. The first branch following this test will be executed by the single process with a rank of 0. The second branch will be executed by each of the remaining processes. It is just this sort of test that allows us to write a single program that will execute correctly on each machine in the cluster, with different processes doing different things.

13.3.2 Transferring Data

The defining characteristic of message passing is that the transfer of data from one process to another requires operations to be performed by both processes. This is handled by MPI_Send and MPI_Recv. The first branch after the test contains a loop that will execute once for each of the remaining processes in the cluster. At each execution of the body of the loop, the rank 0 process collects information from one of the other processes. This is done with the call to MPI_Recv. Each of the other processes executes the second branch after the test once. Each process uses the call to MPI_Send to pass its results back to process 0. For example, for 100 processes, there are 99 calls to MPI_Send and 99 calls to MPI_Recv. (Process 0 already knows what it calculated.) Let's look at these two functions more closely.

13.3.2.1 MPI_Send

MPI_Send is used to send information from one process to another process.[2] A call to MPI_Send must be matched with a corresponding call to MPI_Recv in the receiving process. Information is both typed and tagged. Typing is needed to support communications in a heterogeneous environment. The type information is used to ensure that the necessary conversions to the data representation are applied transparently as data moves among machines.

[2] Actually, a process can send a message to itself, but this possibility can get tricky so we'll ignore it.

The first three arguments to MPI_Send, collectively, specify the transmitted data. The first argument gives the address of the data, the second gives the number of items to be sent, and the third gives the data type. In this sample code, we are sending the area, a single double, so we specify MPI_DOUBLE as the type. In addition to MPI_DOUBLE, the other possible types are MPI_BYTE, MPI_CHAR, MPI_UNSIGNED_CHAR, MPI_SHORT, MPI_UNSIGNED_SHORT, MPI_INT, MPI_UNSIGNED, MPI_LONG, MPI_UNSIGNED_LONG, MPI_LONG_DOUBLE, MPI_FLOAT, and MPI_PACKED.
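
To make the roles of these three arguments concrete, here is a small sketch of sending something other than a single double, say a buffer of ten ints. The buffer name results is hypothetical and isn't part of the program above; note that an array name is already an address, so no & is needed.

   int results[10];            /* hypothetical buffer of intermediate values */

   /* send ten MPI_INTs starting at the address results */
   MPI_Send(results, 10, MPI_INT, dest, tag, MPI_COMM_WORLD);

   /* the matching receive on the other side uses the same count and type */
   MPI_Recv(results, 10, MPI_INT, src, tag, MPI_COMM_WORLD, &status);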

The next argument is the destination. This is just the rank of the receiver. The destination is followed by a tag. Since MPI provides buffering, several messages can be outstanding, and the tag is used to distinguish among them. Because only one message passes between any pair of processes in this example, the tag is a moot point here. The last argument, MPI_COMM_WORLD, is the default communicator, which has already been described.
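
For example, if each worker needed to send rank 0 two separate messages, distinct tags would let the receiver tell them apart regardless of the order in which the receives are posted. This is only an illustrative sketch; the tags AREA_TAG and COUNT_TAG and the variable count are hypothetical and don't appear in the program above.

#define AREA_TAG   1
#define COUNT_TAG  2

   int count = 0;              /* hypothetical second result */

   /* sender: two messages to rank 0, distinguished by their tags */
   MPI_Send(&area,  1, MPI_DOUBLE, 0, AREA_TAG,  MPI_COMM_WORLD);
   MPI_Send(&count, 1, MPI_INT,    0, COUNT_TAG, MPI_COMM_WORLD);

   /* receiver (rank 0): each receive names the tag it wants, so the two
      messages can't be confused even though they share a source */
   MPI_Recv(&area,  1, MPI_DOUBLE, src, AREA_TAG,  MPI_COMM_WORLD, &status);
   MPI_Recv(&count, 1, MPI_INT,    src, COUNT_TAG, MPI_COMM_WORLD, &status);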

13.3.2.2 MPI_Recv

The arguments to MPI_Recv are similar but include one addition, a status field. MPI_Status is a type definition for a structure that holds information about the actual message size, its source, and its tag. In C, the status variable is a structure with three fields (MPI_SOURCE, MPI_TAG, and MPI_ERROR) that contain the source, tag, and error code, respectively. With MPI_Recv, you can use a wildcard for the source, the tag, or both (MPI_ANY_SOURCE and MPI_ANY_TAG). The status field allows you to determine the actual source and tag in that situation. (The actual message size isn't a field you read directly; it is recovered from the status with the MPI_Get_count routine.)
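
As a sketch of how the wildcards and the status fields might be used together, the collection loop in the program above could accept results in whatever order they arrive and then consult status to see who actually sent each one. The only assumption is the one the original program already makes, namely that every other process sends exactly one message.

   for (src = 1; src < noProcesses; src++)
   {  /* take the next result from whichever process is ready */
      MPI_Recv(&area, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
               MPI_COMM_WORLD, &status);
      fprintf(stderr, "received %f from process %d (tag %d)\n",
              area, status.MPI_SOURCE, status.MPI_TAG);
      total = total + area;
   }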

You should be aware that MPI_Send and MPI_Recv are both blocking calls. For example, if you try to receive information that hasn't been sent, your process will block, waiting until it is sent, before it can continue executing. While this is what you might expect, it can lead to nasty surprises if your code isn't properly written, since you may end up with two processes waiting for each other, as sketched below.
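
Here is a minimal sketch of that situation: both processes post their blocking receive first, so neither ever reaches its send, and both wait forever. The variables mine and theirs are hypothetical.

   double mine, theirs;        /* hypothetical values to exchange */

   /* deadlock: each process waits in MPI_Recv for a message the other
      process hasn't sent yet, so neither ever reaches its MPI_Send */
   if (processId == 0)
   {  MPI_Recv(&theirs, 1, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD, &status);
      MPI_Send(&mine,   1, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
   }
   else if (processId == 1)
   {  MPI_Recv(&theirs, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
      MPI_Send(&mine,   1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);
   }

One simple fix is to have one of the two processes send before it receives, which is effectively what the rectangle program does by making rank 0 the only receiver.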

Here is the output:

[sloanjd@amy sloanjd]$ mpicc mpi-rect.c -o mpi-rect
[sloanjd@amy sloanjd]$ mpirun -np 5 mpi-rect
The area from 2.000000 to 5.000000 is: 38.999964

Of course, all of this assumed you wanted to program in C. The next two sections provide alternatives to C.

13.3.3 MPI Using FORTRAN

Let's take a look at the same program written in FORTRAN.

        program main
        include "mpif.h"

        parameter (NORECS = 50, DLIMIT = 2.00, ULIMIT = 5.00)
        integer dest, err, i, noprocs, procid, src, tag
        integer status(MPI_STATUS_SIZE)
        double precision area, at, height, lower, width, total, range

        f(x) = x * x

********** MPI setup **********
        call MPI_INIT(err)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, noprocs, err)
        call MPI_COMM_RANK(MPI_COMM_WORLD, procid, err)

********** adjust problem size for subproblem **********
        range = (ULIMIT - DLIMIT) / noprocs
        width = range / NORECS
        lower = DLIMIT + range * procid

********** calculate area for subproblem **********
        area = 0.0
        do 10 i = 0, NORECS - 1
        at = lower + i * width + width / 2.0
        height = f(at)
        area = area + width * height
 10     continue

********** collect information and print results **********
        tag = 0
********** if rank is 0, collect results **********
        if (procid .eq. 0) then
           total = area
           do 20 src = 1, noprocs - 1
              call MPI_RECV(area, 1, MPI_DOUBLE_PRECISION, src, tag,
     +                      MPI_COMM_WORLD, status, err)
              total = total + area
 20        continue
           print '(1X, A, F5.2, A, F5.2, A, F8.5)', 'The area from ',
     +             DLIMIT, ' to ', ULIMIT, ' is: ', total
        else
********** all other processes only send **********
           dest = 0
           call MPI_SEND(area, 1, MPI_DOUBLE_PRECISION, dest, tag,
     +                   MPI_COMM_WORLD, err)
        endif

********** finish **********
        call MPI_FINALIZE(err)
        stop
        end

I'm assuming that, if you are reading this, you already know FORTRAN and that you have already read the C version of the code. So this discussion is limited to the differences between MPI in C and in FORTRAN. As you can see, there aren't many.

Don't forget to compile this with mpif77 rather than mpicc.


FORTRAN 77 programs include the header file "mpif.h". FORTRAN 90 programs may substitute use mpi if the MPI implementation supports modules.

In creating the MPI specification, a great deal of effort went into keeping the C and FORTRAN bindings similar. The biggest difference is the way error codes are handled. In FORTRAN, an explicit error parameter is included as the last argument to each call; it returns either MPI_SUCCESS or an implementation-defined error code.
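
For comparison, the C bindings return the error code as the function's value rather than through an extra argument. A minimal sketch of checking it might look like the following; in the C program earlier in this section these return values were simply ignored, and by default MPI's error handler aborts the program before such a check would ever run.

   int rc;

   rc = MPI_Send(&area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
   if (rc != MPI_SUCCESS)
      fprintf(stderr, "MPI_Send failed on process %d\n", processId);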

In C, function arguments tend to be more strongly typed than in FORTRAN, and you will notice that C tends to pass addresses when a function returns a value through an argument. As you might expect, the parameters to MPI_Init have changed. Finally, in FORTRAN the status is an integer array of size MPI_STATUS_SIZE rather than a structure.

Overall, the differences between C and FORTRAN aren't that great. You should have little difficulty translating code from one language to another.

13.3.4 MPI Using C++

Here is the same code in C++:

#include "mpi.h"

#include <stdio.h>

   

/* problem parameters */

#define f(x)            ((x) * (x))

#define numberRects     50

#define lowerLimit      2.0

#define upperLimit      5.0

   

int main( int argc, char * argv[  ] )

{

   /* MPI variables */

   int dest, noProcesses, processId, src, tag;

   MPI_Status status;

   

   /* problem variables */

   int         i;

   double      area, at, height, lower, width, total, range;

   

   /* MPI setup */

   MPI::Init(argc, argv);

   noProcesses = MPI::COMM_WORLD.Get_size( );

   processId = MPI::COMM_WORLD.Get_rank( );

   

   /* adjust problem size for subproblem*/

   range = (upperLimit - lowerLimit) / noProcesses;

   width = range / numberRects;

   lower = lowerLimit + range * processId;

   

   /* calculate area for subproblem */

   area = 0.0;

   for (i = 0; i < numberRects; i++)

   {  at = lower + i * width + width / 2.0;

      height = f(at);

      area = area + width * height;

   }

   

   /* collect information and print results */

   tag = 0;

   if (processId = = 0)         /* if rank is 0, collect results */

   {  total = area;

      for (src=1; src < noProcesses; src++)

      {  MPI::COMM_WORLD.Recv(&area, 1, MPI::DOUBLE, src, tag);

         total = total + area;

      }

      fprintf (stderr, "The area from %f to %f is: %f\n",

               lowerLimit, upperLimit, total );

   }

   else                        /* all other processes only send */

   {  dest = 0; 

      MPI::COMM_WORLD.Send(&area, 1, MPI::DOUBLE, dest, tag);

   };

   

   /* finish */

   MPI::Finalize( );

   return 0;

}

If you didn't skip the section on FORTRAN, you'll notice that there are more differences in going from C to C++ than in going from C to FORTRAN.

Remember, you'll compile this with mpiCC, not mpicc.


The C++ bindings were added to MPI as part of the MPI-2 effort. Rather than try to follow the binding structure used with C and FORTRAN, the C++ bindings were designed to exploit an object-oriented style. Consequently, most functions become members of C++ classes. For example, MPI::COMM_WORLD is an instance of the communicator class, and Get_rank and Get_size are methods of that class. All classes are part of the MPI namespace.

Another difference you'll notice is that Get_size and Get_rank return their values directly. Since the usual style of error handling in C++ is to throw exceptions, an approach the C++ bindings follow, there is no need to return error codes.

Finally, you'll notice that the type specifications have changed. In this example, we see MPI::DOUBLE rather than MPI_DOUBLE, which is consistent with the naming conventions used in the C++ bindings. We won't belabor this example. By looking at the code, you should have a pretty clear idea of how the bindings have changed with C++.

Now that we have a working solution, let's look at some ways it can be improved. Along the way we'll see two new MPI functions that can make life simpler.
