13.1 MPI
The major difficulty in parallel programming is subdividing
problems so that different parts can be executed simultaneously on
different machines. MPI is a library of routines that provides the
functionality needed to allow those parts to communicate. But it will
be up to you to determine how a problem can be broken into pieces so
that it can run on different machines.
The simplest approach is to have the number of processes match the
number of machines or processors that are available. However, this is
not required. If you have a small problem that can be easily run on a
subset of your cluster, or if your problem logically decomposes in
such a way that you don't need the entire cluster,
then you can (and should) execute the program on fewer machines. It
is also possible to have multiple processes running on the same
machine. This is particularly common when developing code. In this
case, the operating system will switch between processes as needed.
You won't benefit from the parallelization of the
code, but the job will still complete correctly.
13.1.1 Core MPI
With most parallelizable problems,
programs running on multiple computers do the bulk of the work and
then communicate their individual results to a single computer that
collects these intermediate results, combines them, and reports the
final results. It is certainly possible to write a different program
for each machine in the cluster, but from a software management
perspective, it is much easier if we can write just one program. As
the program executes on each machine, it will first determine which
computer it is running on and, based on that information, tackle the
appropriate part of the original problem. When the computation is
complete, one machine will act as a receiver and all the other
machines will send their results to it.
For this approach to work, each executing program or process must be
able to differentiate itself from other processes.
Let's look at a very basic example that demonstrates
how processes, i.e., instances of the program executing on different
computers, are able to differentiate themselves. While this example
doesn't accomplish anything particularly useful, it
shows how the pieces fit together. It introduces four key functions
and one other useful function. And with a few minor changes, this
program will serve as a template for future programs.
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int processId;                  /* rank of process     */
    int noProcesses;                /* number of processes */
    int nameSize;                   /* length of name      */
    char computerName[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    MPI_Get_processor_name(computerName, &nameSize);

    fprintf(stderr, "Hello from process %d on %s\n", processId, computerName);

    MPI_Finalize();
    return 0;
}
This example introduces five MPI functions. They are declared in the
MPI library's header file, mpi.h, and their
implementations are supplied when the MPI library is linked to the
program. While this example uses C, similar bindings are available
for C++ and FORTRAN.
Four of these functions, MPI_Init,
MPI_Comm_size, MPI_Comm_rank,
and MPI_Finalize, are seen in virtually every MPI
program. We will look at each in turn. (Notice that all MPI
identifiers begin with MPI_.)
13.1.1.1 MPI_Init
MPI_Init is
used to initialize an MPI session. All MPI programs must have a call
to MPI_Init. MPI_Init is called
once, typically at the start of a program. You can have lots of other
code before this call, or you can even call
MPI_Init from a subroutine, but you should call it
before any other MPI functions are called. (There is an exception:
the function MPI_Initialized can be called before
MPI_Init. MPI_Initialized is
used to see if MPI_Init has been previously
called.)
In C, MPI_Init can be called with the
addresses for argc and
argv as shown in the example. This allows the
program to take advantage of command-line arguments. Alternatively,
these addresses can be replaced with a NULL.
13.1.1.2 MPI_Finalize
MPI_Finalize is
called to shut down MPI. MPI_Finalize should be
the last MPI call made in a program. It is used to free memory, etc.
It is the user's responsibility to ensure that all
pending communications are complete before a process calls
MPI_Finalize. You must write your code so that
every process calls MPI_Finalize. Notice that
there are no arguments.
13.1.1.3 MPI_Comm_size
This
routine is used to determine the total number of processes running in
a communicator (the communications group for the
processes being used). It takes the communicator as the first
argument and the address of an integer variable used to return the
number of processes. For example, if you are executing a program
using five processes and the default communicator, the value returned
by MPI_Comm_size will be five, the total number of
processes being used. Note that this is the number of processes, not
necessarily the number of machines being used.
In the example, both MPI_Comm_size and
MPI_Comm_rank used the default communicator,
MPI_COMM_WORLD. This communicator includes all the
processes available at initialization and is created automatically
for you. Communicators are used to distinguish and group messages. As
such, communicators provide a powerful encapsulation mechanism. While
it is possible to create and manipulate your own communicators, the
default communicator will probably satisfy most of your initial
needs.
13.1.1.4 MPI_Comm_rank
MPI_Comm_rank
is used to determine the rank of the current process within the
communicator. MPI_Comm_rank takes a communicator
as its first argument and, as its second argument, the address of an
integer variable used to return the rank.
Basically, each process is assigned a different process number or
rank within a communicator. Ranks range from 0 to one less than the
size returned by MPI_Comm_size. For example, if
you are running a set of five processes, the individual processes
will be numbered 0, 1, 2, 3, and 4. By examining its rank, a process
can distinguish itself from other processes.
The values returned by MPI_Comm_size and
MPI_Comm_rank are often used to divvy up a problem
among processes. For example, suppose that for some problem you want
to divide the work among five processes. This is a decision you make
when you run your program; your choice is not coded into the program
since it may not be known when the program is written. Once the
program is running, it can call MPI_Comm_size to
determine the number of processes attacking the problem. In this
example, it would return five. Each of the five processes now knows
that it needs to solve one fifth of the original problem (assuming
you've written the code this way).
Next, each individual process can examine its rank to determine its
role in the calculation. Continuing with the current example, each
process needs to decide which fifth of the original problem to work
on. This is where MPI_Comm_rank comes in. Since
each process has a different rank, it can use its rank to select its
role. For example, the process with rank 0 might work on the first
part of the problem; the process with rank 1 will work on the second
part of the problem, etc.
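As a concrete sketch of this idea, the following plain C function
(no MPI calls; block_range is a hypothetical helper
name, not part of MPI) shows one common way a process might combine
its rank and the process count to select its share of
n work items, spreading any remainder over the
lowest-ranked processes:

```c
/* Hypothetical helper: compute the half-open range [*start, *end) of
   n items that the process with the given rank should handle. Each
   process gets n / noProcesses items; the first n % noProcesses
   processes each take one extra item. */
void block_range(int rank, int noProcesses, int n, int *start, int *end)
{
    int base  = n / noProcesses;    /* items every process gets      */
    int extra = n % noProcesses;    /* leftover items to distribute  */

    *start = rank * base + (rank < extra ? rank : extra);
    *end   = *start + base + (rank < extra ? 1 : 0);
}
```

With n set to 100 and five processes, the process with rank 0 would
work on items 0 through 19, rank 1 on items 20 through 39, and so on.
After calling MPI_Comm_size and
MPI_Comm_rank, each process could pass its own rank
and the process count to a helper like this one.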
Of course, you can divide up the problem differently if you like. For
example, the process with rank 0 might collect all the results from
the other processes for the final report rather than participate in
the actual calculation. Or each process could use its rank as an
index to an array to discover what parameters to use in a
calculation. It is really up to you as a programmer to determine how
you want to use this information.
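The rank-as-index idea can be sketched in a few lines of plain C. The
table contents and the name step_for_rank here are
purely illustrative, not anything defined by MPI:

```c
/* Hypothetical parameter table: entry i holds the parameter that the
   process with rank i should use in its calculation. */
static const double stepSizes[] = { 0.1, 0.01, 0.001, 0.0001, 0.00001 };

/* Each process passes its own rank (from MPI_Comm_rank) to look up
   the parameter assigned to it. */
double step_for_rank(int processId)
{
    return stepSizes[processId];
}
```

Run with five processes, each process would call this with its own
rank and end up computing with a different step size, even though all
five are running the identical program.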
13.1.1.5 MPI_Get_processor_name
MPI_Get_processor_name
is used to retrieve the host name of the node on which the individual
process is running. In the sample program, we used it to display host
names. The first argument is an array to store the name and the
second is used to return the actual length of the name.
MPI_Get_processor_name is a nice function to have
around, particularly when you want to debug code, but otherwise it
isn't used all that much. The first four MPI
functions, however, are core functions and will be used in virtually
every MPI program you'll write. If you drop the
relevant declarations, the call to
MPI_Get_processor_name, and the
fprintf, you'll have a template
that you can use when writing MPI programs.
Although we haven't used it, each of the
C versions of these five
functions returns an integer error code. With a few exceptions, the
actual code is left up to the implementers. Error codes can be
translated into meaningful messages using the
MPI_Error_string function. In order to keep the
code as simple as possible, this book has adopted the (questionable)
convention of ignoring the returned error codes.
Here is an example of compiling and running the code:
[sloanjd@amy sloanjd]$ mpicc hello.c -o hello
[sloanjd@amy sloanjd]$ mpirun -np 5 hello
Hello from process 0 on amy
Hello from process 2 on oscarnode2.oscardomain
Hello from process 1 on oscarnode1.oscardomain
Hello from process 4 on oscarnode4.oscardomain
Hello from process 3 on oscarnode3.oscardomain
There are a couple of things to observe with this example. First,
notice that there is no apparent order in the output. This will
depend on the speed of the individual machines, the loads on the
machines, and the speeds of the communications links. Unless you take
explicit measures to control the order of execution among processors,
you should make no assumptions about the order of execution.
Second, the role of MPI_Comm_size should now be
clearer. When running the program, the user specifies the number of
processes on the command line. MPI_Comm_size
provided a way to get that information back into the program. Next
time, if you want to use a different number of processes, just change
the command line and your code will take care of the rest.