13.4 I/O with MPI
One severe limitation of our
solution is that all of the parameters are hardwired into the
program. If we want to change anything, we have to recompile the
program. It would be much more useful to read the parameters from
standard input.
Thus far, we have glossed over the potential difficulties that arise
with I/O and MPI. In general, I/O can get very messy with parallel
programs. With our very first program, we saw messages from each
processor on our screen. Stop and think about that for a moment: how did
the messages from the remote processes get to our screen? That bit
of magic was handled by mpirun. The MPI standard
does not fully specify how I/O should be handled. Details are left to
the implementer. In general, you can usually expect the rank 0
process to be able to both read from standard input and write to
standard output. Output from other processes is usually mapped back
to the home node and displayed. Input calls by other processes are
usually mapped to /dev/null, i.e., they are
ignored. If in doubt, consult the documentation for your particular
implementation. If you can't find the answer in the
documentation, it is fairly straightforward to write a simple test
program.
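Such a test program can be only a few lines long. The sketch below is one possible probe (ours, not part of the chapter's code): every rank attempts to read an integer from standard input and reports whether the read succeeded. Under most implementations, only rank 0 will see the value you type.

```c
#include "mpi.h"
#include <stdio.h>

/* Probe how an MPI implementation handles standard input:
   each rank tries to read one integer and reports the result. */
int main(int argc, char *argv[])
{
    int rank, value, got;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    got = scanf("%d", &value);   /* scanf returns 1 on a successful read */
    if (got == 1)
        printf("Process %d read %d from standard input\n", rank, value);
    else
        printf("Process %d could not read from standard input\n", rank);

    MPI_Finalize();
    return 0;
}
```

Run it with several processes and type a number; which ranks report success tells you how your implementation maps standard input.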
In practice, this strategy doesn't cause too many
problems. It is certainly adequate for our modest goals. Our strategy
is to have the rank 0 process read the parameters from standard input
and then distribute them to the remaining processes. With that in
mind, here is a solution.
#include "mpi.h"
#include <stdio.h>

/* problem parameters */
#define f(x) ((x) * (x))

int main(int argc, char *argv[])
{
    /* MPI variables */
    int dest, noProcesses, processId, src, tag;
    MPI_Status status;

    /* problem variables */
    int i, numberRects;
    double area, at, height, lower, width, total, range;
    double lowerLimit, upperLimit;

    /* MPI setup */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);

    tag = 0;
    if (processId == 0)          /* if rank is 0, collect parameters */
    {
        fprintf(stderr, "Enter number of steps:\n");
        scanf("%d", &numberRects);
        fprintf(stderr, "Enter low end of interval:\n");
        scanf("%lf", &lowerLimit);
        fprintf(stderr, "Enter high end of interval:\n");
        scanf("%lf", &upperLimit);

        for (dest = 1; dest < noProcesses; dest++)  /* distribute parameters */
        {
            MPI_Send(&numberRects, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
            MPI_Send(&lowerLimit, 1, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD);
            MPI_Send(&upperLimit, 1, MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);
        }
    }
    else                         /* all other processes receive */
    {
        src = 0;
        MPI_Recv(&numberRects, 1, MPI_INT, src, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(&lowerLimit, 1, MPI_DOUBLE, src, 1, MPI_COMM_WORLD, &status);
        MPI_Recv(&upperLimit, 1, MPI_DOUBLE, src, 2, MPI_COMM_WORLD, &status);
    }

    /* adjust problem size for subproblem */
    range = (upperLimit - lowerLimit) / noProcesses;
    width = range / numberRects;
    lower = lowerLimit + range * processId;

    /* calculate area for subproblem */
    area = 0.0;
    for (i = 0; i < numberRects; i++)
    {
        at = lower + i * width + width / 2.0;
        height = f(at);
        area = area + width * height;
    }

    /* collect information and print results */
    tag = 3;
    if (processId == 0)          /* if rank is 0, collect results */
    {
        total = area;
        fprintf(stderr, "Area for process 0 is: %f\n", area);
        for (src = 1; src < noProcesses; src++)
        {
            MPI_Recv(&area, 1, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);
            fprintf(stderr, "Area for process %d is: %f\n", src, area);
            total = total + area;
        }
        fprintf(stderr, "The area from %f to %f is: %f\n",
                lowerLimit, upperLimit, total);
    }
    else                         /* all other processes only send */
    {
        dest = 0;
        MPI_Send(&area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
    }

    /* finish */
    MPI_Finalize();
    return 0;
}
The solution is straightforward. We need to partition the problem so
that the input is only attempted by the rank 0 process. It then
enters a loop to send the parameters to the remaining processes.
While this approach certainly works, it does introduce communication overhead.
It might be tempting to have the rank 0 process calculate a few of the
derived parameters (e.g., range or
width) and distribute them as well, but this is a
false economy. Communication is always costly, so
we'll let each process calculate these values for
itself. The other processes would have been idle while the rank 0
process did those calculations anyway.
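One common way to trim the send loop itself is MPI's collective broadcast, MPI_Bcast, which delivers a buffer from one root process to every other process in the communicator in a single call, often in fewer message hops than a loop of individual sends. As a sketch, the entire distribution section of our program (both the send loop and the receive branch) could be replaced by three calls executed by every process:

```c
/* Every process, including rank 0, makes the same calls. On rank 0
   the variables already hold the values just read from standard
   input; on the other processes they are filled in by the broadcast. */
MPI_Bcast(&numberRects, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&lowerLimit, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(&upperLimit, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
```

The fourth argument names the root (here rank 0); because MPI_Bcast is collective, all processes in the communicator must call it.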