17.4 Timing C Code Segments
The primary limitation of the various
versions of time is that they
don't tell you which part of your code is running
slowly. To learn more, you'll need to delve into your
code. There are a couple of ways to do this. The most
straightforward way is to
"instrument" the code, that
is, to embed commands directly into the code that record the system
time at key points, and then to use these individual times to
calculate elapsed times.
The primary advantage to manual instrumentation of code is total
control. You determine exactly what you want or need. This control
doesn't come cheap. There are several difficulties
with manual instrumentation. First and foremost, it is a lot of work.
You'll need to add variables, determine collection
points, calculate elapsed times, and format and display the results.
Typically, it will take several passes to locate the portion of code
that is of interest. For a large program, you may have a number of
small, critical sections that you need to look at. Once you have
these timing values, you'll need to figure out how
to interpret them. You'll also need to guard against
altering the performance of your program. This can be a result of
over-instrumenting your code, particularly at critical points. Of
course, these problems are not specific to manual instrumentation and
will exist to some extent with whatever approach you take.
The traditional way of instrumenting C code is with the
time function,
declared in the time.h header. Here is a code
fragment that demonstrates its use:
...
#include <stdio.h>
#include <time.h>
int main(void)
{
    time_t start, finish;
    ...
    time(&start);
    /* section to be timed */
    ...
    time(&finish);
    /* difftime returns the difference in seconds as a double */
    printf("Elapsed time: %.0f\n", difftime(finish, start));
    ...
}
The time function returns the number of seconds since midnight (GMT)
January 1, 1970. Since this is a very large integer, the type
time_t (defined in
<time.h>) is used to ensure that
time variables have adequate storage. While time is easy to use if it
meets your needs, its primary limitation is
that the granularity (1 second) is too large for many tasks.
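As a minimal sketch of the calculation, the subtraction can be wrapped in a small helper built on the standard difftime function, which returns the difference between two time_t values as a double. The helper name elapsed_seconds is hypothetical and is used only for illustration.

```c
#include <time.h>

/* Hypothetical helper: seconds elapsed between two readings of time().
   difftime() is standard C and avoids any assumption about how
   time_t is represented. */
double elapsed_seconds(time_t start, time_t finish)
{
    return difftime(finish, start);
}
```

Call time(&start) before the section of interest and time(&finish) after it, then print elapsed_seconds(start, finish) with a %.0f format. The result still has only 1-second granularity.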
You can get around the granularity problem by using a different
function,
gettimeofday, which provides microsecond
granularity. gettimeofday is used with a structure
that has two integer fields, one for seconds (tv_sec) and one for
microseconds (tv_usec). Its use is slightly more complicated. Here is an
example:
...
#include <stdio.h>
#include <sys/time.h>
int main(void)
{
    struct timeval start, finish;
    struct timezone tz;
    ...
    gettimeofday(&start, &tz);
    /* the field types vary by system, so cast to long for printf */
    printf("Time---seconds: %ld microseconds: %ld \n",
           (long) start.tv_sec, (long) start.tv_usec);
    /* section to be timed */
    ...
    gettimeofday(&finish, &tz);
    printf("Time---seconds: %ld microseconds: %ld \n",
           (long) finish.tv_sec, (long) finish.tv_usec);
    /* borrow one second when the microsecond field has wrapped */
    printf("\nElapsed time---seconds: %ld microseconds: %ld \n",
           (long) ((start.tv_usec > finish.tv_usec) ?
                   finish.tv_sec - start.tv_sec - 1 :
                   finish.tv_sec - start.tv_sec),
           (long) ((start.tv_usec > finish.tv_usec) ?
                   1000000 + finish.tv_usec - start.tv_usec :
                   finish.tv_usec - start.tv_usec));
    return 0;
}
The first argument to gettimeofday is the
structure for the time. The second was historically used to adjust
results for the appropriate time zone; it is obsolete on current
systems, where NULL is normally passed instead. Since we are
interested in elapsed time, the
time zone is treated as a dummy argument. The first two
printfs in this example show how to display the
individual counters. The last printf displays the
elapsed time. Because two numbers are involved, calculating elapsed
time is slightly more complicated than with time: when the
microsecond field of the finish time is smaller than that of the start
time, one second must be borrowed.
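One way to simplify the arithmetic, sketched here under the assumption that a 64-bit count of microseconds is acceptable, is to convert each reading to a single number before subtracting. The helper name elapsed_usec is hypothetical.

```c
#include <sys/time.h>

/* Hypothetical helper: microseconds elapsed from *start to *finish.
   Converting each reading to one 64-bit count avoids the borrow
   logic needed when the two fields are subtracted separately. */
long long elapsed_usec(const struct timeval *start,
                       const struct timeval *finish)
{
    long long s = (long long) start->tv_sec * 1000000LL + start->tv_usec;
    long long f = (long long) finish->tv_sec * 1000000LL + finish->tv_usec;
    return f - s;
}
```

With a start time of 10 seconds and 900000 microseconds and a finish time of 12 seconds and 100000 microseconds, the helper returns 1200000, that is, 1.2 seconds.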
Keep in mind that both time and
gettimeofday return wall-clock times. If the
process is interrupted between calls, the elapsed time that you
calculate will include the time spent during the interruption, even
if it has absolutely nothing to do with your program. On the other
hand, these functions should be largely (but not completely) immune
to problems caused by code reordering during compiler optimization,
provided you stick to timing basic blocks. (A basic block is a block
of contiguous code that has a single entry point at its beginning and
a single exit point at its end.)
Typically, timing commands are placed inside
#ifdef blocks so that they are compiled
only when needed. Other languages, such as FORTRAN, have similar timing
commands. However, what's available varies from
compiler to compiler, so be sure to check the appropriate
documentation for your compiler.
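Conditional compilation of timing code might look like the following sketch. The macro name TIMING and the function sum_to are assumptions made for illustration; any macro name will do, as long as it matches the one passed to the compiler.

```c
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical workload: sums the integers 1..n.  The timing code is
   compiled in only when TIMING is defined, e.g., with cc -DTIMING. */
long sum_to(long n)
{
    long i, total = 0;
#ifdef TIMING
    struct timeval start, finish;
    gettimeofday(&start, NULL);
#endif

    for (i = 1; i <= n; i++)
        total += i;

#ifdef TIMING
    gettimeofday(&finish, NULL);
    fprintf(stderr, "sum_to: %ld microseconds\n",
            (long) (finish.tv_sec - start.tv_sec) * 1000000L +
            (long) (finish.tv_usec - start.tv_usec));
#endif
    return total;
}
```

Compiled without -DTIMING, the function contains no timing calls at all, so the instrumentation cannot perturb production runs.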
17.4.1 Manual Timing with MPI
With the C library routines
time and gettimeofday, you have
to choose between poor granularity and the complications of dealing
with a structure. With parallel programs, there is the additional
problem of synchronizing processes across the cluster.
17.4.2 MPI Functions
MPI provides an alternative: three functions that can
be used to time code.
17.4.2.1 MPI_Wtime
Like time, the
function MPI_Wtime returns the number of seconds
since some point in the past. Although this point in the past is not
specified by the standard, it is guaranteed not to change during the
life of a process. However, there are no guarantees of consistency
among different processes across the cluster. The function call takes
no arguments. Since the return value is a double, the function can
provide a finer granularity than time. As with
time, MPI_Wtime returns the
wall-clock time. If the process is interrupted between calls to
MPI_Wtime, the time the process is idle will be
included in your calculated elapsed time. Because its return value is
the time itself, MPI_Wtime, unlike most MPI
functions, cannot return an error code.
17.4.2.2 MPI_Wtick
MPI_Wtick
returns the actual resolution or granularity for the time returned by
MPI_Wtime (rather than an error code). For
example, if the system clock is a counter that is incremented every
microsecond, MPI_Wtick will return a value of
roughly 0.000001. It takes no argument and returns a double. In
practice, MPI_Wtick's primary use
is to satisfy the user's curiosity.
17.4.2.3 MPI_Barrier
One problem with timing a parallel
program is that one process may be idle while waiting for another. If
used naively, MPI_Wtime, time,
or gettimeofday could return a value that includes
both running code and idle time. While you'll
certainly want to know about both of these, it is likely that the
information will be useful only if you can separate them.
MPI_Barrier can be used to synchronize processes
within a communication group. When MPI_Barrier is
called, individual processes in the group are blocked until all the
processes have entered the call. Once all processes have entered the
call, i.e., reached the same point in the code, the call returns for
each process, and the processes are no longer blocked.
MPI_Barrier takes a communicator as an argument
and, like most MPI functions, returns an integer error code.
Here is a code fragment that demonstrates how these functions might
be used:
...
#include <stdio.h>
#include "mpi.h"
...
int main(int argc, char *argv[])
{
    double start, finish;
    ...
    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();
    /* section to be timed */
    ...
    MPI_Barrier(MPI_COMM_WORLD);
    finish = MPI_Wtime();
    if (processId == 0)
        fprintf(stderr, "Elapsed time: %f\n", finish - start);
    ...
}
Depending on the other code in the program, one or both of the calls
to MPI_Barrier may not be essential. Also, when
timing short code segments, you shouldn't overlook
the cost of measurement. If needed, you can write a short code
segment to estimate the cost of the calls to
MPI_Barrier and MPI_Wtime by
simply repeating the calls and calculating the difference.
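The idea of repeating the calls can be sketched as follows. To keep the sketch independent of any MPI installation, it uses a gettimeofday-based stand-in, wall_seconds, which has the same shape as MPI_Wtime (no arguments, seconds returned as a double). Both wall_seconds and estimate_call_cost are hypothetical names. In an MPI program, a thin wrapper function around MPI_Wtime (and, if desired, MPI_Barrier) could be passed instead.

```c
#include <stddef.h>
#include <sys/time.h>

/* Stand-in clock with the same shape as MPI_Wtime: no arguments,
   wall-clock seconds returned as a double. */
double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
}

/* Hypothetical helper: average cost in seconds of one call to now(),
   estimated by calling it repeatedly. */
double estimate_call_cost(double (*now)(void), int reps)
{
    double start, finish;
    int i;

    start = now();
    for (i = 0; i < reps; i++)
        (void) now();
    finish = now();
    /* reps calls in the loop plus the final reading */
    return (finish - start) / (reps + 1);
}
```

A call such as estimate_call_cost(wall_seconds, 100000) gives a rough per-call overhead that can be compared against the elapsed times being measured. If the two are of similar size, the measurement itself is dominating the result.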
17.4.3 PMPI
If you want to time MPI calls, MPI
provides a wrapper mechanism that can be used to create a profiling
interface. Each MPI function has a dual function whose name begins
with PMPI rather than MPI. For
instance, you can use PMPI_Send just as you would
MPI_Send, PMPI_Recv just as you
would MPI_Recv, and so on. What this allows you to
do is write your own version of any function and still have a way to
call the original function. For example, if you want to write your
own version of MPI_Send, you'll
still be able to call the original version by simply calling its
dual, PMPI_Send. Of course, to get this to work,
you'll need to link to your library of customized
functions before you link to the standard MPI library.
Interesting, you say, but how is this useful? For profiling MPI
commands, you can write a new version of any MPI function that calls
a timing routine, then calls the original version, and, finally,
calls the timing routine again when the original function returns.
Here is an example for MPI_Send:
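#include <stdio.h>
#include "mpi.h"
/* MPI-3 and later declare buf as const void *; older versions use
   void *. The definition must match the prototype in your mpi.h. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest,
             int tag, MPI_Comm comm)
{
    double start, finish;
    int err_code;
    start = MPI_Wtime();
    /* call the original routine through its PMPI dual */
    err_code = PMPI_Send(buf, count, datatype, dest, tag, comm);
    finish = MPI_Wtime();
    fprintf(stderr, "Elapsed time: %f\n", finish - start);
    return err_code;
}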
For this function definition, the parameter list was copied from the
MPI standard. For the embedded MPI function call, the return code
from the call to PMPI_Send is saved and passed
back to the calling program. This example just displays the elapsed
time. An alternative would be to return it through a global variable
or write it out to a file.
To use this code, you need to ensure that it is compiled and linked
before the MPI library is linked into your program. One neat thing
about this approach is that you'll be able to use it
with precompiled modules even if you don't have
their source. Of course, it is a lot of work to create routines for
every MPI routine, but we'll see an alternative when
we look at profiling using MPE later in this chapter.