17.4 Timing C Code Segments

The primary limitation of the various versions of time is that they don't tell you what part of your code is running slowly. To know more, you'll need to delve into your code. There are a couple of ways this can be done. The most straightforward way is to "instrument" the code, that is, to embed commands directly into the code that record the system time at key points and then to use these individual times to calculate elapsed times.

The primary advantage to manual instrumentation of code is total control. You determine exactly what you want or need. This control doesn't come cheap. There are several difficulties with manual instrumentation. First and foremost, it is a lot of work. You'll need to add variables, determine collection points, calculate elapsed times, and format and display the results. Typically, it will take several passes to locate the portion of code that is of interest. For a large program, you may have a number of small, critical sections that you need to look at. Once you have these timing values, you'll need to figure out how to interpret them. You'll also need to guard against altering the performance of your program. This can be a result of over-instrumenting your code, particularly at critical points. Of course, these problems are not specific to manual instrumentation and will exist to some extent with whatever approach you take.

The traditional way of instrumenting C code is with the time function, which is declared in the time.h header. Here is a code fragment that demonstrates its use:

...

#include <stdio.h>
#include <time.h>

int main(void)
{
   time_t start, finish;
   ...
   time(&start);
   /* section to be timed */
   ...
   time(&finish);
   /* time_t need not be an int, so cast before printing */
   printf("Elapsed time: %ld\n", (long)(finish - start));
   ...
}

The time function returns the number of seconds since midnight (GMT) January 1, 1970. Since this is a very large integer, the type time_t (defined in <time.h>) is used to ensure that time variables have adequate storage. While time is easy to use if it meets your needs, its primary limitation is that the granularity (1 second) is too large for many tasks.
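The pattern above can be wrapped in a small helper. The following is only a sketch: the names time_call and sample_work are hypothetical and do not come from the original fragment, and it uses the standard C function difftime to subtract the two readings so that the result does not depend on the underlying type of time_t.

```c
#include <time.h>

/* Hypothetical stand-in for the section to be timed. */
void sample_work(void)
{
   volatile long i, total = 0;

   for (i = 0; i < 1000000L; i++)
      total += i;
}

/* Hypothetical helper: runs work() and returns the elapsed wall-clock
   time in seconds. difftime returns a double, so the result can be
   printed with %f whatever the underlying type of time_t is. */
double time_call(void (*work)(void))
{
   time_t start, finish;

   time(&start);
   work();
   time(&finish);
   return difftime(finish, start);
}
```

For a section that finishes in well under a second, a call such as time_call(sample_work) will usually report 0, which is the granularity problem described above.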

You can get around the granularity problem by using a different function, gettimeofday, which reports the time in seconds and microseconds. gettimeofday is used with a structure, struct timeval, that holds two integer fields: tv_sec for seconds and tv_usec for microseconds. Its use is slightly more complicated. Here is an example:

...

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
   struct timeval start, finish;
   struct timezone tz;
   ...
   gettimeofday(&start, &tz);
   printf("Time---seconds: %ld   microseconds: %ld \n",
          (long)start.tv_sec, (long)start.tv_usec);
   /* section to be timed */
   ...
   gettimeofday(&finish, &tz);
   printf("Time---seconds: %ld   microseconds: %ld \n",
          (long)finish.tv_sec, (long)finish.tv_usec);

   /* borrow one second when finish has fewer microseconds than start */
   printf("\nElapsed time---seconds: %ld   microseconds: %ld \n",
          (long)((start.tv_usec > finish.tv_usec) ?
                 finish.tv_sec - start.tv_sec - 1 :
                 finish.tv_sec - start.tv_sec),
          (long)((start.tv_usec > finish.tv_usec) ?
                 1000000 + finish.tv_usec - start.tv_usec :
                 finish.tv_usec - start.tv_usec));

   return 0;
}

The first argument to gettimeofday is the structure for the time. The second was historically used to report time zone information. Since we are interested only in elapsed time, the time zone is treated as a dummy argument; on current systems this argument is obsolete, and new code typically passes NULL instead. The first two printfs in this example show how to display the individual counters. The last printf displays the elapsed time. Because two numbers are involved, calculating elapsed time is slightly more complicated than with time: when the finishing microsecond count is smaller than the starting one, a second must be borrowed from the seconds field.
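If a single floating-point value is acceptable, the borrow can be avoided altogether. The helper below is a minimal sketch under that assumption; the name elapsed_seconds is hypothetical and is not part of the example above.

```c
#include <sys/time.h>

/* Hypothetical helper: seconds elapsed between two gettimeofday
   readings. Each field difference is converted to double before the
   two are added, so a negative microsecond difference simply reduces
   the total and no explicit borrow is needed. */
double elapsed_seconds(const struct timeval *start,
                       const struct timeval *finish)
{
   return (double)(finish->tv_sec - start->tv_sec)
        + (double)(finish->tv_usec - start->tv_usec) / 1000000.0;
}
```

With readings taken as in the example, the elapsed time could then be printed with a single %f conversion, for instance printf("Elapsed time: %f\n", elapsed_seconds(&start, &finish)).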

Keep in mind that both time and gettimeofday return wall-clock times. If the process is interrupted between calls, the elapsed time that you calculate will include the time spent during the interruption, even if it has absolutely nothing to do with your program. On the other hand, these functions should largely (but not completely) be immune to problems caused by code reordering for compiler optimizations, provided you stick to timing basic blocks. (A basic block is a block of contiguous code that has a single entry point at its beginning and a single exit point at its end.)
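The effect of idle time on a wall-clock measurement can be seen with a small experiment. The sketch below is an illustration only: the helper name measure_idle is hypothetical, it assumes a POSIX system that provides nanosleep, and it uses the standard C function clock (which reports processor time and is not discussed above) purely as a point of comparison.

```c
#include <time.h>
#include <sys/time.h>

/* Hypothetical helper: sleeps for ms milliseconds and reports, through
   the two output parameters, how much wall-clock time and how much
   processor time passed while the process was idle. */
void measure_idle(long ms, double *wall, double *cpu)
{
   struct timeval w0, w1;
   struct timespec delay;
   clock_t c0, c1;

   delay.tv_sec = ms / 1000;
   delay.tv_nsec = (ms % 1000) * 1000000L;

   gettimeofday(&w0, NULL);
   c0 = clock();
   nanosleep(&delay, NULL);      /* the process is idle here */
   c1 = clock();
   gettimeofday(&w1, NULL);

   *wall = (double)(w1.tv_sec - w0.tv_sec)
         + (double)(w1.tv_usec - w0.tv_usec) / 1000000.0;
   *cpu = (double)(c1 - c0) / CLOCKS_PER_SEC;
}
```

After a call such as measure_idle(200, &wall, &cpu), the wall-clock figure should be at least roughly 0.2 seconds, while the processor-time figure should stay close to zero, since the process did no work while it slept.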

Typically, timing commands are placed inside #ifdef directives so that they are compiled in only when needed. Other languages, such as FORTRAN, have similar timing commands. However, what's available varies from compiler to compiler, so be sure to check the appropriate documentation for your compiler.
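As a sketch of the #ifdef approach, the function below wraps its timing calls in a conditional block. The macro name TIMING and the function sum_to are hypothetical. The macro is defined in the source here only so that the fragment stands alone; it would more commonly be supplied on the compiler command line (for example, with -DTIMING).

```c
#include <stdio.h>
#include <sys/time.h>

/* TIMING is a hypothetical macro name. Remove this line (and omit
   -DTIMING when compiling) to build without the instrumentation. */
#define TIMING

/* Section to be timed: sums the integers 1 through n. */
long sum_to(long n)
{
   long i, total = 0;
#ifdef TIMING
   struct timeval start, finish;

   gettimeofday(&start, NULL);
#endif

   for (i = 1; i <= n; i++)
      total += i;

#ifdef TIMING
   gettimeofday(&finish, NULL);
   fprintf(stderr, "sum_to: %f seconds\n",
           (double)(finish.tv_sec - start.tv_sec)
           + (double)(finish.tv_usec - start.tv_usec) / 1000000.0);
#endif
   return total;
}
```

When TIMING is not defined, the preprocessor removes the timing variables and calls entirely, so the instrumentation adds no cost to the production build.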

17.4.1 Manual Timing with MPI

With the C library routines time and gettimeofday, you have to choose between poor granularity and the complications of dealing with a structure. With parallel programs, there is the additional problem of synchronizing processes across the cluster.

17.4.2 MPI Functions

MPI provides another alternative: three additional functions that can be used to time code.

17.4.2.1 MPI_Wtime

Like time, the function MPI_Wtime returns the number of seconds since some point in the past. Although this point in the past is not specified by the standard, it is guaranteed not to change during the life of a process. However, there are no guarantees of consistency among different processes across the cluster. The function call takes no arguments. Since the return value is a double, the function can provide a finer granularity than time. As with time, MPI_Wtime returns the wall-clock time. If the process is interrupted between calls to MPI_Wtime, the time the process is idle will be included in your calculated elapsed time. Since MPI_Wtime returns the time, unlike most MPI functions, it cannot return an error code.

17.4.2.2 MPI_Wtick

MPI_Wtick returns the actual resolution or granularity for the time returned by MPI_Wtime (rather than an error code). For example, if the system clock is a counter that is incremented every microsecond, MPI_Wtick will return a value of roughly 0.000001. It takes no argument and returns a double. In practice, MPI_Wtick's primary use is to satisfy the user's curiosity.

17.4.2.3 MPI_Barrier

One problem with timing a parallel program is that one process may be idle while waiting for another. If used naively, MPI_Wtime, time, or gettimeofday could return a value that includes both running code and idle time. While you'll certainly want to know about both of these, it is likely that the information will be useful only if you can separate them. MPI_Barrier can be used to synchronize processes within a communication group. When MPI_Barrier is called, individual processes in the group are blocked until all the processes have entered the call. Once all processes have entered the call, i.e., reached the same point in the code, the call returns for each process, and the processes are no longer blocked. MPI_Barrier takes a communicator as an argument and, like most MPI functions, returns an integer error code.

Here is a code fragment that demonstrates how these functions might be used:

...

#include "mpi.h"
...
int main(int argc, char *argv[])
{
   double      start, finish;
   ...
   MPI_Barrier(MPI_COMM_WORLD);
   start = MPI_Wtime();
   /* section to be timed */
   ...
   MPI_Barrier(MPI_COMM_WORLD);
   finish = MPI_Wtime();
   /* processId is assumed to hold this process's rank, set in the elided code */
   if (processId == 0)
      fprintf(stderr, "Elapsed time: %f\n", finish - start);
   ...
}

Depending on the other code in the program, one or both of the calls to MPI_Barrier may not be essential. Also, when timing short code segments, you shouldn't overlook the cost of measurement. If needed, you can write a short code segment to estimate the cost of the calls to MPI_Barrier and MPI_Wtime by simply repeating the calls and calculating the difference.

17.4.3 PMPI

If you want to time MPI calls, MPI provides a wrapper mechanism that can be used to create a profiling interface. Each MPI function has a dual function whose name begins with PMPI rather than MPI. For instance, you can use PMPI_Send just as you would MPI_Send, PMPI_Recv just as you would MPI_Recv, and so on. What this allows you to do is write your own version of any function and still have a way to call the original function. For example, if you want to write your own version of MPI_Send, you'll still be able to call the original version by simply calling its dual, PMPI_Send. Of course, to get this to work, you'll need to link to your library of customized functions before you link to the standard MPI library.

Interesting, you say, but how is this useful? For profiling MPI commands, you can write a new version of any MPI function that calls a timing routine, then calls the original version, and, finally, calls the timing routine again when the original function returns. Here is an example for MPI_Send:

/* MPI-3 and later declare buf as const void *; with older MPI-1 or
   MPI-2 headers, use void * so the definition matches the prototype. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest,
             int tag, MPI_Comm comm)
{
   double start, finish;
   int err_code;

   start = MPI_Wtime();
   err_code = PMPI_Send(buf, count, datatype, dest, tag, comm);
   finish = MPI_Wtime();
   fprintf(stderr, "Elapsed time: %f\n", finish - start);

   return err_code;
}

For this function definition, the parameter list was copied from the MPI standard. For the embedded MPI function call, the return code from the call to PMPI_Send is saved and passed back to the calling program. This example just displays the elapsed time. An alternative would be to return it through a global variable or write it out to a file.

To use this code, you need to ensure that it is compiled and linked before the MPI library is linked into your program. One neat thing about this approach is that you'll be able to use it with precompiled modules even if you don't have their source. Of course, it is a lot of work to create wrappers for every MPI routine, but we'll see an alternative when we look at profiling using MPE later in this chapter.
