
5.5 Minimum Execution Time and Minimum Cost-Optimal Execution Time

We are often interested in knowing how fast a problem can be solved, or what the minimum possible execution time of a parallel algorithm is, provided that the number of processing elements is not a constraint. As we increase the number of processing elements for a given problem size, either the parallel runtime continues to decrease and asymptotically approaches a minimum value, or it starts rising after attaining a minimum value (Problem 5.12). We can determine the minimum parallel runtime T_P^min for a given W by differentiating the expression for T_P with respect to p and equating the derivative to zero (assuming that the function T_P(W, p) is differentiable with respect to p). The number of processing elements for which T_P is minimum is determined by the following equation:

Equation 5.21

$\frac{d}{dp} T_P = 0$


Let p_0 be the value of the number of processing elements that satisfies Equation 5.21. The value of T_P^min can be determined by substituting p_0 for p in the expression for T_P. In the following example, we derive the expression for T_P^min for the problem of adding n numbers.

Example 5.18 Minimum execution time for adding n numbers

Under the assumptions of Example 5.12, the parallel run time for the problem of adding n numbers on p processing elements can be approximated by

Equation 5.22

$T_P = \frac{n}{p} + 2 \log p$


Equating the derivative of the right-hand side of Equation 5.22 with respect to p to zero, we get the solution for p:

Equation 5.23

$p = \frac{n}{2}$


Substituting p = n/2 in Equation 5.22, we get

Equation 5.24

$T_P^{min} = \frac{n}{n/2} + 2 \log \frac{n}{2} = 2 \log n$
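
This algebra is easy to check symbolically. The sketch below (an illustration assuming SymPy is available, not part of the original text) recovers Equation 5.23 from the simplified derivative used above — the constant factor arising from the base-2 logarithm is dropped, as in the text — and substitutes the result back to obtain Equation 5.24.

```python
# Symbolic check of Example 5.18 (illustrative; assumes SymPy).
import sympy as sp

n, p = sp.symbols("n p", positive=True)
T_P = n / p + 2 * sp.log(p, 2)          # Equation 5.22, log taken base 2

# The text's simplified derivative -n/p^2 + 2/p (the 1/ln(2) factor from
# differentiating log2(p) is dropped); its root is Equation 5.23.
p0 = sp.solve(-n / p**2 + 2 / p, p)[0]
print(p0)                               # n/2

# Substituting p0 into Equation 5.22 yields Equation 5.24:
# 2 + 2*log2(n/2) = 2*log2(n).
print(sp.simplify(sp.expand_log(T_P.subs(p, p0))))  # 2*log(n)/log(2), i.e. 2*log2(n)
```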



In Example 5.18, the processor-time product for p = p_0 is Θ(n log n), which is higher than the Θ(n) serial complexity of the problem. Hence, the parallel system is not cost-optimal for the value of p that yields minimum parallel runtime. We now derive an important result that gives a lower bound on parallel runtime if the problem is solved cost-optimally.

Let T_P^cost_opt be the minimum time in which a problem can be solved by a cost-optimal parallel system. From the discussion regarding the equivalence of cost-optimality and the isoefficiency function in Section 5.4.3, we conclude that if the isoefficiency function of a parallel system is Θ(f(p)), then a problem of size W can be solved cost-optimally if and only if W = Ω(f(p)). In other words, given a problem of size W, a cost-optimal solution requires that p = O(f^{-1}(W)). Since the parallel runtime is Θ(W/p) for a cost-optimal parallel system (Equation 5.18), the lower bound on the parallel runtime for solving a problem of size W cost-optimally is

Equation 5.25

$T_P^{cost\_opt} = \Omega\left(\frac{W}{f^{-1}(W)}\right)$


Example 5.19 Minimum cost-optimal execution time for adding n numbers

As derived in Example 5.14, the isoefficiency function f(p) of this parallel system is Θ(p log p). If W = n = f(p) = p log p, then log n = log p + log log p. Ignoring the double logarithmic term, log n ≈ log p. If n = f(p) = p log p, then p = f^{-1}(n) = n/log p ≈ n/log n. Hence, f^{-1}(W) = Θ(n/log n). As a consequence of the relation between cost-optimality and the isoefficiency function, the maximum number of processing elements that can be used to solve this problem cost-optimally is Θ(n/log n). Using p = n/log n in Equation 5.22, we get

Equation 5.26

$T_P^{cost\_opt} = \frac{n}{n/\log n} + 2 \log\left(\frac{n}{\log n}\right) = 3 \log n - 2 \log\log n = \Theta(\log n)$
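
The closed form in Equation 5.26 can also be verified numerically. The following sketch (illustrative, with unit constants) evaluates Equation 5.22 at p = n/log n, confirms that the result equals 3 log n − 2 log log n exactly, and shows that the cost p·T_P stays within a constant factor of n, as cost-optimality requires.

```python
# Numeric check of Example 5.19 (illustrative): at p = n/log2(n),
# Equation 5.22 gives T_P = 3*log2(n) - 2*log2(log2(n)) = Theta(log n),
# and the cost p*T_P is Theta(n).
import math

for n in (2**10, 2**16, 2**22):
    lg = math.log2(n)
    p = n / lg                           # p = Theta(n / log n)
    T_P = n / p + 2 * math.log2(p)       # Equation 5.22
    closed = 3 * lg - 2 * math.log2(lg)  # Equation 5.26
    print(f"n=2^{int(lg):<2d}  T_P={T_P:6.2f}  closed={closed:6.2f}  "
          f"cost/n={p * T_P / n:.2f}")
```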



It is interesting to observe that both T_P^min and T_P^cost_opt for adding n numbers are Θ(log n) (Equations 5.24 and 5.26). Thus, for this problem, a cost-optimal solution is also the asymptotically fastest solution. The parallel execution time cannot be reduced asymptotically by using a value of p greater than that suggested by the isoefficiency function for a given problem size (due to the equivalence between cost-optimality and the isoefficiency function). This is not true for parallel systems in general, however, and it is quite possible that T_P^cost_opt > Θ(T_P^min). The following example illustrates such a parallel system.

Example 5.20 A parallel system with T_P^cost_opt > Θ(T_P^min)

Consider the hypothetical parallel system of Example 5.15, for which

Equation 5.27

$T_o = p^{3/2} + p^{3/4} W^{3/4}$


From Equation 5.10, the parallel runtime for this system is

Equation 5.28

$T_P = \frac{W}{p} + \sqrt{p} + \frac{W^{3/4}}{p^{1/4}}$


Using the methodology of Example 5.18,

$\frac{d T_P}{dp} = -\frac{W}{p^2} + \frac{1}{2\sqrt{p}} - \frac{W^{3/4}}{4 p^{5/4}} = 0$


From the preceding analysis, p_0 = Θ(W): for large W the W/p^2 term is negligible, and the remaining two terms balance when p = Θ(W). Substituting p by the value of p_0 in Equation 5.28, we get

Equation 5.29

$T_P^{min} = \Theta(1) + \Theta(\sqrt{W}) + \Theta(\sqrt{W}) = \Theta(\sqrt{W})$


According to Example 5.15, the overall isoefficiency function for this parallel system is Θ(p^3), which implies that the maximum number of processing elements that can be used cost-optimally is Θ(W^{1/3}). Substituting p = Θ(W^{1/3}) in Equation 5.28, we get

Equation 5.30

$T_P^{cost\_opt} = \Theta(W^{2/3}) + \Theta(W^{1/6}) + \Theta(W^{2/3}) = \Theta(W^{2/3})$
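
The gap between the two operating points can be seen concretely. The sketch below (illustrative; all constants in Equation 5.28 taken as 1) evaluates the runtime at p = W and at p = W^{1/3}; the ratios to √W and to W^{2/3} settle near constants, consistent with Equations 5.29 and 5.30.

```python
# Illustrative comparison for Example 5.20, assuming unit constants in
# Equation 5.28: T_P(W, p) = W/p + sqrt(p) + W**0.75 / p**0.25.
import math

def T_P(W, p):
    return W / p + p**0.5 + W**0.75 / p**0.25

for W in (10**3, 10**6, 10**9):
    fastest = T_P(W, W)               # p0 = Theta(W): minimum runtime
    cost_opt = T_P(W, W**(1 / 3))     # largest cost-optimal machine size
    print(f"W=1e{round(math.log10(W))}  "
          f"fastest/sqrt(W)={fastest / W**0.5:.2f}  "
          f"cost_opt/W^(2/3)={cost_opt / W**(2/3):.2f}")
```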


A comparison of Equations 5.29 and 5.30 shows that T_P^cost_opt is asymptotically greater than T_P^min.


In this section, we have seen examples of both types of parallel systems: those for which T_P^cost_opt is asymptotically equal to T_P^min, and those for which T_P^cost_opt is asymptotically greater than T_P^min. Most parallel systems presented in this book are of the first type. Parallel systems for which the runtime can be reduced by an order of magnitude by using an asymptotically higher number of processing elements than indicated by the isoefficiency function are rare.

While deriving the minimum execution time for any parallel system, it is important to be aware that the maximum number of processing elements that can be utilized is bounded by the degree of concurrency C(W) of the parallel algorithm. It is quite possible that p_0 is greater than C(W) for a parallel system (Problems 5.13 and 5.14). In such cases, the value of p_0 is meaningless, and T_P^min is given by

Equation 5.31

$T_P^{min} = T_P\left(W,\; p = C(W)\right)$
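
As a closing sketch (a hypothetical helper, not from the text), the minimum runtime under this constraint can be computed by clamping the machine size at C(W) before evaluating T_P. The example numbers use the adding-n-numbers runtime of Equation 5.22 with an artificially low, assumed concurrency of √n purely for illustration.

```python
# Sketch of Equation 5.31 (hypothetical helper, not from the text): when
# p0 exceeds the degree of concurrency C(W), T_P^min is T_P at p = C(W).
import math

def min_runtime(T_P, p0, C_W):
    """Minimum parallel runtime given the unconstrained optimum p0
    and the algorithm's degree of concurrency C(W)."""
    p = min(p0, C_W)    # cannot use more processing elements than C(W)
    return T_P(p), p

# Adding n numbers (Equation 5.22, p0 = n/2), with an assumed (artificial)
# degree of concurrency C(W) = sqrt(n) for illustration.
n = 2**20
t, p = min_runtime(lambda p: n / p + 2 * math.log2(p), n / 2, n**0.5)
print(p, t)   # p = 1024.0, t = 1044.0: the cap, not p0, determines T_P^min
```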

