Bibliography

[ABJ82] S. G. Akl, D. T. Bernard, and R. J. Jordan. Design and implementation of a parallel tree search algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-4:192-203, 1982.

[ACM91] ACM. Resources in Parallel and Concurrent Systems. ACM Press, New York, NY, 1991.

[ACS89a] A. Aggarwal, A. K. Chandra, and M. Snir. A model for hierarchical memory. Technical Report RC 15118 (No. 67337), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.

[ACS89b] A. Aggarwal, A. K. Chandra, and M. Snir. On communication latency in PRAM computations. Technical Report RC 14973 (No. 66882), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.

[ACS89c] A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Technical Report RC 14998 (64644), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989.

[ADJ⁺91] A. Agarwal, G. D'Souza, K. Johnson, D. Kranz, J. Kubiatowicz, K. Kurihara, B.-H. Lim, G. Maa, D. Nussbaum, M. Parkin, and D. Yeung. The MIT alewife machine : A large-scale distributed-memory multiprocessor. In Proceedings of Workshop on Scalable Shared Memory Multiprocessors. Kluwer Academic, 1991.

[AFKW90] I. Angus, G. C. Fox, J. Kim, and D. W. Walker. Solving Problems on Concurrent Processors: Software for Concurrent Processors: Volume II. Prentice-Hall, Englewood Cliffs, NJ, 1990.

[AG94] G. S. Almasi and A. Gottlieb. Highly Parallel Computing. Benjamin/Cummings, Redwood City, CA, 1994. (Second Edition).

[Aga89] A. Agarwal. Performance tradeoffs in multithreaded processors. Technical Report 89-566, Massachusetts Institute of Technology, Microsystems Program Office, Cambridge, MA, 1989.

[Aga91] A. Agarwal. Performance tradeoffs in multithreaded processors. Technical report MIT/LCS/TR 501; VLSI memo no. 89-566, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 1991.

[AGGM90] A. Averbuch, E. Gabber, B. Gordissky, and Y. Medan. A parallel FFT on an MIMD machine. Parallel Computing, 15:61-74, 1990.

[Agh86] G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, 1986.

[AHMP87] H. Alt, T. Hagerup, K. Mehlhorn, and F. P. Preparata. Deterministic simulation of idealized parallel computers on more realistic ones. SIAM Journal of Computing, 16(5):808-835, October 1987.

[AHU74] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, 1974.

[AJM88] D. P. Agrawal, V. K. Janakiram, and R. Mehrotra. A randomized parallel branch-and-bound algorithm. In Proceedings of the 1988 International Conference on Parallel Processing, 1988.

[AK84] M. J. Atallah and S. R. Kosaraju. Graph problems on a mesh-connected processor array. Journal of ACM, 31(3):649-667, July 1984.

[Akl85] S. G. Akl. Parallel Sorting Algorithms. Academic Press, San Diego, CA, 1985.

[Akl89] S. G. Akl. The Design and Analysis of Parallel Algorithms. Prentice-Hall, Englewood Cliffs, NJ, 1989.

[Akl97] S. G. Akl. Parallel Computation Models and Methods. Prentice-Hall, Englewood Cliffs, NJ, 1997.

[AKR89] S. Arvindam, V. Kumar, and V. N. Rao. Floorplan optimization on multiprocessors. In Proceedings of the 1989 International Conference on Computer Design, 1989. Also published as Technical Report ACT-OODS-241-89, Microelectronics and Computer Corporation, Austin, TX.

[AKR90] S. Arvindam, V. Kumar, and V. N. Rao. Efficient parallel algorithms for search problems: Applications in VLSI CAD. In Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, 1990.

[AKRS91] S. Arvindam, V. Kumar, V. N. Rao, and V. Singh. Automatic test pattern generation on multiprocessors. Parallel Computing, 17, number 12:1323-1342, December 1991.

[AKS83] M. Ajtai, J. Komlos, and E. Szemeredi. An O (n log n) sorting network. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 1-9, 1983.

[AL93] S. G. Akl and K. A. Lyons. Parallel Computational Geometry. Prentice-Hall, Englewood Cliffs, NJ, 1993.

[AM88] T. S. Abdelrahman and T. N. Mudge. Parallel branch-and-bound algorithms on hypercube multiprocessors. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 1492-1499, New York, NY, 1988. ACM Press.

[Amd67] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings, 483-485, 1967.

[And91] G. R. Andrews. Concurrent Programming: Principles and Practice. Benjamin/Cummings, Redwood City, CA, 1991.

[AOB93] B. Abali, F. Ozguner, and A. Bataineh. Balanced parallel sort on hypercube multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 4(5):572-581, May 1993.

[AS87] B. Awerbuch and Y. Shiloach. New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Transactions on Computers, C- 36(10):1258-1263, October 1987.

[AU72] A. V. Aho and J. D. Ullman. The Theory of Parsing, Translation and Compiling: Volume 1, Parsing. Prentice-Hall, Englewood Cliffs, NJ, 1972.

[B⁺97] L. S. Blackford et al. ScaLAPACK Users' Guide. SIAM, 1997.

[BA82] M. Ben-Ari. Principles of Concurrent Programming. Prentice-Hall, Englewood Cliffs, NJ, 1982.

[Bab88] R. G. Babb. Programming Parallel Processors. Addison-Wesley, Reading, MA, 1988.

[Bai90] D. H. Bailey. FFTs in external or hierarchical memory. Journal of Supercomputing, 4:23-35, 1990.

[Bar68] G. H. Barnes. The ILLIAC IV computer. IEEE Transactions on Computers, C-17(8):746-757, 1968.

[Bat68] K. E. Batcher. Sorting networks and their applications. In Proceedings of the 1968 Spring Joint Computer Conference , 307-314, 1968.

[Bat76] K. E. Batcher. The Flip network in STARAN. In Proceedings of International Conference on Parallel Processing, 65-71, 1976.

[Bat80] K. E. Batcher. Design of a massively parallel processor. IEEE Transactions on Computers, 836-840, September 1980.

[Bau78] G. M. Baudet. The Design and Analysis of Algorithms for Asynchronous Multiprocessors. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, 1978.

[BB90] K. P. Belkhale and P. Banerjee. Approximate algorithms for the partitionable independent task scheduling problem. In Proceedings of the 1990 International Conference on Parallel Processing, I72-I75, 1990.

[BBN89] BBN Advanced Computers Inc. TC-2000 Technical Product Summary. Cambridge, MA. 1989.

[BCCL95] R. E. Bixby, W. Cook, A. Cox, and E. K. Lee. Parallel mixed integer programming. Technical Report CRPC TR 95554, Center for Research on Parallel Computation, Research Monograph, 1995.

[BCJ90] E. C. Bronson, T. L. Casavant, and L. H. Jamieson. Experimental application-driven architecture analysis of an SIMD/MIMD parallel processing system. IEEE Transactions on Parallel and Distributed Systems, 1(2):195-205, 1990.

[BDHM84] D. Bitton, D. J. DeWitt, D. K. Hsiao, and M. J. Menon. A taxonomy of parallel sorting. Computing Surveys, 16(3):287-318, September 1984.

[Bel57] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.

[Bel58] R. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16(1):87-90, 1958.

[Ben80] J. L. Bentley. A parallel algorithm for constructing minimum spanning trees. Journal of the ACM, 27(1):51-59, March 1980.

[Ber84] S. Berkowitz. On computing the determinant in small parallel time using a small number of processors. Information Processing Letters, 18(3):147-150, March 1984.

[Ber89] J. Berntsen. Communication efficient matrix multiplication on hypercubes. Parallel Computing, 12:335-342, 1989.

[BH82] A. Borodin and J. E. Hopcroft. Routing merging and sorting on parallel models of computation. In Proceedings of the 14th Annual ACM Symposium on Theory of Computing, 338-344, May 1982.

[Bix91] R. E. Bixby. Two applications of linear programming. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991.

[BJK⁺95] R. Blumofe, C. Joerg, B. Kuszmaul, C. Leiserson, K. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the 5th Symposium on Principles and Practice of Parallel Programming, 1995.

[BKH89] S. Bershader, T. Kraay, and J. Holland. The giant-Fourier-transform. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications: Volume I, 387-389, 1989.

[Ble90] G. E. Blelloch. Vector Models for Data-Parallel Computing. MIT Press, Cambridge, MA, 1990.

[BMCP98] A. Brungger, A. Marzetta, J. Clausen, and M. Perregaard. Solving large-scale qap problems in parallel with the search library zram. Journal of Parallel and Distributed Computing, 50:157-169, 1998.

[BNK92] A. Bar-Noy and S. Kipnis. Designing broadcasting algorithms in the postal model for message-passing systems. In Proceedings of 4th ACM Symposium on Parallel Algorithms and Architectures, 13-22, 1992.

[BOS⁺91] D. P. Bertsekas, C. Ozveren, G. D. Stamoulis, P. Tseng, and J. N. Tsitsiklis. Optimal communication algorithms for hypercubes. Journal of Parallel and Distributed Computing, 11:263-275, 1991.

[BR90] R. Boppana and C. S. Raghavendra. On optimal and practical routing methods for a massive data movement operation on hypercubes. Technical report, University of Southern California, Los Angeles, CA, 1990.

[Bra97] R. Bramley. Technology news & reviews: Chemkin software; OpenMP Fortran Standard; ODE toolbox for Matlab; Java products; Scientific WorkPlace 3.0. IEEE Computational Science and Engineering, 4(4):75-78, October/December 1997.

[Bro79] K. Brown. Dynamic programming in computer science. Technical Report CMU-CS-79-106, Carnegie Mellon University, Pittsburgh, PA, 1979.

[BS78] G. M. Baudet and D. Stevenson. Optimal sorting algorithms for parallel computers. IEEE Transactions on Computers, C-27(1):84-87, January 1978.

[BT89] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, NJ, 1989.

[BT97] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.

[But97] D. R. Butenhof. Programming with POSIX Threads. Addison-Wesley, Reading, MA, 1997.

[Buy99] R. Buyya, editor. High Performance Cluster Computing: Architectures and Systems. Prentice Hall, 1999.

[BW89] M. L. Barton and G. R. Withers. Computing performance as a function of the speed, quantity, and the cost of processors. In Supercomputing '89 Proceedings, 759-764, 1989.

[BW97] J. Beveridge and R. Wiener. Multithreading Applications in Win32: the Complete Guide to Threads. Addison-Wesley Developers Press, Reading, MA, 1997.

[C⁺95] J. Choi et al. A proposal for a set of Parallel Basic Linear Algebra Subprograms. Technical Report CS-95-292, Computer Science Department, University of Tennessee, 1995.

[CAHH91] N. P. Chrisopchoides, M. Aboelaze, E. N. Houstis, and C. E. Houstis. The parallelization of some level 2 and 3 BLAS operations on distributed-memory machines. In Proceedings of the First International Conference of the Austrian Center of Parallel Computation. Springer-Verlag Series Lecture Notes in Computer Science, 1991.

[Can69] L. E. Cannon. A cellular computer to implement the Kalman Filter Algorithm. Ph.D. Thesis, Montana State University, Bozman, MT, 1969.

[Car89] G. F. Carey, editor. Parallel Supercomputing: Methods, Algorithms and Applications. Wiley, New York, NY, 1989.

[CD87] S. Chandran and L. S. Davis. An approach to parallel vision algorithms. In R. Porth, editor, Parallel Processing. SIAM, Philadelphia, PA, 1987.

[CDK⁺00] R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, and R. M. (editors). Parallel Programming in OpenMP. Morgan Kaufmann Publishers, 2000.

[CG87] E. Chu and A. George. Gaussian elimination with partial pivoting and load balancing on a multiprocessor. Parallel Computing, 5:65-74, 1987.

[CGK93] D. Challou, M. Gini, and V. Kumar. Parallel search algorithms for robot motion planning. In Proceedings of the IEEE Conference on Robotics and Automation, 46-51, 1993.

[CGL92] D. E. Culler, M. Gunter, and J. C. Lee. Analysis of multithreaded microprocessors under multiprogramming. Report UCB/CSD 92/687, University of California, Berkeley, Computer Science Division, Berkeley, CA, May 1992.

[Cha79] A. K. Chandra. Maximal parallelism in matrix multiplication. Technical Report RC-6193, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1979.

[Cha87] R. Chamberlain. An alternate view of LU factorization on a hypercube multiprocessor. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 569-575. SIAM, Philadelphia, PA, 1987.

[CJP83] H. Crowder, E. L. Johnson, and M. Padberg. Solving large-scale zero-one linear programming problem. Operations Research, 2:803-834, 1983.

[CKP⁺93a] D. Culler, R. Karp, D. Patterson, A. Sahay, K. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a realistic model of parallel computation. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, 1-12, 1993.

[CKP⁺93b] D. E. Culler, R. Karp, D. A. Patterson, et al. Logp: Towards a realistic model of parallel computation. In Principles and Practices of Parallel Programming, May 1993.

[CL93] B. Codenotti and M. Leoncini. Introduction to Parallel Processing. Addison-Wesley, 1993.

[CLC81] F. Y. Chin, J. Lam, and I. Chen. Optimal parallel algorithms for the connected component problem. In Proceedings of the 1981 International Conference on Parallel Processing, 170-175, 1981.

[CLC82] F. Y. Chin, J. Lam, and I. Chen. Efficient parallel algorithms for some graph problems. Communications of the ACM, 25(9):659-665, September 1982.

[CLR90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, McGraw-Hill, New York, NY, 1990.

[CM82] K. M. Chandy and J. Misra. Distributed computation on graphs: Shortest path algorithms. Communications of the ACM, 25(11):833-837, November 1982.

[CM98] B. Chapman and P. Mehrotra. OpenMP and HPF: Integrating two paradigms. Lecture Notes in Computer Science, 1470, 1998.

[Col88] R. Cole. Parallel merge sort. SIAM Journal on Computing, 17(4):770-785, August 1988.

[Col89] M. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge, MA, 1989.

[Con89] T. Conlon. Programming in PARLOG. Addison-Wesley, Reading, MA, 1989.

[CR89] E. A. Carmona and M. D. Rice. A model of parallel performance. Technical Report AFWL-TR-89-01, Air Force Weapons Laboratory, 1989.

[CR91] E. A. Carmona and M. D. Rice. Modeling the serial and parallel fractions of a parallel algorithm. Journal of Parallel and Distributed Computing, 1991.

[CS88] B. V. Cherkassky and R. Smith. Efficient mapping and implementations of matrix algorithms on a hypercube. Journal of Supercomputing, 2:7-27, 1988.

[CSG98] D. E. Culler, J. P. Singh, and A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1998.

[CT92] K. M. Chandy and S. Taylor. An Introduction to Parallel Programming. Jones and Bartlett, Austin, TX, 1992.

[CV91] B. S. Chlebus and I. Vrto. Parallel quicksort. Journal of Parallel and Distributed Processing, 1991.

[Cve87] Z. Cvetanovic. Performance analysis of the FFT algorithm on a shared-memory parallel architecture. IBM Journal of Research and Development, 31(4):435-451, 1987.

[CWP98] A. Cohen, M. Woodring, and R. Petrusha. Win32 Multithreaded Programming. O'Reilly & Associates, 1998.

[D⁺92] W. J. Dally et al. The message-driven processor. IEEE Micro, 12(2):23-39, 1992.

[Dal87] W. J. Dally. A VLSI Architecture for Concurrent Data Structures. Kluwer Academic Publishers, Boston, MA, 1987.

[Dal90a] W. J. Dally. Analysis of k-ary n-cube interconnection networks. IEEE Transactions on Computers, 39(6), June 1990.

[Dal90b] W. J. Dally. Network and processor architecture for message-driven computers. In R. Sauya and G. Birtwistle, editors, VLSI and Parallel Computation. Morgan Kaufmann, San Mateo, CA, 1990.

[Dav86] G. J. Davis. Column LU factorization with pivoting on a hypercube multiprocessor. SIAM Journal on Algebraic and Discrete Methods, 7:538-550, 1986. Also available as Technical Report ORNL-6219, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.

[DCG90] J. D. DeMello, J. L. Calvet, and J. M. Garcia. Vectorization and multitasking of dynamic programming in control: experiments on a CRAY-2. Parallel Computing, 13:261-269, 1990.

[DDSV99] J. Dongarra, I. S. Duff, D. Sorensen, and H. V. Vorst. Numerical Linear Algebra for High Performance Computers (Software, Environments, Tools). SIAM, 1999.

[DeC89] A. L. DeCegama. The Technology of Parallel Processing: Parallel Processing Architectures and VLSI Hardware: Volume 1. Prentice-Hall, Englewood Cliffs, NJ, 1989.

[DEH89] P. M. Dew, R. A. Earnshaw, and T. R. Heywood. Parallel Processing for Computer Vision and Display. Addison-Wesley, Reading, MA, 1989.

[Dem82] J. Deminet. Experiences with multiprocessor algorithms. IEEE Transactions on Computers, C-31(4):278-288, 1982.

[DFHM82] D. J. DeWitt, D. B. Friedland, D. K. Hsiao, and M. J. Menon. A taxonomy of parallel sorting algorithms. Technical Report TR-482, Computer Sciences Department, University of Wisconsin, Madison, WI, 1982.

[DFRC96] F. Dehne, A. Fabri, and A. Rau-Chaplin. Scalable parallel computational geometry for coarse grained multicomputers. International Journal on Computational Geometry, 6(3):379-400, 1996.

[DHvdV93] J. W. Demmel, M. T. Heath, and H. A. van der Vorst. Parallel numerical linear algebra. Acta Numerica, 111-197, 1993.

[Dij59] E. W. Dijkstra. A note on two problems in connection with graphs. Numerische Mathematik, 1:269-271, 1959.

[DM93] S. Dutt and N. R. Mahapatra. Parallel A* algorithms and their performance on hypercube multiprocessors. In Proceedings of the Seventh International Parallel Processing Symposium, 797-803, 1993.

[DM98] L. Dagum and R. Menon. OpenMP: An industry-standard API for shared-memory programming. IEEE Computational Science and Engineering, 5(1):46-55, January/March 1998.

[DNS81] E. Dekel, D. Nassimi, and S. Sahni. Parallel matrix and graph algorithms. SIAM Journal on Computing, 10:657-673, 1981.

[Dra96] D. G. Drake. Introduction to Java threads. JavaWorld: IDG's magazine for the Java community, 1(2), April 1996.

[DRGNP] F. Darema-Rogers, D. George, V. Norton, and G. Pfister. VM parallel environment. In Proceedings of the IBM Kingston Parallel Processing Symposium.

[DS86] W. J. Dally and C. L. Seitz. The torus routing chip. Journal of Distributed Computing, 1(3):187-196, 1986.

[DS87] W. J. Dally and C. L. Seitz. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers, C-36(5):547- 553, 1987.

[DSG83] E. W. Dijkstra, W. H. Seijen, and A. J. M. V. Gasteren. Derivation of a termination detection algorithm for a distributed computation. Information Processing Letters, 16(5):217-219, 1983.

[DT89] L. Desbat and D. Trystram. Implementing the discrete Fourier transform on a hypercube vector-parallel computer. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications: Volume I, 407- 410, 1989.

[DV87] K. A. Doshi and P. J. Varman. Optimal graph algorithms on a fixed-size linear array. IEEE Transactions on Computers, C-36(4):460-470, April 1987.

[dV89] E. F. V.de Velde. Multicomputer matrix computations: Theory and practice. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 1303-1308, 1989.

[DY81] N. Deo and Y. B. Yoo. Parallel algorithms for the minimum spanning tree problem. In Proceedings of the 1981 International Conference on Parallel Processing, 188-189, 1981.

[Eck94] J. Eckstein. Parallel branch-and-bound methods for mixed-integer programming on the cm-5. SIAM Journal on Optimization, 4(4):794-814, 1994.

[Eck97] J. Eckstein. Distributed versus centralized storage and control for parallel branch and bound: Mixed integer programming on the cm-5. Computational Optimization and Applications, 7(2):199-220, 1997.

[Ede89] A. Edelman. Optimal matrix transposition and bit-reversal on hypercubes: Node address-memory address exchanges. Technical report, Thinking Machines Corporation, Cambridge, MA, 1989.

[EDH80] O. I. El-Dessouki and W. H. Huen. Distributed enumeration on network computers. IEEE Transactions on Computers, C-29:818-825, September 1980.

[EHHR88] S. C. Eisenstat, M. T. Heath, C. S. Henkel, and C. H. Romine. Modified cyclic algorithms for solving triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 9(3):589-600, 1988.

[EHMN90] M. Evett, J. Hendler, A. Mahanti, and D. Nau. PRA*: A memory-limited heuristic search procedure for the connection machine. In Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, 145- 149, 1990.

[Ekl72] J. O. Eklundh. A fast computer method for matrix transposing. IEEE Transactions on Computers, 21(7):801-803, 1972.

[Ert92] W. Ertel. OR-parallel theorem proving with random competition. In A. Voronokov, editor, LPAR '92: Logic Programming and Automated Reasoning, 226-237. Springer-Verlag, New York, NY, 1992.

[EZL89] D. L. Eager, J. Zahorjan, and E. D. Lazowska. Speedup versus efficiency in parallel systems. IEEE Transactions on Computers, 38(3):408-423, 1989.

[Fen81] T. Y. Feng. A survey of interconnection networks. IEEE Computer, 12-27, December 1981.

[FF82] R. A. Finkel and J. P. Fishburn. Parallelism in alpha-beta search. Artificial Intelligence, 19:89-106, 1982.

[FF86] G. C. Fox and W. Furmanski. Optimal communication algorithms on hypercube. Technical Report CCCP-314, California Institute of Technology, Pasadena, CA, 1986.

[FJDS96] L. Fosdick, E. Jessup, G. Domik, and C. Schauble. Introduction to High-Performance Scientific Computing. MIT Press, 1996.

[FJL⁺88] G. C. Fox, M. Johnson, G. Lyzenga, S. W. Otto, J. Salmon, and D. Walker. Solving Problems on Concurrent Processors: Volume 1. Prentice-Hall, Englewood Cliffs, NJ, 1988.

[FK88] C. Ferguson and R. Korf. Distributed tree search and its application to alpha-beta pruning. In Proceedings of the 1988 National Conference on Artificial Intelligence, 1988.

[FK89] H. P. Flatt and K. Kennedy. Performance of parallel processors. Parallel Computing, 12:1-20, 1989.

[FKO86] E. Felten, S. Karlin, and S. W.Otto. Sorting on a hypercube. Caltech/JPL, 1986. Hm 244.

[Fla90] H. P. Flatt. Further applications of the overhead model for parallel systems. Technical Report G320-3540, IBM Corporation, Palo Alto Scientific Center, Palo Alto, CA, 1990.

[Flo62] R. W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 5(6):345, June 1962.

[Fly72] M. J. Flynn. Some computer organizations and their effectiveness. IEEE Transactions on Computers, C-21(9):948-960, 1972.

[Fly95] M. J. Flynn. Computer Architecture: Pipelined and Parallel Processor Design. Jones and Bartlett, 1995.

[FM70] W. D. Frazer and A. C. McKellar. Samplesort: A sampling approach to minimal storage tree sorting. Journal of the ACM, 17(3):496-507, July 1970.

[FM87] R. A. Finkel and U. Manber. DIB-a distributed implementation of backtracking. ACM Transactions on Programming Languages and Systems, 9(2):235-256, April 1987.

[FM92] R. Frye and J. Myczkowski. Load balancing algorithms on the connection machine and their use in Monte-Carlo methods. In Proceedings of the Unstructured Scientific Computation on Multiprocessors Conference, 1992.

[FMM94] R. Feldmann, P. Mysliwietz, and B. Monien. Studying overheads in massively parallel min/max-tree evaluation. In Proc. of the 6th ACM Symposium on Parallel Algorithms and Architectures, 94-103, 1994.

[FOH87] G. C. Fox, S. W. Otto, and A. J. G. Hey. Matrix algorithms on a hypercube I: Matrix multiplication. Parallel Computing, 4:17-31, 1987.

[Fos95] I. Foster. Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley, 1995.

[Fou94] T. J. Fountain. Parallel Computing: Principles and Practice. Cambridge University Press, 1994.

[FR62] L. R. Ford and R. L. Rivest. Flows in Networks. Princeton University Press, Princeton, NJ, 1962.

[Fra93] M. Franklin. The multiscalar architecture. Technical Report CS-TR-1993-1196, University of Wisconsin, 1993.

[FTI90] M. Furuichi, K. Taki, and N. Ichiyoshi. A multi-level load balancing scheme for OR-parallel exhaustive search programs on the Multi-PSI. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 50-59, 1990.

[FW78] S. Fortune and J. Wyllie. Parallelism in random access machines. In Proceedings of ACM Symposium on Theory of Computing, 114-118, 1978.

[Gal95] B. O. Gallmeister. Posix. 4 : Programming for the Real World. O'Reilly & Associates, 1995.

[GBD⁺94] G. A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine. MIT Press, Cambridge, MA, 1994.

[Gei85] G. A. Geist. Efficient parallel LU factorization with pivoting on a hypercube multiprocessor. Technical Report ORNL-6211, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.

[GGK⁺83] A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir. The NYU Ultracomputer-designing a MIMD, shared memory parallel computer. IEEE Transactions on Computers, C-32(2):175-189, February 1983.

[GGK93] A. Y. Grama, A. Gupta, and V. Kumar. Isoefficiency: Measuring the scalability of parallel algorithms and architectures. IEEE Parallel and Distributed Technology, 1(3):12-21, August 1993.

[GH85] G. A. Geist and M. T. Heath. Parallel Cholesky factorization on a hypercube multiprocessor. Technical Report ORNL-6190, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.

[GH86] G. A. Geist and M. T. Heath. Matrix factorization on a hypercube multiprocessor. In M. T. Heath, editor, Hypercube Multiprocessors 1986, 161-180. SIAM, Philadelphia, PA, 1986.

[GH01] S. Goedecker and A. Hoisie. Performance Optimization of Numerically Intensive Codes. SIAM, 2001.

[Gib85] A. Gibbons. Algorithmic Graph Theory. Cambridge University Press, Cambridge, 1985.

[Gib89] P. B. Gibbons. A more practical PRAM model. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 158-168, 1989.

[GK91] A. Gupta and V. Kumar. The scalability of matrix multiplication algorithms on parallel computers. Technical Report TR 91-54, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1991. A short version appears in Proceedings of 1993 International Conference on Parallel Processing, pages III-115-III-119, 1993.

[GK93a] A. Gupta and V. Kumar. Performance properties of large scale parallel systems. Journal of Parallel and Distributed Computing, 19:234-244, 1993. Also available as Technical Report TR 92-32, Department of Computer Science, University of Minnesota, Minneapolis, MN.

[GK93b] A. Gupta and V. Kumar. The scalability of FFT on parallel computers. IEEE Transactions on Parallel and Distributed Systems, 4(8):922-932, August 1993. A detailed version is available as Technical Report TR 90-53, Department of Computer Science, University of Minnesota, Minneapolis, MN.

[GKP92] A. Grama, V. Kumar, and P. M. Pardalos. Parallel processing of discrete optimization problems. In Encyclopaedia of Microcomputers. Marcel Dekker Inc., New York, 1992.

[GKR91] A. Y. Grama, V. Kumar, and V. N. Rao. Experimental evaluation of load balancing techniques for the hypercube. In Proceedings of the Parallel Computing '91 Conference, 497-514, 1991.

[GKRS96] A. Grama, V. Kumar, S. Ranka, and V. Singh. A³: A simple and asymptotically accurate model for parallel computation. In Proceedings of the Sixth Symposium on Frontiers of Massively Parallel Computing, Annapolis, MD, 1996.

[GKS92] A. Gupta, V. Kumar, and A. H. Sameh. Performance and scalability of preconditioned conjugate gradient methods on parallel computers. Technical Report TR 92-64, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1992. A short version appears in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, pages 664-674, 1993.

[GKT79] L. J. Guibas, H. T. Kung, and C. D. Thompson. Direct VLSI Implementation of Combinatorial Algorithms. In Proceedings of Conference on Very Large Scale Integration, California Institute of Technology, 509-525, 1979.

[GL96a] G. H. Golub and C. V. Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, MD, 1996.

[GL96b] W. D. Gropp and E. Lusk. User's Guide for mpich, a Portable Implementation of MPI. Mathematics and Computer Science Division, Argonne National Laboratory. ANL-96/6. 1996.

[GLDS96] W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789-828, September 1996.

[GLS99] W. Gropp, E. Lusk, and A. Skjellum. Using MPI. MIT Press, 1999. 2nd Edition.

[GMB88] J. L. Gustafson, G. R. Montry, and R. E. Benner. Development of parallel methods for a 1024-processor hypercube. SIAM Journal on Scientific and Statistical Computing, 9(4):609-638, 1988.

[GO93] G. H. Golub and J. M. Ortega. Scientific Computing: An Introduction with Parallel Computing. Academic Press, 1993.

[GPS90] K. A. Gallivan, R. J. Plemmons, and A. H. Sameh. Parallel algorithms for dense linear algebra computations . SIAM Review, 32(1):54-135, March 1990. Also appears in K. A. Gallivan et al. Parallel Algorithms for Matrix Computations. SIAM, Philadelphia, PA, 1990.

[GR88] G. A. Geist and C. H. Romine. LU factorization algorithms on distributed-memory multiprocessor architectures. SIAM Journal on Scientific and Statistical Computing, 9(4):639-649, 1988. Also available as Technical Report ORNL/TM-10383, Oak Ridge National Laboratory, Oak Ridge, TN, 1987.

[GR90] A. Gibbons and W. Rytter. Efficient Parallel Algorithms. Cambridge University Press, Cambridge, UK, 1990.

[Gre91] S. Green. Parallel Processing for Computer Graphics. MIT Press, Cambridge, MA, 1991.

[GSNL98] W. Gropp, M. Snir, W. Nitzberg, and E. Lusk. MPI: The Complete Reference. MIT Press, 1998.

[GT88] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. Journal of the ACM, 35(4):921-940, October 1988.

[Gup87] A. Gupta. Parallelism in Production Systems. Morgan Kaufmann, Los Altos, CA, 1987.

[Gus88] J. L. Gustafson. Reevaluating Amdahl's law. Communications of the ACM, 31(5):532-533, 1988.

[Gus92] J. L. Gustafson. The consequences of fixed time performance measurement. In Proceedings of the 25th Hawaii International Conference on System Sciences: Volume III, 113-124, 1992.

[HB84] K. Hwang and F. A. Briggs. Computer Architecture and Parallel Processing. McGraw-Hill, New York, NY, 1984.

[HB88] M. M. Huntbach and F. W. Burton. Alpha-beta search on virtual tree machines. Information Science, 44:3-17, 1988.

[HCH95] F.-H. Hsu, M. S. Campbell, and A. J. Hoane. Deep Blue system overview. In Proceedings of the 1995 International Conference on Supercomputing, Barcelona, Spain, 240-244, 1995.

[HCS79] D. S. Hirschberg, A. K. Chandra, and D. V. Sarwate. Computing connected components on parallel computers. Communications of the ACM, 22(8):461- 464, August 1979.

[HD87] S.-R. Huang and L. S. Davis. A tight upper bound for the speedup of parallel best-first branch-and-bound algorithms. Technical report, Center for Automation Research, University of Maryland, College Park, MD, 1987.

[HD89a] S. R. Huang and L. S. Davis. Parallel iterative a* search: An admissible distributed heuristic search algorithm. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 23-29, 1989.

[HD89b] K. Hwang and D. DeGroot. Parallel Processing for Supercomputers and Artificial Intelligence. McGraw-Hill, New York, NY, 1989.

[HDM97] J. Hill, S. Donaldson, and A. McEwan. Installation and user guide for the oxford bsp toolset: User guide for the oxford bsp toolset (v1.3) implementation of bsplib. Technical report, Oxford University Computing Laboratory, 1997.

[Hea85] M. T. Heath. Parallel Cholesky factorization in message-passing multiprocessor environments. Technical Report ORNL-6150, Oak Ridge National Laboratory, Oak Ridge, TN, 1985.

[HG97] S. Hamilton and L. Garber. Deep Blue's hardware-software synergy. IEEE Computer, 30(10):29-35, October 1997.

[Hil85] W. D. Hillis. The Connection Machine. MIT Press, Cambridge, MA, 1985.

[Hil90] M. D. Hill. What is scalability? Computer Architecture News, 18(4), 1990.

[Hip89] P. G. Hipes. Matrix multiplication on the JPL/Caltech Mark IIIfp hypercube. Technical Report C3P 746, Concurrent Computation Program, California Institute of Technology, Pasadena, CA, 1989.

[Hir76] D. S. Hirschberg. Parallel algorithms for the transitive closure and connected component problem. In Proceedings of the 8th Annual ACM Symposium on the Theory of Computing, 55-57, 1976.

[Hir78] D. S. Hirschberg. Fast parallel sorting algorithms. Communications of the ACM, 21(8):657-666, August 1978.

[HJ87] C.-T. Ho and S. L. Johnsson. Spanning balanced trees in Boolean cubes. Technical Report YALEU/DCS/RR-508, Department of Computer Science, Yale University, New Haven, CT, 1987.

[HJE91] C.-T. Ho, S. L. Johnsson, and A. Edelman. Matrix multiplication on hypercubes using full bandwidth and constant storage. In Proceedings of the 1991 International Conference on Parallel Processing, 447-451, 1991.

[HK96] S. Hambrusch and A. Khokhar. C³: A parallel model for coarse-grained machines. Journal of Parallel and Distributed Computing, 32(2):139-154, February 1996.

[HLM84] R. E. Hiromoto, O. M. Lubeck, and J. Moore. Experiences with the Denelcor HEP. Parallel Computing, 1(3-4):197-206, 1984.

[HLV90] S. H. S. Huang, H. Liu, and V. Vishwanathan. A sub-linear parallel algorithm for some dynamic programming problems. In Proceedings of the 1990 International Conference on Parallel Processing, III-261-III-264, 1990.

[HM80] D. K. Hsiao and M. J. Menon. Parallel record-sorting methods for hardware realization. Osu-cisrc-tr-80-7, Computer Science Information Department, Ohio State University, Columbus, OH, 1980.

[HMT⁺96] H. Hum, O. Maquelin, K. Theobald, X. Tian, and G. Gao. A study of the earth-manna multithreaded system. Intl. J. of Par. Prog., 24:319-347, 1996.

[HNR90] P. Heidelberger, A. Norton, and J. T. Robinson. Parallel quicksort using fetch-and-add. IEEE Transactions on Computers, C-39(1):133-138, January 1990.

[Hoa62] C. A. R. Hoare. Quicksort. Computer Journal, 5:10-15, 1962.

[HP89] S. W. Hornick and F. P. Preparata. Deterministic PRAM simulation with constant redundancy. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 103-109, 1989.

[HQ91] P. J. Hatcher and M. J. Quinn. Data Parallel Programming. MIT Press, Cambridge, MA, 1991.

[HR88] M. T. Heath and C. H. Romine. Parallel solution of triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 9(3):558-588, 1988.

[HR91] C.-T. Ho and M. T. Raghunath. Efficient communication primitives on circuit-switched hypercubes. In Sixth Distributed Memory Computing Conference Proceedings, 390-397, 1991.

[HS78] E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms. Computer Science Press, Rockville, MD, 1978.

[HS86] W. D. Hillis and G. L. Steele. Data parallel algorithms. Communications of the ACM, 29(12):1170-1183, 1986.

[Hsu90] F.-H. Hsu. Large scale parallelization of alpha-beta search: An algorithmic and architectural study with computer chess. Technical report, Carnegie Mellon University, Pittsburgh, PA, 1990. Ph.D. Thesis.

[Hua85] M. A. Huang. Solving some graph problems with optimal or near-optimal speedup on mesh-of-trees networks. In Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, 232-340, 1985.

[HX98] K. Hwang and Z. Xu. Scalable Parallel Computing. McGraw-Hill, New York, NY, 1998.

[Hyd99] P. Hyde. Java Thread Programming. Sams, 1999.

[IPS91] O. H. Ibarra, T. C. Pong, and S. M. Sohn. Parallel recognition and parsing on the hypercube. IEEE Transactions on Computers, 40(6):764-770, June 1991.

[IYF79] M. Imai, Y. Yoshida, and T. Fukumura. A parallel searching scheme for multiprocessor systems and its application to combinatorial problems. In Proceedings of the International Joint Conference on Artificial Intelligence, 416-418, 1979.

[Jaj92] J. Jaja. An Introduction to Parallel Algorithms. Addison-Wesley, Reading, MA, 1992.

[JAM87] V. K. Janakiram, D. P. Agrawal, and R. Mehrotra. Randomized parallel algorithms for Prolog programs and backtracking applications. In Proceedings of the 1987 International Conference on Parallel Processing, 278-281, 1987.

[JAM88] V. K. Janakiram, D. P. Agrawal, and R. Mehrotra. A randomized parallel backtracking algorithm. IEEE Transactions on Computers, C-37(12), 1988.

[JGD87] L. H. Jamieson, D. B. Gannon, and R. J. Douglass, editors. The Characteristics of Parallel Algorithms. MIT Press, Cambridge, MA, 1987.

[JH88] S. L. Johnsson and C.-T. Ho. Matrix transposition on Boolean n-cube configured ensemble architectures. SIAM Journal on Matrix Analysis and Applications, 9(3):419-454, July 1988.

[JH89] S. L. Johnsson and C.-T. Ho. Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers, 38(9):1249- 1268, September 1989.

[JH91] S. L. Johnsson and C.-T. Ho. Optimal all-to-all personalized communication with minimum span on Boolean cubes. In Sixth Distributed Memory Computing Conference Proceedings, 299-304, 1991.

[JKFM89] S. L. Johnsson, R. Krawitz, R. Frye, and D. McDonald. A radix-2 FFT on the connection machine. Technical report, Thinking Machines Corporation, Cambridge, MA, 1989.

[JNS97] L. Johnson, G. Nemhauser, and M. Savelsbergh. Progress in integer programming: An exposition. Technical report, School of Industrial and Systems Engineering, Georgia Institute of Technology, 1997. Available from http://akula.isye.gatech.edu/mwps/mwps.html.

[Joh77] D. B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM, 24(1):1-13, March 1977.

[Joh84] S. L. Johnsson. Combining parallel and sequential sorting on a boolean n-cube. In Proceedings of International Conference on Parallel Processing, 1984.

[Joh87] S. L. Johnsson. Communication efficient basic linear algebra computations on hypercube architectures. Journal of Parallel and Distributed Computing, 4(2):133-172, April 1987.

[Joh90] S. L. Johnsson. Communication in network architectures. In R. Suaya and G. Birtwistle, editors, VLSI and Parallel Computation, 223-389. Morgan Kaufmann, San Mateo, CA, 1990.

[JP93] M. T. Jones and P. E. Plassmann. A parallel graph coloring heuristic. SIAM Journal on Scientific Computing, 14:654-669, 1993.

[JS87] J.-F. Jenq and S. Sahni. All pairs shortest paths on a hypercube multiprocessor. In Proceedings of the 1987 International Conference on Parallel Processing, 713-716, 1987.

[KA88] R. A. Kamin and G. B. Adams. Fast Fourier transform algorithm design and tradeoffs. Technical Report RIACS TR 88.18, NASA Ames Research Center, Moffet Field, CA, 1988.

[KA99a] Y.-K. Kwok and I. Ahmad. Benchmarking and comparison of the task graph scheduling algorithms. Journal of Parallel and Distributed Computing, 59:381-422, 1999.

[KA99b] Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, 31(4):406- 471, 1999.

[KB57] T. C. Koopmans and M. J. Beckmann. Assignment problems and the location of economic activities. Econometrica, 25:53-76, 1957.

[Ken90] Kendall Square Research Corporation. KSR-1 Overview. Waltham, MA. 1990.

[KF90] A. H. Karp and H. P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539-543, 1990.

[KG94] V. Kumar and A. Gupta. Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 22(3):379-391, 1994. Also available as Technical Report TR 91-18, Department of Computer Science Department, University of Minnesota, Minneapolis, MN.

[KGK90] V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors. Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, 1990.

[KGR94] V. Kumar, A. Grama, and V. N. Rao. Scalable load balancing techniques for parallel computers. Journal of Parallel and Distributed Computing, 22(1):60- 79, July 1994.

[KH67] R. M. Karp and M. H. Held. Finite state processes and dynamic programming. SIAM Journal of Applied Math, 15:693-718, 1967.

[KH83] M. Kumar and D. S. Hirschberg. An efficient implementation of Batcher's odd-even merge algorithm and its application in parallel sorting schemes. IEEE Transactions on Computers, C-32, March 1983.

[KK79] P. Kermani and L. Kleinrock. Virtual cut-through: A new communication switching technique. Computer Networks, 3(4):267-286, 1979.

[KK83] V. Kumar and L. N. Kanal. A general branch-and-bound formulation for understanding and synthesizing and/or tree search procedures. Artificial Intelligence, 21:179-198, 1983.

[KK84] V. Kumar and L. N. Kanal. Parallel branch-and-bound formulations for and/or tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6:768-778, 1984.

[KK88a] L. N. Kanal and V. Kumar. Search in Artificial Intelligence. Springer-Verlag, New York, NY, 1988.

[KK88b] V. Kumar and L. N. Kanal. The CDP: A unifying formulation for heuristic search, dynamic programming, and branch-and-bound. In L. N. Kanal and V. Kumar, editors, Search in Artificial Intelligence, 1-27. Springer-Verlag, New York, NY, 1988.

[KK93] G. Karypis and V. Kumar. Efficient Parallel Mappings of a Dynamic Programming Algorithm. In Proceedings of 7th International Parallel Processing Symposium, number 563-568, 1993.

[KK94] G. Karypis and V. Kumar. Unstructured tree search on simd parallel computers. Journal of Parallel and Distributed Computing, 22(3):379-391, September 1994.

[KK99] G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278-300, 1999.

[KKKS94] L. N. Kanal, V. Kumar, H. Kitano, and C. Suttner, editors. Parallel Processing for Artificial Intelligence. North-Holland, Amsterdam, The Netherlands, 1994.

[KN91] K. Kimura and I. Nobuyuki. Probabilistic analysis of the efficiency of the dynamic load distribution. In Sixth Distributed Memory Computing Conference Proceedings, 1991.

[Knu73] D. E. Knuth. The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading, MA, 1973.

[Kor81] W. Kornfeld. The use of parallelism to implement a heuristic search. In Proceedings of the International Joint Conference on Artificial Intelligence, 575-580, 1981.

[Kow88] J. S. Kowalik. Parallel Computation and Computers for Artificial Intelligence. Kluwer Academic Publishers, Boston, MA, 1988.

[KP92] C. Kaklamanis and G. Persiano. Branch-and-bound and backtrack search on mesh-connected arrays of processors. In Proceedings of Fourth Annual Symposium on Parallel Algorithms and Architectures, 118-126, 1992.

[KR87a] V. K. P. Kumar and C. S. Raghavendra. Array processor with multiple broadcasting. Journal of Parallel and Distributed Computing, 173-190, 1987.

[KR87b] V. Kumar and V. N. Rao. Parallel depth-first search, part II: Analysis. International Journal of Parallel Programming, 16(6):501-519, 1987.

[KR88] R. M. Karp and V. Ramachandran. A survey of complexity of algorithms for shared-memory machines. Technical Report 408, University of California, Berkeley, 1988.

[KR89] V. Kumar and V. N. Rao. Load balancing on the hypercube architecture. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 603-608, 1989.

[KRR88] V. Kumar, K. Ramesh, and V. N. Rao. Parallel best-first search of state-space graphs: A summary of results. In Proceedings of the 1988 National Conference on Artificial Intelligence, 122-126, 1988.

[KRS88] C. P. Kruskal, L. Rudolph, and M. Snir. A complexity theory of efficient parallel algorithms. Technical Report RC13572, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1988.

[Kru56] J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. In Proceedings of the AMS, volume 7, 48-50, 1956.

[KS88] J. Kuehn and B. Smith. The horizon supercomputing system: architecture and software. In Proceedings of Supercomputing Conference, 28-34, 1988.

[KS91a] L. V. Kale and V. Saletore. Efficient parallel execution of IDA* on shared and distributed-memory multiprocessors. In Sixth Distributed Memory Computing Conference Proceedings, 1991.

[KS91b] V. Kumar and V. Singh. Scalability of parallel algorithms for the all-pairs shortest path problem. Journal of Parallel and Distributed Computing, 13(2):124-138, October 1991. A short version appears in the Proceedings of the International Conference on Parallel Processing, 1990.

[KSS95] S. Kleiman, D. Shah, and B. Smaalders. Programming with Threads. SunSoft Press, Mountainview, CA, 1995.

[KU86] A. R. Karlin and E. Upfal. Parallel hashing - an efficient implementation of shared memory. In Proceedings of 18th ACM Conference on Theory of Computing, 160-168, 1986.

[Kun80] J. T. Kung. The structure of parallel algorithms. In M. Yovits, editor, Advances in Computing, 73-74. Academic Press, San Diego, CA, 1980.

[Kun86] H. T. Kung. Memory requirements for balanced computer architectures. In Proceedings of the 1986 IEEE Symposium on Computer Architecture, 49-54, 1986.

[KV92] K. Kreeger and N. R. Vempaty. Comparison of meshes vs. hypercubes for data rearrangement. Technical Report UCF-CS-92-28, Department of Computer Science, University of Central Florida, Orlando, FL, 1992.

[KZ88] R. M. Karp and Y. Zhang. A randomized parallel branch-and-bound procedure. In Proceedings of the ACM Annual Symposium on Theory of Computing, 290-300, 1988.

[Law75] D. H. Lawrie. Access and alignment of data in an array processor. IEEE Transactions on Computers, C-24(1):1145-1155, 1975.

[LB95a] B. Lewis and D. J. Berg. Threads Primer: A Guide to Multithreaded Programming. Prentice Hall PTR/Sun Microsystems Press, 1995.

[LB95b] B.-H. Lim and R. Bianchini. Limits on the performance benefits of multithreading and prefetching. Research report RC 20238 (89547), IBM T. J. Watson Research Center, Yorktown Heights, NY, October 1995.

[LB97] B. Lewis and D. J. Berg. Multithreaded Programming with Pthreads. Prentice Hall PTR/Sun Microsystems Press, 1997.

[LB98] T. G. Lewis and D. Berg. Multithreaded Programming with PThreads. Sun Microsystems Press / Prentice Hall, 1998.

[LC88] G.-J. Li and T. Coleman. A parallel triangular solver for a hypercube multiprocessor. SIAM Journal on Scientific and Statistical Computing, 9:485-502, 1988.

[LC89] G.-J. Li and T. Coleman. A new method for solving triangular systems on distributed memory message passing multiprocessors. SIAM Journal on Scientific and Statistical Computing, 10:382-396, 1989.

[LD90] S. Lakshmivarahan and S. K. Dhall. Analysis and Design of Parallel Algorithms: Arithmetic and Matrix Problems. McGraw-Hill, New York, NY, 1990.

[LDP89] M. R. Leuze, L. W. Dowdy, and K. H. Park. Multiprogramming a distributed-memory multiprocessor. Concurrency: Practice and Experience, 1(1):19-33, September 1989.

[Lea99] D. Lea. Concurrent Programming in Java, Second Edition: Design Principles and Patterns. Addison-Wesley, 1999.

[Lei83] F. T. Leighton. Parallel computation using mesh of trees. In Proceedings of International Workshop on Graph-Theoretic Concepts in Computer Science, 1983.

[Lei85a] F. T. Leighton. Tight bounds on the complexity of parallel sorting. IEEE Transactions on Computers, C-34(4):344-354, April 1985.

[Lei85b] C. E. Leiserson. Fat-trees: Universal networks for hardware efficient supercomputing. In Proceedings of the 1985 International Conference on Parallel Processing, 393-402, 1985.

[Lei92] F. T. Leighton. Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, San Mateo, CA, 1992.

[LER92] T. G. Lewis and H. El-Rewini. Introduction to Parallel Computing. Prentice-Hall, Englewood Cliffs, NJ, 1992.

[Les93] B. P. Lester. The Art of Parallel Programming. Prentice-Hall, Englewood Cliffs, NJ, 1993.

[Lev87] S. P. Levitan. Measuring communications structures in parallel architectures and algorithms. In L. H. Jamieson, D. B. Gannon, and R. J. Douglass, editors, The Characteristics of Parallel Algorithms. MIT Press, Cambridge, MA, 1987.

[Lew91] D. A. Lewine. Posix Programmer's Guide: Writing Portable Unix Programs with the Posix. 1 Standard. O'Reilly & Associates, 1991.

[LHZ98] H. Lu, C. Hu, and W. Zwaenepoel. OpenMP on networks of workstations. In SC '98, High Performance Networking and Computing Conference, Orlando, Florida, 1998.

[Lil92] D. J. Lilja. Architectural Alternatives for Exploiting Parallelism. IEEE Computer Society Press, Los Alamitos, CA, 1992.

[Lin83] G. Lindstrom. The key node method: A highly parallel alpha-beta algorithm. Technical Report 83-101, Computer Science Department, University of Utah, Salt Lake City, UT, 1983.

[Lin92] Z. Lin. A distributed fair polling scheme applied to or-parallel logic programming. International Journal of Parallel Programming, 20(4), August 1992.

[LK72] K. N. Levitt and W. T. Kautz. Cellular arrays for the solution of graph problems. Communications of the ACM, 15(9):789-801, September 1972.

[LK85] D. B. Leifker and L. N. Kanal. A hybrid SSS*/alpha-beta algorithm for parallel search of game trees. In Proceedings of the International Joint Conference on Artificial Intelligence, 1044-1046, 1985.

[LLG⁺92] D. Lenoski, J. Laudon, K. Gharachorloo, W. D. Weber, A. Gupta, J. L. Hennessy, M. Horowitz, and M. Lam. The Stanford dash multiprocessor. IEEE Computer, 63-79, March 1992.

[LM97] E. K. Lee and J. E. Mitchell. Computational experience of an interior-point algorithm in a parallel branch-and-cut framework. In Proceedings for SIAM Conference on Parallel Processing for Scientific Computing, 1997.

[LMR88] F. T. Leighton, B. Maggs, and S. K. Rao. Universal packet routing algorithms. In 29th Annual Symposium on Foundations of Computer Science, 256-271, 1988.

[Loa92] C. V. Loan. Computational Frameworks for the Fast Fourier Transform. SIAM, Philadelphia, PA, 1992.

[LP92] Y. Li and P. M. Pardalos. Parallel algorithms for the quadratic assignment problem. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, 177-189. North-Holland, Amsterdam, The Netherlands, 1992.

[LPP88] F. Luccio, A. Pietracaprina, and G. Pucci. A probabilistic simulation of PRAMs on a bounded degree network. Information Processing Letters, 28:141-147, July 1988.

[LPP89] F. Luccio, A. Pietracaprina, and G. Pucci. A new scheme for deterministic simulation of PRAMs in VLSI. SIAM Journal of Computing, 1989.

[LRZ95] C. Leiserson, K. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Santa Barbara, CA, 1995.

[LS84] T. H. Lai and S. Sahni. Anomalies in parallel branch and bound algorithms. Communications of the ACM, 594-602, 1984.

[LS85] T. H. Lai and A. Sprague. Performance of parallel branch-and-bound algorithms. IEEE Transactions on Computers, C-34(10), October 1985.

[LS86] T. H. Lai and A. Sprague. A note on anomalies in parallel branch-and-bound algorithms with one-to-one bounding functions. Information Processing Letters, 23:119-122, October 1986.

[LSS88] J. Lee, E. Shragowitz, and S. Sahni. A hypercube algorithm for the 0/1 knapsack problem. Journal of Parallel and Distributed Computing, (5):438-456, 1988.

[Lub86] M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, 15(4):1036-1053, 1986.

[LW66] E. L. Lawler and D. Woods. Branch-and-bound methods: A survey. Operations Research, 14, 1966.

[LW84] G.-J. Li and B. W. Wah. Computational efficiency of parallel approximate branch-and-bound algorithms. In Proceedings of the 1984 International Conference on Parallel Processing, 473-480, 1984.

[LW85] G.-J. Li and B. W. Wah. Parallel processing of serial dynamic programming problems. In Proceedings of COMPSAC 85, 81-89, 1985.

[LW86] G.-J. Li and B. W. Wah. Coping with anomalies in parallel branch-and-bound algorithms. IEEE Transactions on Computers, C-35, June 1986.

[LW95] D. Lenoski and W. D. Weber. Scalable Shared-Memory Multiprocessing. Morgan Kaufmann, San Mateo, CA, 1995.

[LY86] M. Li and Y. Yesha. New lower bounds for parallel computations. In Proceedings of 18th ACM Conference on Theory of Computing, 177-187, 1986.

[MC82] T. A. Marsland and M. S. Campbell. Parallel search of strongly ordered game trees. Computing Surveys, 14:533-551, 1982.

[MD92] A. Mahanti and C. Daniels. SIMD parallel heuristic search. Artificial Intelligence, 1992.

[MD93] N. R. Mahapatra and S. Dutt. Scalable duplicate pruning strategies for parallel A* graph search. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993.

[MdV87] O. A. McBryan and E. F. V. de Velde. Hypercube algorithms and implementations. SIAM Journal on Scientific and Statistical Computing, 8(2):s227-s287, March 1987.

[Mes94] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. Available at http://www.mpi-forum.org. May 1994.

[Mes97] Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface. Available at http://www.mpi-forum.org. July 1997.

[MFMV90] B. Monien, R. Feldmann, P. Mysliwietz, and O. Vornberger. Parallel game tree search by dynamic tree decomposition. In V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, 1990.

[MG89] C. U. Martel and D. Q. Gusfield. A fast parallel quicksort algorithm. Information Processing Letters, 30:97-102, 1989.

[Mil91] D. Miller. Exact distributed algorithms for travelling salesman problem. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991.

[MK99] J. Magee and J. Kramer. Concurrency: State Models and Java Programs. John Wiley & Sons, 1999.

[MKRS88] R. Miller, V. K. P. Kumar, D. I. Reisis, and Q. F. Stout. Meshes with reconfigurable buses. In Proceedings of MIT Conference on Advanced Research in VLSI, 163-178, 1988.

[MM73] A. Martelli and U. Montanari. From dynamic programming to search algorithms with functional costs. In Proceedings of the International Joint Conference on Artifi cial Intelligence, 345-349, 1973.

[MM91] P. Messina and A. Murli, editors. Practical Parallel Computing: Status and Prospects. Wiley, Chichester, UK, 1991.

[MMR95] B. Mans, T. Mautor, and C. Roucairol. A parallel depth first search branch and bound for the quadratic assignment problem. European Journal of Operational Research, 81(3):617-628, 1995.

[Mod88] J. J. Modi. Parallel Algorithms and Matrix Computation. Oxford University Press, Oxford, UK, 1988.

[Moh83] J. Mohan. Experience with two parallel programs solving the traveling salesman problem. In Proceedings of the 1983 International Conference on Parallel Processing, 191-193, 1983.

[Mol86] C. B. Moler. Matrix computation on distributed-memory multiprocessors. In M. T. Heath, editor, Hypercube Multiprocessors 1986, 181-195. SIAM, Philadelphia, PA, 1986.

[Mol87] C. B. Moler. Another look at Amdahl's law. Technical Report TN-02-0587-0288, Intel Scientific Computers, 1987.

[Mol93] D. I. Moldovan. Parallel Processing: From Applications to Systems. Morgan Kaufmann, San Mateo, CA, 1993.

[MP85] T. A. Marsland and F. Popowich. Parallel game tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-7(4):442-452, July 1985.

[MP93] D. L. Miller and J. F. Pekny. The role of performance metrics for parallel mathematical programming algorithms. ORSA Journal on Computing, 5(1), 1993.

[MR] D. C. Marinescu and J. R. Rice. On high level characterization of parallelism. Technical Report CSD-TR-1011, CAPO Report CER-90-32, Computer Science Department, Purdue University, West Lafayette, IN. Also published in Journal of Parallel and Distributed Computing, 1993.

[MRSR92] G. P. McKeown, V. J. Rayward-Smith, and S. A. Rush. Parallel Branch-and-Bound, 111-150. Advanced Topics in Computer Science. Blackwell Scientific Publications, Oxford, UK, 1992.

[MS88] Y. W. E. Ma and D. G. Shea. Downward scalability of parallel architectures. In Proceedings of the 1988 International Conference on Supercomputing, 109-120, 1988.

[MS90] G. Manzini and M. Somalvico. Probabilistic performance analysis of heuristic search using parallel hash tables. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 1990.

[MS96] R. Miller and Q. F. Stout. Parallel Algorithms for Regular Architectures. MIT Press, Cambridge, MA, 1996.

[MV84] K. Mehlhorn and U. Vishkin. Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories. Acta Informatica, 21(4):339-374, November 1984.

[MV85] B. Monien and O. Vornberger. The ring machine. Technical report, University of Paderborn, Germany, 1985. Also in Computers and Artificial Intelligence, 3(1987).

[MV87] B. Monien and O. Vornberger. Parallel processing of combinatorial search trees. In Proceedings of International Workshop on Parallel Algorithms and Architectures, 1987.

[MVS86] B. Monien, O. Vornberger, and E. Spekenmeyer. Superlinear speedup for parallel backtracking. Technical Report 30, University of Paderborn, Germany, 1986.

[NA91] D. Nussbaum and A. Agarwal. Scalability of parallel machines. Communications of the ACM, 34(3):57-61, 1991.

[Nat90] L. Natvig. Investigating the practical value of Cole's O (log n) time crew pram merge sort algorithm. In 5th International Symposium on Computing and Information Sciences, October 1990.

[NBF96] B. Nichols, B. Buttlar, and J. P. Farrell. Pthreads Programming. O'Reilly & Associates, Newton, MA 02164, 1996.

[nCU90] nCUBE Corporation. nCUBE 6400 Processor Manual. Beaverton, OR, 1990.

[ND96] S. J. Norton and M. D. DiPasquale. Thread time: the multithreaded programming guide. Hewlett-Packard professional books. Prentice-Hall, Englewood Cliffs, NJ 07632, 1996.

[Ni91] L. M. Ni. A layered classification of parallel computers. In Proceedings of 1991 International Conference for Young Computer Scientists, 28-33, 1991.

[Nic90] J. R. Nickolls. The design of the MasPar MP-1: A cost-effective massively parallel computer. In IEEE Digest of Papers-Comcom, 25-28. IEEE Computer Society Press, Los Alamitos, CA, 1990.

[NM93] L. M. Ni and McKinley. A survey of wormhole routing techniques in direct connect networks. IEEE Computer, 26(2), February 1993.

[NMB83] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt. Efficient VLSI networks for parallel processing based on orthogonal trees . IEEE Transactions on Computers, C-32:21-23, June 1983.

[NS79] D. Nassimi and S. Sahni. Bitonic sort on a mesh connected parallel computer. IEEE Transactions on Computers, C-28(1), January 1979.

[NS80] D. Nassimi and S. Sahni. Finding connected components and connected ones on a mesh-connected computer. SIAM Journal of Computing, 9(4):744-757, November 1980.

[NS87] A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581-591, 1987.

[NS93] M. Nigam and S. Sahni. Sorting n numbers on n x n reconfigurable meshes with buses. In 7th International Parallel Processing Symposium, 174-181, 1993.

[NSF91] Grand Challenges: High Performance Computing and Communications. A Report by the Committee on Physical, Mathematical and Engineering Sciences, NSF/CISE, 1800 G Street NW, Washington, DC, 20550, 1991.

[Nug88] S. F. Nugent. The iPSC/2 direct-connect communications technology. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 51-60, 1988.

[Nus82] H. J. Nussbaumer. Fast Fourier Transform and Convolution Algorithms. Springer-Verlag, New York, NY, 1982.

[NW88] D. M. Nicol and F. H. Willard. Problem size, parallel architecture, and optimal speedup. Journal of Parallel and Distributed Computing, 5:404-420, 1988.

[GOV99] Funding a Revolution: Government Support for Computing Research. Committee on Innovations in Computing and Communications. National Academy Press, 1999.

[OR88] J. M. Ortega and C. H. Romine. The ijk forms of factorization methods II: Parallel systems. Parallel Computing, 7:149-162, 1988.

[Ort88] J. M. Ortega. Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, NY, 1988.

[OS85] D. P. O'Leary and G. W. Stewart. Data-flow algorithms for parallel matrix computations. Communications of the ACM, 28:840-853, 1985.

[OS86] D. P. O'Leary and G. W. Stewart. Assignment and scheduling in parallel matrix factorization. Linear Algebra and its Applications, 77:275-299, 1986.

[Pac98] P. Pacheco. Parallel Programming with MPI. Morgan Kaufmann, 1998.

[PBG⁺85] G. F. Pfister, W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfelder, K. P. McAuliffe, E. A. Melton, V. A. Norlton, and J. Weiss. The IBM research parallel processor prototype (RP3): Introduction and architecture. In Proceedings of 1985 International Conference on Parallel Processing, 764- 771, 1985.

[PC89] P. M. Pardalos and J. Crouse. A parallel algorithm for the quadratic assignment problem. In Supercomputing '89 Proceedings, 351-360. ACM Press, New York, NY, 1989.

[PD89] K. H. Park and L. W. Dowdy. Dynamic partitioning of multiprocessor systems. International Journal of Parallel Processing, 18(2):91-120, 1989.

[Pea84] J. Pearl. Heuristics-Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, MA, 1984.

[Per87] R. Perrott. Parallel Programming. Addison-Wesley, Reading, MA, 1987.

[Pfi98] G. F. Pfister. In Search of Clusters. Prentice Hall, Englewood Cliffs, NJ, 1998. 2nd Edition.

[PFK90] C. Powley, C. Ferguson, and R. Korf. Parallel heuristic search: Two approaches. In V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, 1990.

[PG98] T. Q. Pham and P. K. Garg. Multithreaded Programming with Win32. Prentice Hall, 1998.

[PH90] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, 1990.

[PH96] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach, 2nd edition. Morgan Kaufmann, San Mateo, CA, 1996.

[PK89] R. C. Paige and C. P. Kruskal. Parallel algorithms for shortest path problems. In Proceedings of 1989 International Conference on Parallel Processing, 14- 19, 1989.

[PK95] I. Pramanick and J. G. Kuhl. An inherently parallel method for heuristic problem-solving: Part I - general framework. IEEE Transactions on Parallel and Distributed Systems, 6(10), October 1995.

[PKF92] C. Powley, R. Korf, and C. Ferguson. IDA* on the connection machine. Artificial Intelligence, 1992.

[Pla89] C. C. Plaxton. Load balancing, selection and sorting on the hypercube. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 64-73, 1989.

[PLRR94] P. M. Pardalos, Y. Li, K. Ramakrishna, and M. Resende. Lower bounds for the quadratic assignment problem. Annals of Operations Research, 50:387-411, 1994. Special Volume on Applications of Combinatorial Optimization.

[PR85] V. Pan and J. H. Reif. Efficient parallel solution of linear systems. In 17th Annual ACM Symposium on Theory of Computing, 143-152, 1985.

[PR89] P. M. Pardalos and G. P. Rodgers. Parallel branch-and-bound algorithms for unconstrainted quadratic zero-one programming. In R. Sharda et al., editors, Impacts of Recent Computer Advances on Operations Research, 131-143. North-Holland, Amsterdam, The Netherlands, 1989.

[PR90] P. M. Pardalos and G. P. Rodgers. Parallel branch-and-bound algorithms for quadratic zero-one programming on a hypercube architecture. Annals of Operations Research, 22:271-292, 1990.

[PR91] M. Padberg and G. Rinaldi. A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, 33:60- 100, 1991.

[Pri57] R. C. Prim. Shortest connection network and some generalizations. Bell Systems Technical Journal, 36:1389-1401, 1957.

[PRV88] G. Plateau, C. Roucairol, and I. Valabregue. Algorithm PR2 for the parallel size reduction of the 0/1 multiknapsack problem. In INRIA Rapports de Recherche, number 811, 1988.

[PS82] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982.

[PV80] F. P. Preparata and J. Vuillemin. Area-time optimal VLSI networks for matrix multiplication. In Proceedings of the 14th Princeton Conference on Information Science and Systems, 300-309, 1980.

[PY88] C. H. Papadimitriou and M. Yannakakis. Towards an architecture independent analysis of parallel algorithms. In Proceedings of 20th ACM Symposium on Theory of Computing, 510-513, 1988.

[QD86] M. J. Quinn and N. Deo. An upper bound for the speedup of parallel branch-and-bound algorithms. BIT, 26(1), March 1986.

[Qui88] M. J. Quinn. Parallel sorting algorithms for tightly coupled multiprocessors. Parallel Computing, 6:349-357, 1988.

[Qui89] M. J. Quinn. Analysis and implementation of branch-and-bound algorithms on a hypercube multicomputer. IEEE Transactions on Computers, 1989.

[Qui94] M. J. Quinn. Parallel Computing: Theory and Practice. McGraw-Hill, New York, NY, 1994.

[Ram97] V. Ramachandran. Qsm: A general purpose shared-memory model for parallel computation. In Foundations of Software Technology and Theoretical Computer Science, 1-5, 1997.

[Ran89] A. G. Ranade. Fluent Parallel Computation. Ph.D. Thesis, Department of Computer Science, Yale University, New Haven, CT, 1989.

[Ran91] A. G. Ranade. Optimal speedup for backtrack search on a butterfly network. In Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, 1991.

[Rao90] V. N. Rao. Parallel Processing of Heuristic Search. Ph.D. Thesis, University of Texas, Austin, TX, 1990.

[Ras78] L. Raskin. Performance Evaluation of Multiple Processor Systems. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, 1978.

[RB76] C. M. Rader and N. M. Brenner. A new principle for Fast fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, 24:264-265, 1976.

[RDK89] C. Renolet, M. Diamond, and J. Kimbel. Analytical and heuristic modeling of distributed algorithms. Technical Report E3646, FMC Corporation, Advanced Systems Center, Minneapolis, MN, 1989.

[Rei81] R. Reischuk. Probabilistic algorithms for sorting and selection. SIAM Journal of Computing, 396-409, 1981.

[RF89] D. A. Reed and R. M. Fujimoto. Multicomputer Networks: Message-Based Parallel Processing. MIT Press, Cambridge, MA, 1989.

[RICN88] K. Rokusawa, N. Ichiyoshi, T. Chikayama, and H. Nakashima. An efficient termination detection and abortion algorithm for distributed processing systems. In Proceedings of 1988 International Conference on Parallel Processing: Vol. I, 18-22, 1988.

[RK87] V. N. Rao and V. Kumar. Parallel depth-first search, part I: Implementation. International Journal of Parallel Programming, 16(6):479-499, 1987.

[RK88a] V. N. Rao and V. Kumar. Concurrent access of priority queues. IEEE Transactions on Computers, C-37 (12), 1988.

[RK88b] V. N. Rao and V. Kumar. Superlinear speedup in state-space search. In Proceedings of the 1988 Foundation of Software Technology and Theoretical Computer Science, number 338, 161-174. Springer-Verlag Series Lecture Notes in Computer Science, 1988.

[RK93] V. N. Rao and V. Kumar. On the efficicency of parallel backtracking. IEEE Transactions on Parallel and Distributed Systems, 4(4):427-437, April 1993. available as a technical report TR 90-55, Computer Science Department, University of Minnesota.

[RKR87] V. N. Rao, V. Kumar, and K. Ramesh. A parallel implementation of iterative-deepening-A*. In Proceedings of the National Conference on Artificial Intelligence (AAAI-87), 878-882, 1987.

[RND77] E. M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1977.

[RO88] C. H. Romine and J. M. Ortega. Parallel solution of triangular systems of equations. Parallel Computing, 6:109-114, 1988.

[Rob75] M. O. Robin. Probabilistic algorithms. In J. Traub, editor, Algorithms and Complexity: New Directions and Recent Results, 21-39. Academic Press, San Diego, CA, 1975.

[Rob90] Y. Robert. The Impact of Vector and Parallel Architectures on Gaussian Elimination. John Wiley and Sons, New York, NY, 1990.

[Rom87] C. H. Romine. The parallel solution of triangular systems on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 552-559. SIAM, Philadelphia, PA, 1987.

[Rou87] C. Roucairol. A parallel branch-and-bound algorithm for the quadratic assignment problem. Discrete Applied Mathematics, 18:211-225, 1987.

[Rou91] C. Roucairol. Parallel branch-and-bound on shared-memory multiprocessors. In Proceedings of the Workshop On Parallel Computing of Discrete Optimization Problems, 1991.

[RRRR96] K. A. Robbins, S. Robbins, K. R. Robbins, and S. Robbins. Practical UNIX Programming: A Guide to Concurrency, Communication, and Multithreading. Prentice Hall, 1996.

[RS90a] A. P. R. Alverson, D. Callahan, D. Cummings, B. Koblenz and B. Smith. The tera computer system. In International Conference on Supercomputing, 1-6, 1990.

[RS90b] S. Ranka and S. Sahni. Hypercube Algorithms for Image Processing and Pattern Recognition. Springer-Verlag, New York, NY, 1990.

[RV87] J. H. Reif and L. G. Valiant. A logarithmic time sort for linear size networks. Journal of the ACM, 34(1):60-76, January 1987.

[Ryt88] W. Rytter. Efficient parallel computations for dynamic programming. Theoretical Computer Science, 59:297-307, 1988.

[RZ89] M. Reeve and S. E. Zenith, editors. Parallel Processing and Artificial Intelligence. Wiley, Chichester, UK, 1989.

[Saa86] Y. Saad. Communication complexity of the Gaussian elimination algorithm on multiprocessors. Linear Algebra and its Applications, 77:315-340, 1986.

[SB77] H. Sullivan and T. R. Bashkow. A large scale, homogeneous, fully distributed parallel machine. In Proceedings of Fourth Symposium on Computer Architecture, 105-124, March 1977.

[SBCV90] R. H. Saavedra-Barrera, D. E. Culler, and T. Von Eiken. Analysis of multithreaded architectures for parallel computing. Report UCB/CSD 90/569, University of California, Berkeley, Computer Science Division, Berkeley, CA, April 1990.

[Sch80] J. T. Schwartz. Ultracomputers. ACM Transactions on Programming Languages and Systems, 2:484-521, October 1980.

[Sed78] R. Sedgewick. Implementing quicksort programs. Communications of the ACM, 21(10):847-857, 1978.

[Sei85] C. L. Seitz. The cosmic cube. Communications of the ACM, 28(1):22-33, 1985.

[Sei89] S. R. Seidel. Circuit-switched vs. store-and-forward solutions to symmetric communication problems. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 253-255, 1989.

[Sei92] C. L. Seitz. Mosaic C: An experimental fine-grain multicomputer. Technical report, California Institute of Technology, Pasadena, CA, 1992.

[SG88] S. R. Seidel and W. L. George. Binsorting on hypercube with d-port communication. In Proceedings of the Third Conference on Hypercube Concurrent Computers, 1455-1461, January 1988.

[SG91] X.-H. Sun and J. L. Gustafson. Toward a better parallel performance metric. Parallel Computing, 17:1093-1109, December 1991. Also available as Technical Report IS-5053, UC-32, Ames Laboratory, Iowa State University, Ames, IA.

[Sha85] J. A. Sharp. Data-Flow Computing. Ellis Horwood, Chichester, UK, 1985.

[She59] D. L. Shell. A high-speed sorting procedure. Communications of the ACM, 2(7):30-32, July 1959.

[SHG93] J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42-50, 1993.

[Sie77] H. J. Siegel. The universality of various types of SIMD machine interconnection networks. In Proceedings of the 4th Annual Symposium on Computer Architecture , 23-25, 1977.

[Sie85] H. J. Siegel. Interconnection Networks for Large-Scale Parallel Processing. D. C. Heath, Lexington, MA, 1985.

[SJ81] C. Savage and J. Jaja. Fast, efficient parallel algorithms for some graph problems. SIAM Journal of Computing, 10(4):682-690, November 1981.

[SK89] W. Shu and L. V. Kale. A dynamic scheduling strategy for the chare-kernel system. In Proceedings of Supercomputing Conference, 389-398, 1989.

[SK90] V. Saletore and L. V. Kale. Consistent linear speedup to a first solution in parallel state-space search. In Proceedings of the 1990 National Conference on Artificial Intelligence, 227-233, August 1990.

[SKAT91a] V. Singh, V. Kumar, G. Agha, and C. Tomlinson. Efficient algorithms for parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 20(2):95-131, 1991.

[SKAT91b] V. Singh, V. Kumar, G. Agha, and C. Tomlinson. Scalability of parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 20(2), 1991.

[SM86] S. J. Stolfo and D. P. Miranker. The DADO production system machine. Journal of Parallel and Distributed Computing, 3:269-296, June 1986.

[Smi84] D. R. Smith. Random trees and the analysis of branch and bound procedures. Journal of the ACM, 31(1), 1984.

[SN90] X.-H. Sun and L. M. Ni. Another view of parallel speedup. In Supercomputing '90 Proceedings, 324-333, 1990.

[SN93] X.-H. Sun and L. M. Ni. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 19:27-37, September 1993.

[Sni82] M. Snir. On parallel search. In Proceedings of Principles of Distributed Computing, 242-253, 1982.

[Sni85] M. Snir. On parallel searching. SIAM Journal of Computing, 14(3):688-708, August 1985.

[Sny86] L. Snyder. Type architectures, shared-memory and the corollary of modest potential. Annual Review of Computer Science, 1:289-317, 1986.

[SOHL⁺96] M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI: The Complete Reference. MIT Press, Cambridge, MA, 1996.

[Sol77] M. Sollin. An algorithm attributed to Sollin. In S. Goodman and S. Hedetniemi, editors, Introduction to The Design and Analysis of Algorithms. McGraw-Hill, Cambridge, MA, 1977.

[SR91] X.-H. Sun and D. T. Rover. Scalability of parallel algorithm-machine combinations. Technical Report IS-5057, Ames Laboratory, Iowa State University, Ames, IA, 1991. Also published in IEEE Transactions on Parallel and Distributed Systems.

[SS88] Y. Saad and M. H. Schultz. Topological properties of hypercubes. IEEE Transactions on Computers, 37:867-872, 1988.

[SS89a] Y. Saad and M. H. Schultz. Data communication in hypercubes. Journal of Parallel and Distributed Computing, 6:115-135, 1989. Also available as Technical Report YALEU/DCS/RR-428 from the Department of Computer Science, Yale University, New Haven, CT.

[SS89b] Y. Saad and M. H. Schultz. Data communication in parallel architectures. Parallel Computing, 11:131-150, 1989.

[SS90] H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14:361-372, 1990.

[ST95] B. M. S. Tschvke, R. Lling. Solving the traveling salesman problem with a distributed branch-and-bound algorithm on a 1024 processor network. In Proceedings of the 9th International Parallel Processing Symposium, 182- 189, Santa Barbara, CA, April 1995.

[Sto71] H. S. Stone. Parallel processing with the perfect shuffle. IEEE Transactions on Computers, C-20(2):153-161, 1971.

[Sto93] H. S. Stone. High-Performance Computer Architecture: Third Edition. Addison-Wesley, Reading, MA, 1993.

[Sun95] SunSoft. Solaris multithreaded programming guide. SunSoft Press, Mountainview, CA, 1995.

[Sup91] Supercomputer Systems Division, Intel Corporation. Paragon XP/S Product Overview. Beaverton, OR, 1991.

[SV81] Y. Shiloach and U. Vishkin. Finding the maximum, merging and sorting in a parallel computation model. Journal of Algorithms, 88-102, 1981.

[SV82] Y. Shiloach and U. Vishkin. An O (n² log n) parallel max-flow algorithm. Journal of Algorithms, 3:128-146, 1982.

[SW87] Q. F. Stout and B. A. Wagar. Passing messages in link-bound hypercubes. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 251-257. SIAM, Philadelphia, PA, 1987.

[Swa87] P. N. Swarztrauber. Multiprocessor FFTs. Parallel Computing, 5:197-210, 1987.

[SZ96] X.-H. Sun and J. P. Zhu. Performance considerations of shared virtual memory machines. IEEE Trans. on Parallel and Distributed Systems, 6(11):1185- 1194, 1996.

[Tab90] D. Tabak. Multiprocessors. Prentice-Hall, Englewood Cliffs, NJ, 1990.

[Tab91] D. Tabak. Advanced Multiprocessors. McGraw-Hill, New York, NY, 1991.

[Tak87] R. Take. An optimal routing method of all-to-all communication on hypercube networks. In 35th Information Processing Society of Japan, 1987.

[TEL95] D. M. Tullsen, S. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995.

[Ten90] S. Teng. Adaptive parallel algorithms for integral knapsack problems. Journal of Parallel and Distributed Computing, 8:400-406, 1990.

[Thi90] Thinking Machines Corporation. The CM-2 Technical Summary. Cambridge, MA. 1990.

[Thi91] Thinking Machines Corporation. The CM-5 Technical Summary. Cambridge, MA. 1991.

[Tho83] C. D. Thompson. Fourier transforms in VLSI. IBM Journal of Research and Development, C-32(11):1047-1057, 1983.

[Thr99] J. Throop. Standards: OpenMP: Shared-memory parallelism from the ashes. Computer, 32(5):108-109, May 1999.

[Tic88] W. F. Tichy. Parallel matrix multiplication on the connection machine. Technical Report RIACS TR 88.41, Research Institute for Advanced Computer Science, NASA Ames Research Center, Moffet Field, CA, 1988.

[TK77] C. D. Thompson and H. T. Kung. Sorting on a mesh-connected parallel computer. Communications of the ACM, 21:263-271, 1977.

[TL90] Z. Tang and G.-J. Li. Optimal granularity of grid iteration problems. In Proceedings of the 1990 International Conference on Parallel Processing, I111-I118, 1990.

[Tul96] D. M. Tullsen. Simultaneous multithreading. Ph.D. Thesis, University of Washington, Seattle, WA, 1996.

[TV85] R. E. Tarjan and U. Vishkin. An efficient parallel biconnectivity algorithm. SIAM Journal on Computing, 14(4):862-874, November 1985.

[TY96] J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 35-46, 1996.

[Upf84] E. Upfal. A probabilistic relation between desirable and feasible models of parallel computation. In Proceedings of the 16th ACM Conference on Theory of Computing, 258-265, 1984.

[UW84] E. Upfal and A. Widgerson. How to share memory in a distributed system. In Proceedings of the 25th Annual Symposium on the Foundation of Computer Science, 171-180, 1984.

[Val75] L. G. Valiant. Parallelism in comparison problems. SIAM Journal of Computing, 4(3):348-355, September 1975.

[Val82] L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11:350-361, 1982.

[Val90a] L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8), 1990.

[Val90b] L. G. Valiant. General purpose parallel architectures. Handbook of Theoretical Computer Science, 1990.

[Vav89] S. A. Vavasis. Gaussian elimination with pivoting is P-complete. SIAM Journal on Discrete Mathematics, 2:413-423, 1989.

[VB81] L. G. Valiant and G. J. Brebner. Universal schemes for parallel communication. In Proceedings of the 13th ACM Symposium on Theory of Computation, 263-277, 1981.

[VC89] F. A. Van-Catledge. Towards a general model for evaluating the relative performance of computer systems. International Journal of Supercomputer Applications, 3(2):100-108, 1989.

[Vor86] O. Vornberger. Implementing branch-and-bound in a ring of processors. Technical Report 29, University of Paderborn, Germany, 1986.

[Vor87a] O. Vornberger. The personal supercomputer: A network of transputers. In Proceedings of the 1987 International Conference on Supercomputing, 1987.

[Vor87b] O. Vornberger. Load balancing in a network of transputers. In Proceedings of the Second International Workshop on Distributed Parallel Algorithms, 1987.

[VS86] J. S. Vitter and R. A. Simmons. New classes for parallel complexity: A study of unification and other complete problems for P. IEEE Transactions on Computers, May 1986.

[VSBR83] L. G. Valiant, S. Skyum, S. Berkowitz, and C. Rackoff. Fast parallel computation of polynomials using few processors. SIAM Journal of Computing, 12(4):641-644, 1983.

[WA98] B. Wilkinson and C. M. Allen. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice Hall, 1998.

[WA99] B. Wilkinson and M. Allen. Parallel Programming. Prentice-Hall, NJ, 1999.

[Wag87] B. A. Wagar. Hyperquicksort: A fast sorting algorithm for hypercubes. In Proceedings of the Second Conference on Hypercube Multiprocessors, 292- 299, 1987.

[Wal91] Y. Wallach. Parallel Processing and Ada. Prentice-Hall, Englewood Cliffs, NJ, 1991.

[WB72] W. A. Wulf and C. G. Bell. C.mmp-a multimicroprocessor. In Proceedings of AFIPS Conference, 765-777, 1972.

[Wei97] B. Weissman. Active threads manual. Technical Report TR-97-037, International Computer Science Institute, Berkeley, CA 94704, 1997.

[WF80] C. L. Wu and T. Y. Feng. On a class of multistage interconnection networks. IEEE Transactions on Computers, 669-702, August 1980.

[WF84] C. L. Wu and T. Y. Feng. Interconnection Networks for Parallel and Distributed Processing. IEEE Computer Society Press, Washington, DC, 1984.

[WI89] K. Wada and N. Ichiyoshi. A distributed shortest path algorithm and its mapping on the Multi-PSI. In Proceedings of International Conference of Parallel Processing, 1989.

[Wil86] H. S. Wilf. Algorithms and Complexity. Prentice-Hall, NJ, 1986.

[Wil95] G. V. Wilson. Practical Parallel Programming. MIT Press, Cambridge, MA, 1995.

[Wil00] A. Williams. Windows 2000 Systems Programming Black Book. The Coriolis Group, 2000.

[Win77] S. Winograd. A new method for computing DFT. In IEEE International Conference on Acoustics, Speech and Signal Processing, 366-368, 1977.

[WL88] B. W. Wah and G.-J. Li. Systolic processing for dynamic programming problems. Circuits, Systems, and Signal Processing, 7(2):119-149, 1988.

[WLY84] B. W. Wah, G.-J. Li, and C. F. Yu. The status of MANIP-a multicomputer architecture for solving combinatorial extremum-search problems. In Proceedings of 11th Annual International Symposium on Computer Architecture, 56-63, 1984.

[WM84] B. W. Wah and Y. W. E. Ma. MANIP-a multicomputer architecture for solving combinatorial extremum-search problems. IEEE Transactions on Computers, C-33, May 1984.

[Woo86] J. V. Woods, editor. Fifth Generation Computer Architectures. North-Holland, Amsterdam, The Netherlands, 1986.

[Wor88] P. H. Worley. Information Requirements and the Implications for Parallel Computation. Ph.D. Thesis, Stanford University, Department of Computer Science, Palo Alto, CA, 1988.

[Wor90] P. H. Worley. The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, 11(5):838-858, 1990.

[Wor91] P. H. Worley. Limits on parallelism in the numerical solution of linear PDEs. SIAM Journal on Scientific and Statistical Computing, 12:1-35, January 1991.

[WS88] Y. Won and S. Sahni. A balanced bin sort for hypercube multiprocessors. Journal of Supercomputing, 2:435-448, 1988.

[WS89] J. Woo and S. Sahni. Hypercube computing: Connected components. Journal of Supercomputing, 3:209-234, 1989.

[WS91] J. Woo and S. Sahni. Computing biconnected components on a hypercube. Journal of Supercomputing, June 1991. Also available as Technical Report TR 89-7 from the Department of Computer Science, University of Minnesota, Minneapolis, MN.

[WY85] B. W. Wah and C. F. Yu. Stochastic modeling of branch-and-bound algorithms with best-first search. IEEE Transactions on Software Engineering, SE-11, September 1985.

[Zho89] X. Zhou. Bridging the gap between Amdahl's law and Sandia laboratory's result. Communications of the ACM, 32(8):1014-5, 1989.

[Zom96] A. Zomaya, editor. Parallel and Distributed Computing Handbook. McGraw-Hill, 1996.

[ZRV89] J. R. Zorbas, D. J. Reble, and R. E. VanKooten. Measuring the scalability of parallel computer systems. In Supercomputing '89 Proceedings, 832-841, 1989.