
Organization

This book is composed of 17 chapters, divided into four parts. The first part addresses background material; the second part deals with getting a cluster running quickly; the third part goes into more depth describing how a custom cluster can be built; and the fourth part introduces cluster programming.

Depending on your background and goals, different parts of this book are likely to be of interest. I have tried to provide information here and at the beginning of each section to help you select the parts of greatest interest. You should not need to read the entire book for it to be useful.


Part I, An Introduction to Clusters

Chapter 1 is a general introduction to high-performance computing from the perspective of clusters. It introduces basic terminology and provides a description of various high-performance technologies. It gives a broad overview of the different cluster architectures and discusses some of the inherent limitations of clusters.

Chapter 2 begins with a discussion of how to determine what you want your cluster to do. It then gives a quick overview of the different types of software you may need in your cluster.

Chapter 3 is a discussion of the hardware that goes into a cluster, including both the individual computers and network equipment.

Chapter 4 begins with a brief discussion of Linux in general. The bulk of the chapter covers the basics of installing and configuring Linux. This chapter assumes you are comfortable using Linux but may need a quick review of some administrative tasks.


Part II, Getting Started Quickly

Chapter 5 describes the installation, configuration, and use of openMosix. It also reviews how to recompile a Linux kernel.

Chapter 6 describes installing and setting up OSCAR. It also covers a few of the basics of using OSCAR.

Chapter 7 describes installing Rocks. It also covers a few of the basics of using Rocks.


Part III, Building Custom Clusters

Chapter 8 describes tools you can use to replicate the software installed on one machine onto others. Thus, once you have decided how to install and configure the software on an individual node in your cluster, this chapter will show you how to duplicate that installation on a number of machines quickly and efficiently.

Chapter 9 first describes programming software that you may want to consider. Next, it describes the installation and configuration of the software, along with additional utilities you'll need if you plan to write the application programs that will run on your cluster.

Chapter 10 describes tools you can use to manage your cluster. Once you have a working cluster, you face numerous administrative tasks, not the least of which is ensuring that the machines in your cluster are running properly and configured identically. The tools in this chapter can make life much easier.

Chapter 11 describes OpenPBS, open source scheduling software. For heavily loaded clusters, you'll need software to allocate resources, schedule jobs, and enforce priorities. OpenPBS is one solution.

Chapter 12 describes setting up and configuring the Parallel Virtual File System (PVFS) software, a high-performance parallel file system for clusters.


Part IV, Cluster Programming

Chapter 13 is a tutorial on how to use the MPI library. It covers the basics. There is a lot more to MPI than what is described in this book, but that's a topic for another book or two. The material in this chapter will get you started.

Chapter 14 describes some of the more advanced features of MPI. The intent is not to make you proficient with any of these features but simply to let you know that they exist and how they might be useful.

Chapter 15 describes some techniques to break a program into pieces that can be run in parallel. There is no silver bullet for parallel programming, but there are several helpful ways to get started. The chapter is a quick overview.

Chapter 16 first reviews the techniques used to debug serial programs and then shows how the more traditional approaches can be extended and used to debug parallel programs. It also discusses a few problems that are unique to parallel programs.

Chapter 17 looks at techniques and tools that can be used to profile parallel programs. If you want to improve the performance of a parallel program, the first step is to find out where the program is spending its time. This chapter shows you how to get started.


Part V, Appendix

The Appendix includes source information and documentation for the software discussed in the book. It also includes pointers to other useful information about clusters.
