
2.6 Benchmarks

Once you have your cluster running, you'll probably want to run a benchmark or two just to see how well it performs. Unfortunately, benchmarking is, at best, a dark art. In practice, sheep entrails may give better results.

Often the motivation for benchmarks is hubris: the desire to prove your system is the best. This can be crucial if funding is involved, but otherwise is probably a meaningless activity and a waste of time. You'll have to judge for yourself.

Keep in mind that a benchmark supplies a single set of numbers that is very difficult to interpret in isolation. Benchmarks are mostly useful when making comparisons between two or more closely related configurations on your own cluster.

There are at least three reasons you might run benchmarks. First, a benchmark will provide you with a baseline. If you make changes to your cluster or if you suspect problems with your cluster, you can rerun the benchmark to see if performance is really any different. Second, benchmarks are useful when comparing systems or cluster configurations. They can provide a reasonable basis for selecting between alternatives. Finally, benchmarks can be helpful with planning. If you can run the same benchmark on several differently sized clusters, you should be able to make better estimates of the impact of scaling your cluster.

Benchmarks are not infallible. Consider the following rather simplistic example: Suppose you are comparing two clusters with the goal of estimating how well a particular cluster design scales. Cluster B is twice the size of cluster A. Your goal is to project the overall performance for a new cluster C, which is twice the size of B. If you rely on a simple linear extrapolation based on the overall performance of A and B, you could be grossly misled. For instance, if cluster A has a 30% network utilization and cluster B has a 60% network utilization, the network shouldn't have a telling impact on overall performance for either cluster. But if the trend continues, you'll have a difficult time meeting cluster C's need for 120% network utilization.
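To make the arithmetic concrete, here is a minimal sketch in Python using the hypothetical utilization figures above. The trend it extrapolates, utilization doubling with cluster size, is exactly the one that breaks down:

    # Hypothetical network utilization figures from the example above.
    util_a = 0.30    # cluster A
    util_b = 0.60    # cluster B, twice the size of A

    # Observed trend: utilization doubles each time the cluster size
    # doubles, i.e., it grows linearly with the number of nodes.
    ratio = util_b / util_a          # 2.0
    projected_c = util_b * ratio     # cluster C, twice the size of B

    print(f"Projected network utilization for C: {projected_c:.0%}")
    # Prints 120%. That utilization is impossible; the network saturates
    # first, so a linear extrapolation of overall performance from A and
    # B would be grossly optimistic for C.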

There are several things to keep in mind when selecting benchmarks. A variety of different things affect the overall performance of a cluster, including the configuration of the individual systems and the network, the job mix on the cluster, and the instruction mix in the cluster applications. Benchmarks attempt to characterize performance by measuring, in some sense, the performance of CPU, memory, or communications. Thus, there is no exact correspondence between what may affect a cluster's performance and what a benchmark actually measures.

Furthermore, since several factors are involved, different benchmarks may weight different factors. Thus, it is generally meaningless to compare the results of one benchmark on one system with a different set of benchmarks on a different system, even when the benchmarks reputedly measure the same thing.

When you select a benchmark, first decide why you need it and how it will be used. For many purposes, the best benchmarks are the actual applications you will run on your cluster. It doesn't matter how well your cluster does with memory benchmarks if your applications are constantly thrashing. The primary difficulty in using actual applications is running them in a consistent manner so that you have repeatable results. This can be a real bear! Even small changes in data can produce significant changes in performance. If you do decide to use your applications, be consistent.
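One way to impose that consistency is a small driver that runs the application a fixed number of times over the same input and reports summary statistics. Here is a minimal sketch; the application name and input file are hypothetical placeholders, so substitute your own:

    import statistics
    import subprocess
    import time

    # Hypothetical command; use your real application and a *fixed*
    # input data set so that runs are directly comparable.
    COMMAND = ["./my_app", "--input", "benchmark_data.dat"]
    TRIALS = 5

    times = []
    for _ in range(TRIALS):
        start = time.perf_counter()
        subprocess.run(COMMAND, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)

    # Report the median as well as the extremes; the median is less
    # sensitive than the mean to a single run that was disturbed by
    # unrelated activity on the cluster.
    print(f"median {statistics.median(times):.2f}s  "
          f"min {min(times):.2f}s  max {max(times):.2f}s")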

If you don't want to use your applications, there are a number of cluster benchmarks available. Here are a few that you might consider:


Hierarchical Integration (HINT)

The HINT benchmark, developed at the U.S. Department of Energy's Ames Laboratory, is used to test subsystem performance. It can be used to compare both processor performance and memory subsystem performance. It is now supported by Brigham Young University. (http://hint.byu.edu)


High Performance Linpack

Linpack was written by Jack Dongarra and is probably the best known and most widely used benchmark in high-performance computing. The HPL version of Linpack is used to rank computers on the TOP500 Supercomputer Site. HPL differs from its predecessor in that the user can specify the problem size. (http://www.netlib.org/benchmark/hpl/)
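Once HPL is built, running it is simple; the tuning, including the problem size, lives in its input file. A minimal sketch in Python, assuming a four-process build (the process count must match the P x Q grid specified in HPL.dat):

    import subprocess

    # HPL reads its parameters (problem size N, block size NB, and the
    # P x Q process grid) from HPL.dat in the working directory; the
    # benchmark binary is conventionally named xhpl.
    subprocess.run(["mpirun", "-np", "4", "./xhpl"], check=True)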


Iozone

Iozone is an I/O and filesystem benchmark tool. It generates and performs a variety of file operations and can be used to assess filesystem performance. (http://www.iozone.org)
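Iozone's automatic mode sweeps a range of record and file sizes in a single pass. A minimal sketch; the scratch directory is a hypothetical mount point, and you should run on the filesystem you actually care about, since results differ markedly across mounts:

    import subprocess

    # -a selects Iozone's automatic mode, which exercises a range of
    # record sizes and file sizes. The directory is a placeholder.
    subprocess.run(["iozone", "-a"], check=True, cwd="/mnt/scratch")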


Iperf

Iperf was developed to measure network performance. It measures TCP and UDP bandwidth performance, reporting delay jitter and datagram loss as well as bandwidth. (http://dast.nlanr.net/Projects/Iperf/)
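Iperf runs as a server on one node ("iperf -s") and as a client on another. A minimal sketch of the client side, assuming the placeholder hostname node01; the UDP flags are what trigger the jitter and loss reporting:

    import subprocess

    # TCP test: reports bandwidth between this node and the server.
    subprocess.run(["iperf", "-c", "node01"], check=True)

    # UDP test at a 10 Mbit/s target rate: also reports delay jitter
    # and datagram loss.
    subprocess.run(["iperf", "-c", "node01", "-u", "-b", "10M"],
                   check=True)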


NAS Parallel Benchmarks

The Numerical Aerodynamic Simulation (NAS) Parallel Benchmarks (NPB) are application-centric benchmarks that have been widely used to compare the performance of parallel computers. NPB is actually a suite of eight programs. (http://science.nas.nasa.gov/Software/NPB/)
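The NPB programs are built individually for a given problem class and process count, and the resulting binary names encode both. A minimal sketch, assuming an MPI build of the suite and that you have already compiled the conjugate-gradient kernel for class A on four processes:

    import subprocess

    # NPB MPI binaries are named <benchmark>.<class>.<nprocs>; cg.A.4
    # is the conjugate-gradient kernel, class A, built for 4 processes.
    subprocess.run(["mpirun", "-np", "4", "bin/cg.A.4"], check=True)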

There are many other benchmarks available. The Netlib Repository is a good place to start if you need additional benchmarks. (http://www.netlib.org)
