1.3 Distributed Computing and Clusters

While the term parallel is often used to describe clusters, they are more correctly described as a type of distributed computing. Typically, the term parallel computing refers to tightly coupled sets of computation. Distributed computing is usually used to describe computing that spans multiple machines or multiple locations. When several pieces of data are being processed simultaneously in the same CPU, this might be called a parallel computation, but would never be described as a distributed computation. Multiple CPUs within a single enclosure might be used for parallel computing, but would not be an example of distributed computing. When talking about systems of computers, the term parallel usually implies a homogeneous collection of computers, while distributed computing typically implies a more heterogeneous collection. Computations that are done asynchronously are more likely to be called distributed than parallel. Clearly, the terms parallel and distributed lie at opposite ends of a continuum of possible meanings. In any given instance, the exact meanings depend upon the context. The distinction is more one of connotation than of clearly established usage.

Since cluster computing is just one type of distributed computing, it is worth briefly mentioning the alternatives. The primary distinction between clusters and other forms of distributed computing is the scope of the interconnecting network and the degree of coupling among the individual machines. The differences are often ones of degree.

Clusters are generally restricted to computers on the same subnetwork or LAN. The term grid computing is frequently used to describe computers working together across a WAN or the Internet. The term "grid" is meant to evoke a comparison between a power grid and a computational grid. A computational grid is a collection of computers that provide computing power as a commodity. This is an active area of research and has received (deservedly) a lot of attention from the National Science Foundation. The most significant differences between cluster computing and grid computing are that computing grids typically have a much larger scale, tend to be used more asynchronously, and have much greater access, authorization, accounting, and security concerns. From an administrative standpoint, if you build a grid, plan on spending a lot of time dealing with security-related issues. Grid computing has the potential to provide considerably more computing power than individual clusters, since a grid may combine a large number of clusters.

Peer-to-peer computing provides yet another approach to distributed computing. Again, this is an ambiguous term. Peer-to-peer may refer to sharing cycles, to the communications infrastructure, or to the actual data distributed across a WAN or the Internet. Peer-to-peer cycle sharing is best exemplified by SETI@Home, a project to analyze radio telescope data for signs of extraterrestrial intelligence. Volunteers load software onto their Internet-connected computers. To the casual PC or Mac user, the software looks like a screensaver. When a computer becomes idle, the screensaver comes on and the computer begins analyzing the data. If the user begins using the computer again, the screensaver closes and the data analysis is suspended. This approach has served as a model for other research, including the analysis of cancer and AIDS data.
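The idle-cycle behavior described above can be illustrated with a toy model. The sketch below is not SETI@Home code; the class name CycleScavenger, the is_idle callback, and the squaring "analysis" are all hypothetical stand-ins chosen for illustration. It shows only the control flow: work proceeds while the machine is idle and is suspended, without losing progress, when the user returns.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CycleScavenger:
    """Toy volunteer-computing client (hypothetical, for illustration only)."""
    work_units: List[int]
    is_idle: Callable[[], bool]          # assumed hook reporting whether the machine is idle
    results: List[int] = field(default_factory=list)
    next_unit: int = 0                   # index of the next unprocessed work unit

    def analyze(self, unit: int) -> int:
        # Stand-in for the real data analysis.
        return unit * unit

    def tick(self) -> bool:
        """Process one work unit if idle; return True if work was done."""
        if self.next_unit >= len(self.work_units) or not self.is_idle():
            return False                 # suspended: no work left, or user is active
        self.results.append(self.analyze(self.work_units[self.next_unit]))
        self.next_unit += 1              # progress is kept across suspensions
        return True


def run_demo():
    # Scripted idle states: idle, idle, busy (user returns), idle, idle.
    states = iter([True, True, False, True, True])
    client = CycleScavenger(work_units=[1, 2, 3, 4], is_idle=lambda: next(states))
    progress = [client.tick() for _ in range(5)]
    return progress, client.results
```

Calling run_demo() processes two units, skips the tick on which the user is active, and then resumes with the third unit rather than starting over.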

Data or file-sharing peer-to-peer networks are best exemplified by Napster, Gnutella, or Kazaa technologies. With some peer-to-peer file-sharing schemes, cycles may also be provided for distributed computations. That is, by signing up and installing the software for some services, you may be providing idle cycles to the service for other uses beyond file sharing. Be sure you read the license before you install the software if you don't want your computers used in this way.

Other entries in the distributed computing taxonomy include federated clusters and constellations. Federated clusters are clusters of clusters, while constellations are clusters in which the number of CPUs per node is greater than the number of nodes. A four-node cluster of SGI Altix computers with 128 CPUs per node is a constellation. Peer-to-peer, grids, federated clusters, and constellations are outside the scope of this book.
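As a small worked example of the constellation definition, the sketch below assumes the comparison is between the number of CPUs in each node and the number of nodes, which is the reading the four-node, 128-CPU example suggests. The function name classify is hypothetical.

```python
def classify(nodes: int, cpus_per_node: int) -> str:
    """Label a homogeneous cluster under the assumed reading of the definition."""
    if nodes < 1 or cpus_per_node < 1:
        raise ValueError("nodes and cpus_per_node must be positive")
    # Constellation: more CPUs inside each node than there are nodes.
    return "constellation" if cpus_per_node > nodes else "cluster"


altix_example = classify(nodes=4, cpus_per_node=128)      # the example from the text
commodity_example = classify(nodes=128, cpus_per_node=2)  # many small nodes
```

Under this reading, the four-node system with 128 CPUs per node is a constellation, while a 128-node system built from dual-CPU machines is an ordinary cluster.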
