
11.1 OpenPBS

Before the emergence of clusters, the Unix-based Network Queuing System (NQS) from NASA Ames Research Center was a commonly used batch-queuing system. With the emergence of parallel distributed systems, NQS began to show its limitations. Consequently, Ames led an effort to develop requirements and specifications for a newer, cluster-compatible system. These requirements and specifications later became the basis for the IEEE 1003.2d POSIX standard. With NASA funding, Veridian developed PBS, a system conforming to those standards, in the early 1990s.

PBS is available in two forms: OpenPBS and PBSPro. OpenPBS is the unsupported, original open source version of PBS, while PBSPro is a newer commercial product. In 2003, PBSPro was acquired by Altair Engineering and is now marketed by Altair Grid Technologies, a subsidiary of Altair Engineering. The web site for OpenPBS is http://www.openpbs.org; the web site for PBSPro is http://www.pbspro.com. Although much of the following will also apply to PBSPro, the remainder of this chapter describes OpenPBS, which is often referred to simply as PBS. However, if you have the resources to purchase software, PBSPro is well worth investigating. Academic grants have been available in the past, so if you are eligible, look into those as well.

As an unsupported product, OpenPBS has its problems. Of the software described in this book, it was, for me, the most difficult to install. In my opinion, it is easier to install OSCAR, which has OpenPBS as a component, or Rocks along with the PBS roll than it is to install just OpenPBS. With this warning in mind, we'll look at a typical installation later in this chapter.

11.1.1 Architecture

Before we install PBS, it is helpful to describe its architecture. PBS uses a client-server model and is organized as a set of user-level commands that interact with three system-level daemons. Jobs are submitted using the user-level commands and managed by the daemons. PBS also includes an API.

The pbs_server daemon, the job server, runs on the server system and is the heart of the PBS system. It provides basic batch services such as receiving and creating batch jobs, modifying the jobs, protecting jobs against crashes, and running the batch jobs. User commands and the other daemons communicate with the pbs_server over the network using TCP. The user commands need not be installed on the server.

The job server manages one or more queues. (Despite the name, queues are not restricted to first-in, first-out scheduling.) A scheduled job waiting to be run or a job that is actually running is said to be a member of its queue. The job server supports two types of queues, execution and routing. A job in an execution queue is waiting to execute while a job in a routing queue is waiting to be routed to a new destination for execution.

The pbs_mom daemon executes the individual batch jobs. This job executor daemon is often called the MOM because it is the "mother" of all executing jobs and must run on every system within the cluster. It creates an execution environment that is as nearly identical to the user's session as possible. MOM is also responsible for returning the job's output to the user.

The final daemon, pbs_sched, implements the cluster's job-scheduling policy. As such, it communicates with the pbs_server and pbs_mom daemons to match available jobs with available resources. By default, a first-in, first-out scheduling policy is used, but you are free to set your own policies. The scheduler is highly extensible.

PBS provides both a GUI and 1003.2d-compliant command-line utilities. These commands fall into three categories: management, operator, and user commands. Management and operator commands are usually restricted. The commands are used to submit, modify, delete, and monitor batch jobs.

11.1.2 Installing OpenPBS

While detailed installation directions can be found in the PBS Administrator Guide, there are enough "gotchas" that it is worth going over the process in some detail. Before you begin, be sure you look over the Administrator Guide as well. Between the guide and this chapter, you should be able to overcome most obstacles.

Before starting with the installation proper, there are a couple of things you need to check. As noted, PBS provides both command-line utilities and a graphical interface. The graphical interface requires Tcl/Tk 8.0 or later, so if you want to use it, install Tcl/Tk before you install PBS. For a Red Hat installation, you can install Tcl/Tk from the packages supplied with the operating system. For more information on Tcl/Tk, visit the web site http://www.scriptics.com/. To build the GUI, you'll also need the X11 development packages, which Red Hat users can install from the supplied RPMs.
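A quick way to check for Tcl/Tk on a Red Hat system is to query the RPM database (the version numbers shown here are only placeholders; yours will vary):

[root@fanny root]# rpm -q tcl tk

tcl-8.3.5-88

tk-8.3.5-88

If either package is missing, install it from your distribution media before building PBS.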

The first step in the installation proper is to download the software. Go to the OpenPBS web site (http://www-unix.mcs.anl.gov/openpbs/) and follow the links to the download page. The first time through, you will be redirected to a registration page. With registration, you will receive by email an account name and password that you can use to access the actual download page. Since you have to wait for approval before you receive the account information, you'll want to plan ahead and register a couple of days before you plan to download and install the software. Making your way through the registration process is a little annoying because it keeps pushing the commercial product, but it is straightforward and won't take more than a few minutes.

Once you reach the download page, you'll have the choice of downloading a pair of RPMs or the patched source code. The first RPM contains the full PBS distribution and is used to set up the server; the second contains just the software needed by a client and is used to set up the compute nodes within a cluster. While RPMs might seem the easiest way to go, the available RPMs are based on an older version of Tcl/Tk (Version 8.0). So unless you want to track down and install those older packages (a nontrivial task), installing from source is preferable. That's what's described here.

Download the source and move it to your directory of choice. With a typical installation, you'll end up with three directory trees: the source tree, the installation tree, and the working directory tree. In this example, I'm setting up the source tree in the directory /usr/local/src. Once you have the source package where you want it, unpack the code.

[root@fanny src]# gunzip OpenPBS_2_3_16.tar.gz

[root@fanny src]# tar -vxpf OpenPBS_2_3_16.tar

When untarring the package, use the -p option to preserve permission bits.

Since the OpenPBS code is no longer supported, it is somewhat brittle. Before you can compile the code, you will need to apply some patches. What you install will depend on your configuration, so plan to spend some time on the Internet: the OpenPBS URL given above is a good place to start. For Red Hat Linux 9.0, start by downloading the scaling patch from http://www-unix.mcs.anl.gov/openpbs/ and the errno and gcc patches from http://bellatrix.pcl.ox.ac.uk/~ben/pbs/. (Working out the details of what you need is the annoying side of installing OpenPBS.) Once you have the patches you want, install them.

[root@fanny src]# cp openpbs-gcc32.patch /usr/local/src/OpenPBS_2_3_16/

[root@fanny src]# cp openpbs-errno.patch /usr/local/src/OpenPBS_2_3_16/

[root@fanny src]# cp ncsa_scaling.patch /usr/local/src/OpenPBS_2_3_16/

[root@fanny src]# cd /usr/local/src/OpenPBS_2_3_16/

[root@fanny OpenPBS_2_3_16]# patch -p1 -b < openpbs-gcc32.patch

patching file buildutils/exclude_script

[root@fanny OpenPBS_2_3_16]# patch -p1 -b < openpbs-errno.patch

patching file src/lib/Liblog/pbs_log.c

patching file src/scheduler.basl/af_resmom.c

[root@fanny OpenPBS_2_3_16]# patch -p1 -b < ncsa_scaling.patch

patching file src/include/acct.h

patching file src/include/cmds.h

patching file src/include/pbs_ifl.h

patching file src/include/qmgr.h

patching file src/include/server_limits.h

The scaling patch changes built-in limits that prevent OpenPBS from working with larger clusters. The other patches correct problems resulting from recent changes to the gcc compiler.[1]

[1] Even with the patches, I found it necessary to manually edit the file srv_connect.c, adding the line #include <error.h> with the other #include lines in the file. If you have this problem, you'll know because make will fail when referencing this file. Just add the line and remake the file.
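If it helps to visualize the fix, the added line sits alongside the file's existing includes. The surrounding lines here are illustrative, not an exact listing of the file:

#include <errno.h>

#include <error.h>   /* added by hand so the file compiles */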

As noted, you'll want to keep the installation directory separate from the source tree, so create a new directory for PBS. /usr/local/OpenPBS is a likely choice. Change to this directory and run configure, make, make install, and make clean from it.

[root@fanny src]# mkdir /usr/local/OpenPBS

[root@fanny src]# cd /usr/local/OpenPBS

[root@fanny OpenPBS]# /usr/local/src/OpenPBS_2_3_16/configure \

> --set-default-server=fanny --enable-docs --with-scp

...

[root@fanny OpenPBS]# make

...

[root@fanny OpenPBS]# make install

...

[root@fanny OpenPBS]# make clean

...

In this example, the configuration options set fanny as the server, create the documentation, and use scp (SSH secure copy program) when moving files between remote hosts. Normally, you'll create the documentation only on the server. The Administrator Guide contains several pages of additional options.

By default, the procedure builds all the software. For the compute nodes, this really isn't necessary since all you need on those machines is pbs_mom. Thus, there are several alternatives you might want to consider when setting up the clients. You could go ahead and build everything just as you did for the server, or you could use different build options to restrict what is built. For example, the option --disable-server prevents the pbs_server daemon from being built. Or you could build everything and then install just pbs_mom and the files it needs. To do this, change to the MOM subdirectory, in this example /usr/local/OpenPBS/src/resmom, and run make install to install just MOM.

[root@ida OpenPBS]# cd /usr/local/OpenPBS/src/resmom

[root@ida resmom]# make install

...
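If you'd rather use the second approach and restrict the build itself, the client-side configure might look something like this (a sketch; --disable-server is the option mentioned above, and you should check configure --help on your version for any other flags you want):

[root@ida OpenPBS]# /usr/local/src/OpenPBS_2_3_16/configure \

> --set-default-server=fanny --disable-server

...

[root@ida OpenPBS]# make

...

[root@ida OpenPBS]# make install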

Yet another possibility is to use NFS to mount the appropriate directories on the client machines. The Administrator Guide outlines these alternatives but doesn't provide many details. Whatever your approach, you'll need pbs_mom on every compute node.

The make install step will create the /usr/spool/PBS working directory, and will install the user commands in /usr/local/bin and the daemons and administrative commands in /usr/local/sbin. make clean removes unneeded files.

11.1.3 Configuring PBS

Before you can use PBS, you'll need to create or edit the appropriate configuration files, located in the working directory, e.g., /usr/spool/PBS, or its subdirectories. First, the server needs the node file, a file listing the machines it will communicate with. This file provides the list of nodes used at startup. (This list can be altered dynamically with the qmgr command.) In the subdirectory server_priv, create the file nodes with the editor of your choice. The nodes file should have one entry per line with the names of the machines in your cluster. (This file can contain additional information, but this is enough to get you started.) If this file does not exist, the server will know only about itself.
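For example, on a small cluster the file /usr/spool/PBS/server_priv/nodes might contain nothing more than one machine name per line (the names here are hypothetical):

node01

node02

node03

node04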

MOM will need the configuration file config, located in the subdirectory mom_priv. At a minimum, you need an entry to start logging and an entry to identify the server to MOM. For example, your file might look something like this:

$logevent 0x1ff

$clienthost fanny

The argument to $logevent is a mask that determines what is logged. A value of 0x0ff will log all events excluding debug messages, while a value of 0x1ff will log all events including debug messages. You'll need this file on every machine. There are a number of other options, such as creating an access list.

Finally, you'll want to create a default_server file in the working directory with the fully qualified domain name of the machine running the server daemon.
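Creating it is a one-liner; substitute your server's fully qualified name:

[root@fanny PBS]# echo fanny.wofford.int > /usr/spool/PBS/default_server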

PBS uses ports 15001-15004 by default, so it is essential that your firewall doesn't block these ports. The ports can be changed by editing the /etc/services file. A full list of services and ports can be found in the Administrator Guide (along with other configuration options). If you decide to change ports, be sure to do so consistently across your cluster!
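The relevant default entries in /etc/services look something like the following; the service names shown follow the documented defaults, but double-check them against the Administrator Guide for your version:

pbs             15001/tcp

pbs_mom         15002/tcp

pbs_resmom      15003/tcp

pbs_resmom      15003/udp

pbs_sched       15004/tcp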

Once you have the configuration files in place, the next step is to start the appropriate daemons, which must be started as root. The first time through, you'll want to start these manually. Once you are convinced that everything is working the way you want, configure the daemons to start automatically when the systems boot by adding them to the appropriate startup file, such as /etc/rc.d/rc.local. All three daemons must be started on the server, but the pbs_mom is the only daemon needed on the compute nodes. It is best to start pbs_mom before you start the pbs_server so that it can respond to the server's polling.
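For example, on the server you might append lines like these to /etc/rc.d/rc.local (paths assume the default installation described above; compute nodes need only the pbs_mom line):

# start the PBS daemons at boot; MOM first, then server and scheduler

/usr/local/sbin/pbs_mom

/usr/local/sbin/pbs_server

/usr/local/sbin/pbs_sched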

Typically, no options are needed for pbs_mom. The first time (and only the first time) you run pbs_server, start it with the option -t create.

[root@fanny OpenPBS]# pbs_server -t create

This option is used to create a new server database. Unlike pbs_mom and pbs_sched, pbs_server can be configured dynamically after it has been started.

The options to pbs_sched will depend on your site's scheduling policies. For the default FIFO scheduler, no options are required. For a more detailed discussion of command-line options, see the manpages for each daemon.

11.1.4 Managing PBS

We'll look at the command-line utilities first since the GUI may not always be available. Once you have mastered these commands, using the GUI should be straightforward. From a manager's perspective, the first command you'll want to become familiar with is qmgr, the queue management command. qmgr is used to create job queues and manage their properties; it is also used to manage nodes and servers, providing an interface to the batch system. In this section we'll look at a few basic examples rather than try to be exhaustive.

First, identify the pbs_server managers, i.e., the users who are allowed to reconfigure the batch system. This is generally a one-time task. (Keep in mind that not all commands require administrative privileges. Subcommands such as list and print can be executed by all users.) Run the qmgr command as follows, substituting your username:

[root@fanny OpenPBS]# qmgr

Max open servers: 4

Qmgr: set server managers=sloanjd@fanny.wofford.int

Qmgr: quit

You can specify multiple managers by adding their names to the end of the command, separated by commas. Once done, you'll no longer need root privileges to manage PBS.
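For example, to name two managers at once (the second account shown is hypothetical):

Qmgr: set server managers = sloanjd@fanny.wofford.int,jones@fanny.wofford.int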

Your next task will be to create a queue. Let's look at an example.

[sloanjd@fanny PBS]$ qmgr

Max open servers: 4

Qmgr: create queue workqueue

Qmgr: set queue workqueue queue_type = execution

Qmgr: set queue workqueue resources_max.cput = 24:00:00

Qmgr: set queue workqueue resources_min.cput = 00:00:01

Qmgr: set queue workqueue enabled = true

Qmgr: set queue workqueue started = true

Qmgr: set server scheduling = true

Qmgr: set server default_queue = workqueue

Qmgr: quit

In this example we have created a new queue named workqueue. We have limited CPU time to between 1 second and 24 hours. The queue has been enabled, started, and set as the default queue for the server, which must have at least one queue defined. All queues must have a type, be enabled, and be started.

As you can see from the example, the general form of a qmgr command line is a command (active, create, delete, set, unset, list, or print) followed by a target (server, queue, or node) followed by an attribute assignment. These keywords can be abbreviated as long as there is no ambiguity. In the first example in this section, we set a server attribute. In the second example, the target was the queue that we were creating for most of the commands.
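The same general syntax works with the node target, which is one way to alter the node list dynamically as mentioned earlier. The node name here is hypothetical, and you should verify the attribute names against your version's documentation:

Qmgr: create node node05

Qmgr: set node node05 np = 2

Qmgr: list node node05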

To examine the configuration of the server, use the command

Qmgr: print server

The same command can be used to save the current configuration for later reuse. Use the command

[root@fanny PBS]# qmgr -c "print server" > server.config

Note that with the -c flag, qmgr commands can be entered on a single line. To re-create the queue at a later time, use the command

[root@fanny PBS]# qmgr < server.config

This can save a lot of typing or can be automated if needed. Other actions are described in the documentation.

Another useful command is pbsnodes, which lists the status of the nodes on your cluster.

[sloanjd@amy sloanjd]$ pbsnodes -a

oscarnode1.oscardomain

     state = free

     np = 1

     properties = all

     ntype = cluster

   

oscarnode2.oscardomain

     state = free

     np = 1

     properties = all

     ntype = cluster

...

On a large cluster, that can create a lot of output.

11.1.5 Using PBS

From the user's perspective, the place to start is the qsub command, which submits jobs. The only jobs that qsub accepts are scripts, so you'll need to package your tasks appropriately. Here is a simple example script:

#!/bin/sh

#PBS -N demo

#PBS -o demo.txt

#PBS -e demo.txt

#PBS -q workq

#PBS -l mem=100mb

   

mpiexec -machinefile /etc/myhosts -np 4 /home/sloanjd/area/area

The first line specifies the shell to use in interpreting the script, while the next few lines starting with #PBS are directives that are passed to PBS. The first names the job, the next two specify where output and error output go, the next to last identifies the queue to use, and the last lists a resource that will be needed, in this case 100 MB of memory. The blank line signals the end of the PBS directives. The lines that follow it make up the actual job.

Once you have created the batch script for your job, the qsub command is used to submit the job.

[sloanjd@amy area]$ qsub pbsdemo.sh

11.amy

When run, qsub returns the job identifier as shown. A number of different options are available, both as command-line arguments to qsub and as directives that can be included in the script. See the qsub(1B) manpage for more details.

There are several things you should be aware of when using qsub. First, as noted, it expects a script. Next, the target script cannot take any command-line arguments. Finally, the job is launched on one node. The script must ensure that any parallel processes are then launched on other nodes as needed.
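Because the script can't take arguments, a common workaround is to pass values in through the environment using qsub's -v option and read them within the script (the variable name here is arbitrary):

[sloanjd@amy area]$ qsub -v NPROCS=4 pbsdemo.sh

Inside the script, $NPROCS is then available like any other environment variable.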

In addition to qsub, there are a number of other useful commands available to the general user. The commands qstat and qdel can be used to manage jobs. In this example, qstat is used to determine what is on the queue:

[sloanjd@amy area]$ qstat

Job id           Name             User             Time Use S Queue

---------------- ---------------- ---------------- -------- - -----

11.amy           pbsdemo          sloanjd                 0 Q workq           

12.amy           pbsdemo          sloanjd                 0 Q workq

qdel is used to delete jobs as shown.

[sloanjd@amy area]$ qdel 11.amy

[sloanjd@amy area]$ qstat

Job id           Name             User             Time Use S Queue

---------------- ---------------- ---------------- -------- - -----

12.amy           pbsdemo          sloanjd                 0 Q workq

qstat can be called with the job identifier to get more information about a particular job or with the -s option to get more details.

Several other commands are available as well. A few of the more useful ones include the following:


qalter

This is used to modify the attributes of an existing job.


qhold

This is used to place a hold on a job.


qmove

This is used to move a job from one queue to another.


qorder

This is used to change the order of two jobs.


qrun

This is used to force a server to start a job.

Other available commands are listed in the "See Also" section of the qsub(1B) manpage.
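For example, to place a hold on the remaining job from the earlier qstat listing and then release it (qrls is the companion command to qhold):

[sloanjd@amy area]$ qhold 12.amy

[sloanjd@amy area]$ qrls 12.amy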


11.1.6 PBS's GUI

PBS provides two GUIs for queue management. The command xpbs will start a general interface. If you need to do administrative tasks, you should include the argument -admin. Figure 11-1 shows the xpbs GUI with the -admin option. Without this option, the general appearance is the same, but a number of buttons are missing. You can terminate a server; start, stop, enable, or disable a queue; or run or rerun a job. To monitor nodes in your cluster, you can use the xpbsmon command, shown for a few machines in Figure 11-2.

Figure 11-1. xpbs -admin
figs/hplc_1101.gif

Figure 11-2. xpbsmon
figs/hplc_1102.gif

11.1.7 Maui Scheduler

If you need to go beyond the schedulers supplied with PBS, you should consider installing Maui. In a sense, Maui picks up where PBS leaves off. It is an external scheduler; that is, it does not include a resource manager. Rather, it can be used in conjunction with a resource manager such as PBS to extend the resource manager's capabilities. In addition to PBS, Maui works with a number of other resource managers.

Maui controls how, when, and where jobs will be run and can be described as a policy engine. When used correctly, it can provide extremely high system utilization and should be considered for any large or heavily utilized cluster that needs to optimize throughput. Maui provides a number of very advanced scheduling options. Administration is through the master configuration file maui.cfg and through either a text-based or a web-based interface.
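To give you a feel for the configuration, a minimal maui.cfg tying Maui to PBS might contain entries along these lines. This is a sketch based on common Maui 3.x defaults; the exact parameters vary by version, so consult the Maui documentation:

SERVERHOST      fanny

ADMIN1          root

RMCFG[base]     TYPE=PBS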

Maui is installed by default as part of OSCAR and Rocks. For the most recent version of Maui or for further documentation, you should visit the Maui web site, http://www.supercluster.org.
