Batch processing is usually the farthest thing from a Unix user's mind. People are usually drawn to Unix so they can avoid the old batch environments where users would queue up large processing jobs and come back two days later to see what had happened. Particularly in the workstation world, systems are often single-user machines on which batching may not make a lot of sense at first glance. However, batch processing is becoming more important in the Unix world than it has been in the past.
In the current frenzy to downsize from mainframes to open systems, users often want to regain some of the mainframe features they left behind, including batch processing. Also, the advent of inexpensive high-powered Unix machines makes them good candidates for doing batch processing, particularly if the batching can be distributed across a number of nodes in a network. We'll look at some examples of batch processing on Unix systems, concentrating on examples for a freely available Unix batch system.
People with a stock Unix system often start a batch job with the at or batch commands, which let a user run a job either immediately or at some time in the future. The job runs, and the results are mailed back to the user. This method is less than satisfactory for several reasons. For one thing, it is often useful to queue up a whole series of jobs and have them run consecutively, one after another, so that no individual job overloads the machine. Usually six jobs running sequentially will finish faster than six jobs running at the same time.
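The sequential-execution point is easy to see with a toy runner. The script below is only a sketch (it is not DQS or any stock Unix facility, and the directory and job names are made up): it drains a directory of queued job scripts one at a time, so each job has the machine to itself.

```shell
#!/bin/sh
# Toy sequential "batch queue": run each queued job to completion
# before starting the next one. Paths and job contents are made up.
QUEUE=./jobs
mkdir -p "$QUEUE"
printf 'echo job-a done\n' > "$QUEUE/01.sh"
printf 'echo job-b done\n' > "$QUEUE/02.sh"

for job in "$QUEUE"/*.sh; do
    sh "$job"              # this job runs alone, with no competition
done > results.log
cat results.log
```

Because the loop waits for each job to finish, the jobs never compete for memory or CPU, which is exactly the effect a single batch queue gives you.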
In modern environments where there may be many workstations in a workgroup, it would be useful to queue the job up to the workstation that is least busy--after all, if your office mate has gone out for a long lunch, it would be a shame to have that new zippy workstation just sit idle. Modern batching software will automatically dole out jobs to the machines that are least busy.
A true batching system also gives finer control over the batch jobs than do the traditional Unix tools. Using a batch control system, users can assign priorities to jobs, suspend them, kill them, or control them in other ways. In some cases, the software may even make a group of systems appear to be one powerful virtual machine by taking a job and splitting it up among the systems in the cluster.
Thus, the goals of a Unix batching system include some or all of the following: running queued jobs one at a time so that no single job overloads a machine; distributing jobs to the least busy nodes in a workgroup; giving users finer control over jobs through priorities, suspension, and deletion; and, in some cases, making a group of systems appear to be one powerful virtual machine.
In this tutorial I will use the Distributed Queuing System (DQS) as an example of a batching system. DQS is publicly available and was developed at the Supercomputer Computations Research Institute at Florida State University by Tom Green and Jeff Snyder. It is a fairly simple batching system, but it is good to use as an example. And it compiles easily on many platforms, including IBM's (Armonk, N.Y.) AIX, Silicon Graphics Inc.'s (Mountain View, Calif.) Irix, Sun's (Mountain View, Calif.) SunOS, Convex Computer Corp.'s (Richardson, Texas) ConvexOS, and Cray Research Inc.'s (Eagan, Minn.) Unicos. It also supports batching in AFS, a distributed file system, authenticating the jobs at run time.
DQS runs on one system in the cluster as a ``qmaster'' machine. It keeps track of the different queues on the various systems and starts the jobs on remote machines. Typically each of the worker systems will have one queue (or possibly more) to which jobs are submitted. Within an individual queue, one job will be executed at a time, and the other jobs will wait for resources to become available. Normally each system will have one queue, which will force serialization of the jobs. However, a system that handles multiple jobs well--for example, a multiprocessor system--might be given more than one queue so that multiple jobs will execute simultaneously.
The system may be built so that jobs will be distributed among queues more or less sequentially or so that any incoming jobs will be assigned to the node that has the lowest load currently. This system generally assumes that nodes may be used for something other than batch processing--that is, that you may be dealing with nodes that are sitting on someone's desk, and the batching system is being used to ``soak up'' those excess CPU cycles that may be going to waste.
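The least-loaded-node decision can be sketched in a few lines of shell. The host names and load averages below are made-up sample data standing in for the figures a batch system such as DQS gathers from its nodes itself:

```shell
#!/bin/sh
# Toy load-leveling decision: pick the node with the lowest
# one-minute load average. The host/load table is sample data.
best=$(
  sort -n -k2 <<'EOF' | awk 'NR == 1 { print $1 }'
ws1 1.42
ws2 0.08
ws3 0.95
EOF
)
echo "submit the job to: $best"
```

Sorting numerically on the load column and taking the first entry picks ws2, the idle machine, which is the behavior described above.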
Normally jobs are not submitted to an individual queue but to a group, which will usually be a number of queues distributed across machines of the same architecture. Jobs are submitted with the qsub command, which specifies the name of the command file and the queue the job is being submitted to. The qsub command files are fairly normal Bourne shell scripts, with one exception: they contain Bourne shell comments--those starting with ``#$''--that control the DQS system. The arguments after the ``#$'' are actually options that are fed to the qsub command and could just as well have been typed on the command line.
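Such a command file might look something like the sketch below. The particular ``#$'' option letters shown (for the queue group, the output file, and mail notification) are illustrative assumptions rather than a definitive DQS reference; check the DQS documentation for the exact names on your version.

```shell
#!/bin/sh
#$ -q sun            # queue group to submit to (hypothetical option name)
#$ -o batchjob.out   # file for standard output/error (hypothetical option name)
#$ -m be             # mail at beginning and end of job (hypothetical option name)
# Below the directives, the job is an ordinary Bourne shell script:
cd $HOME/dqs
make all
```

To the shell the ``#$'' lines are just comments, so the same file can be run by hand for testing; only qsub interprets the directives.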
Listing 1 shows a sample command script for a job to be submitted to a batch group called ``sun'', a cluster of Sun workstations. This batch job recompiles the DQS batching system. It sends electronic mail at the beginning and end of the job and dumps the standard output and standard error output into a file named batchjob.out.
Listing 2A shows how to submit the batch script depicted in Listing 1. DQS also includes powerful commands for examining and manipulating queues. Once a job has been submitted with qsub, it may be examined with the qstat command, which recognizes several options. Listing 2B shows how the -l option causes qstat to give more detailed information about the queue status. Note that two jobs were submitted to the ``sun'' group, and they were distributed automatically across both of the machines in the group.
The qmod command may be used by the owner of a queue--for example, the workstation owner--or a system administrator to suspend and restart a queue. Listing 2C demonstrates these functions.
Additionally, there is a qdel command, which may be used to delete a DQS job from a queue. It can only be used by the owner of the job, a system operator, or a DQS manager. So, to delete a job, a user would usually first run qstat to determine the request ID of the job he or she wants to delete, then use qdel to terminate the running job (Listing 2D).
The last command demonstrated in the listing is qacct. There is an accounting system built into DQS that tracks system usage by job and by user. This facility makes it easy to track what your systems are being used for. Listing 2E shows qacct -u displaying the accounting history by user; the qacct -j jobname form displays the corresponding history for an individual job.
The qmon program gives the user access to most of the features of the commands above while working in an X Window System graphical-user-interface environment. This tool uses Motif widgets and is nicely put together, though the current version of qmon doesn't support all of the DQS features and is therefore currently lagging a little behind the rest of the DQS system.
There is also a qidle program, which can be run on a system that has a graphics display running X. This program monitors the X display and will automatically suspend the queues on the machine when the X display is being used interactively. This keeps the batch jobs from slowing down the responsiveness of X on the workstation but allows the excess CPU cycles to be used when the workstation is idle.
DQS includes a program called qsh, which allows users to submit commands to a queue interactively. The relationship between qsh and qsub is somewhat analogous to the difference between using the shell interactively and executing a shell script.
A queue has several options that may be configured in DQS. These options modify the behavior of the queue and limit the resources that a job may use. The queues are configured with the qconf command, which is also used to add and delete hosts and users from the system configuration. Some of the options to qconf are shown in Table 1. In general, the DQS strategy is to define access to the cluster at two levels: access to the cluster as a whole and access to individual queues in the cluster. Each cluster or queue may have its access set at one of several levels.
This access permission system is fairly flexible and allows policy to be set regarding which computing resources are available to which users.
In addition to configuring queues, hosts, and users, the editing features of qconf allow the administrator to change the characteristics of the system. Several features of the system may be configured either when the system is built or changed dynamically by using qconf. These features include the default shell, the minimum user ID and group ID values that are allowed to use the batching system, the number of jobs that an individual user can have running at once, whether jobs are re-runnable, and so forth. Several features of queues may also be configured at run time or by qconf. Queues may be defined as batch, interactive, or either; only queues that are defined as interactive may be used with the qsh command. Overload levels may be defined for a queue: If the load on the queue node exceeds the overload level, the icon for the queue will turn red in the qmon program display.
As was mentioned previously, systems may be configured to assign jobs according to the system's load average. This feature may also be tinkered with somewhat by assigning load multipliers for each queue, so that some queues are more ``expensive'' than others. Therefore, these queues will be less likely to run jobs. This capability is a way of changing the priorities of different queues.
Queues may also be assigned ``nice'' values. Adding niceness to a process reduces its priority, so other processes will be more likely to be scheduled first. Therefore, a system may be given two different queues, one of which nices jobs down quite a lot (useful for long-running jobs) and another that doesn't nice down jobs (intended for quick, low-resource jobs).
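The two-queue arrangement can be mimicked directly with the standard nice(1) command. This sketch is not DQS itself; the jobs are placeholders that simply report their own niceness so the difference is visible:

```shell
#!/bin/sh
# Sketch: a "niced-down" queue vs. a normal queue, using plain nice(1).
# Each placeholder job reports the niceness it is running at.
long=$(nice -n 19 sh -c 'ps -o nice= -p $$' | tr -d ' ')
quick=$(sh -c 'ps -o nice= -p $$' | tr -d ' ')
echo "long-running job at niceness $long, quick job at niceness $quick"
```

The long-running job is pushed all the way down to niceness 19, so the quick job (and interactive users) will be scheduled ahead of it whenever they want the CPU.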
So how do you keep your users from cheating? We all know computer users--if there's a high-speed queue and a low-speed queue, nobody will use the low-speed queue. After all, their thinking goes, my job is more important than all those other ones. Each queue may also be assigned limits on the amount of CPU time, core file size, total memory usage, and disk output each job may use. This way, you can set the limits low on queues that are intended to be express queues and set the limits higher on queues that are niced.
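Per-queue caps of this kind map onto the ordinary setrlimit(2) mechanism, which the shell exposes as ulimit. The sketch below (again, not DQS; the busy loop is a placeholder for a runaway job) caps a job at roughly one second of CPU time:

```shell
#!/bin/sh
# Sketch: an express-queue style CPU cap enforced with ulimit.
status=0
(
  ulimit -t 1          # at most 1 CPU-second for this job
  ulimit -c 0          # and no core file on the way out
  while :; do :; done  # placeholder for a runaway CPU-bound job
) || status=$?         # the kernel stops the job at the cap
echo "runaway job stopped with exit status $status"
```

Because the limits are set inside a subshell, they apply only to that one ``job'' and not to the enclosing session, which is how a batch system can give each queue its own limits.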
There are several other systems for batching Unix jobs, some commercially available, others free. Note that some of these are simple batching systems or load-leveling systems, while others are more ambitious and aim to divide an individual job in parallel across a number of different systems, which may be the same architecture or different architectures. NQS is somewhat similar in flavor to DQS; it also supports the notion of using the batch queues for other functions, such as printing queues. Condor is a somewhat newer package that was written at the University of Wisconsin and is freely available in source form.
IBM Corporation has taken the Condor package and added features to it, including usability in an AFS environment and some more powerful batch job and queuing control mechanisms, and called the package Load Leveler. It is available commercially from IBM for RS/6000, Sun, and Silicon Graphics platforms. Load Leveler is pretty nicely implemented and is one of the packages we've been using at Fermilab for production batching. It has somewhat more sophisticated queue controls than DQS, although DQS has more features in the works.
Finally, Parallel Virtual Machine (PVM) is a more extended package that allows an application developer to treat an extended group of systems as if they were one very large virtual machine. Unlike the other packages in this article, PVM requires source modification of the C or Fortran program to call the PVM libraries. It is particularly useful for classes of programs that are easily split up into many parallel jobs, such as certain kinds of data analysis. It may also be used as a back end for the DQS package so that users may submit jobs through the normal DQS mechanism but have them run on many machines in parallel.
Which package should you use? A lot will depend on a site's feelings about publicly available software. The publicly available software is somewhat more of an adventure but brings with it the advantage of having source code available, either to fix bugs or add features. The commercial packages have the advantage of providing vendor support, although it's often the case that the absence of support for publicly available packages is better than the formal support of some commercial packages. Take a look at some of the public packages, gather up some of the information on the commercial packages, and decide what suits your needs best. And happy batching!
NQS - NETWORK QUEUING SYSTEM, VERSION 2.0
The Network Queuing System, NQS, is a versatile batch and device queuing facility for a single Unix computer or a group of networked computers. With the Unix operating system as a common interface, the user can invoke the NQS collection of user-space programs to move batch and device jobs freely around the different computer hardware tied into the network. NQS provides facilities for remote queuing, request routing, remote status, queue status controls, batch request resource quota limits, and remote output return.
NQS is written in the C language and has been successfully implemented on a variety of UNIX platforms, including Sun3 and Sun4 series computers, SGI IRIS computers running IRIX 3.3, DEC computers running ULTRIX 4.1, and AMDAHL computers running UTS 1.3. It requires quite a lot of work to get going. More recent releases are sold by Sterling Software (Palo Alto, Calif.) and several different computer manufacturers.
To receive a listing of the PVM version 3 files that may be obtained from the server, send it the message ``send index from pvm3''.