UMBC High Performance Computing Facility : HPC Serial Job Submission
This page last changed on Mar 11, 2009 by straha1.
You cannot execute jobs directly on the cluster nodes yourself. Instead, the cluster's batch system executes your programs for you. To use it, you write a script containing the commands that need to run on the cluster nodes, then submit that script with the qsub command. There are three queues to which you can submit jobs. The testing queue is intended for short-lived test jobs used for debugging, so please submit your jobs there until you are sure that your code works. Once it works, you can use the low_priority or high_priority queue to run your job on more machines for longer periods of time. See this page for an explanation of the differences between the three queues.
Running serial (non-MPI) jobs on HPC is not much different from running parallel jobs. There are only two major differences. First, you replace the mpirun command with the serial program you want to run. Second, the #PBS -l nodes=... line in your qsub script changes, typically to request a single processor core. As an example, consider a qsub script named test.qsub that runs the serial program echo. This script uses the testing queue, which is intended for short-lived, single-machine test jobs for debugging. We will switch to the low_priority queue later.
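A minimal sketch of what test.qsub might contain, pieced together from the line-by-line description in the next paragraph. It is not the original script: the queue name testing, the -o and -e directives that name qsub.out and qsub.err, and placing the #PBS lines directly after the #! line are assumptions. The ":" comment lines are kept free of quotes and other special characters because bash still parses them.

```shell
#! /bin/bash
#PBS -q testing
#PBS -l nodes=1:ppn=1
#PBS -o qsub.out
#PBS -e qsub.err
: Sketch only. The queue name testing and the -o and -e file names
: are assumptions based on the text, not the original script.
: The PBS directives sit directly under the first line because qsub
: typically stops reading them at the first command, and a colon line is a command.

: Change to the directory from which you ran qsub.
cd $PBS_O_WORKDIR

: Run the serial program echo with HELLO WORLD as its arguments.
echo HELLO WORLD
```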
Note that we are using nodes=1:ppn=1, which requests a single processor core. As before, the #! /bin/bash line tells Linux to use the shell /bin/bash to run your script. Lines beginning with ":" act as comments because ":" is a bash command that does nothing, so the rest of the line has no effect as long as it contains no quotes or other special shell characters. The #PBS lines tell PBS what resources to give your job; for more information on those lines, see HPC Parallel Job Submission. The cd $PBS_O_WORKDIR line changes to the directory from which you ran qsub. The echo HELLO WORLD command runs echo with HELLO WORLD as its arguments. The echo program is built into bash and simply prints its arguments.
If you submit the script by running qsub test.qsub, your job will eventually run and create the files qsub.out and qsub.err. (You can use qstat to monitor your job while it is queued or running, or qdel to delete misconfigured or accidental jobs.) If everything went according to plan, qsub.err should be empty and qsub.out should contain "HELLO WORLD".
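The commands mentioned above are collected below, together with a small check of the two output files. check_hello_output is a hypothetical helper, not part of PBS; the qstat -u and qdel forms shown in the comments are the common PBS/Torque ones, and the exact output of qsub and qstat may differ on this cluster.

```shell
# Typical sequence, run in the directory that holds test.qsub:
#   qsub test.qsub        submit the job; qsub prints the job ID
#   qstat -u "$USER"      list your queued and running jobs
#   qdel JOB_ID           delete a job, where JOB_ID is what qsub printed
#
# check_hello_output (hypothetical helper): run it after the job finishes to
# confirm that qsub.err is empty and qsub.out contains HELLO WORLD.
check_hello_output() {
    local out_file="${1:-qsub.out}"
    local err_file="${2:-qsub.err}"
    # Any text in the error file means something went wrong.
    if [ -s "$err_file" ]; then
        echo "$err_file is not empty:" >&2
        cat "$err_file" >&2
        return 1
    fi
    # The output file must exist and contain the echoed text.
    if ! grep -q "HELLO WORLD" "$out_file" 2>/dev/null; then
        echo "$out_file is missing or does not contain HELLO WORLD" >&2
        return 1
    fi
    echo "job output looks correct"
}
```

With no arguments it checks qsub.out and qsub.err in the current directory; pass two paths to check other files.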
Once you have finished debugging your hello world program, you can resubmit it to the low_priority queue with a script whose queue line reads #PBS -q low_priority.
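A minimal sketch of the low_priority version. It assumes the script is identical to test.qsub except for the queue line; the -o and -e directives naming qsub.out and qsub.err, and putting the #PBS lines directly after the #! line, are assumptions rather than the original script.

```shell
#! /bin/bash
#PBS -q low_priority
#PBS -l nodes=1:ppn=1
#PBS -o qsub.out
#PBS -e qsub.err
: Sketch only. Same as the testing version except for the queue line.
: The -o and -e file names are assumptions based on the text.

: Change to the directory from which you ran qsub.
cd $PBS_O_WORKDIR

: Run the serial program echo with HELLO WORLD as its arguments.
echo HELLO WORLD
```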
If you have access to the high_priority queue, you can use that queue instead by replacing #PBS -q low_priority with #PBS -q high_priority. Note that we are still using nodes=1:ppn=1, which requests a single processor core somewhere on the cluster. If your job does not require large amounts of memory or multiple processors, please use nodes=1:ppn=1. If you need multiple processor cores, fast InfiniBand I/O or more than three gigabytes of RAM, use nodes=1:ppn=4 instead to ensure that you get a whole machine to yourself.
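Because moving between queues or between ppn=1 and ppn=4 only changes two #PBS lines, a small shell function can write the script for you. write_serial_qsub is a hypothetical helper, not part of PBS; it assumes output files named qsub.out and qsub.err and places the #PBS lines directly after the #! line.

```shell
# write_serial_qsub (hypothetical helper, not part of PBS): print a serial
# job script to stdout.
#   $1   queue: testing, low_priority or high_priority
#   $2   processors per node: 1 for a single core, 4 for a whole machine
#   $3.. the command to run in place of echo HELLO WORLD
write_serial_qsub() {
    if [ "$#" -lt 3 ]; then
        echo "usage: write_serial_qsub QUEUE PPN COMMAND [ARGS...]" >&2
        return 1
    fi
    local queue="$1" ppn="$2"
    shift 2
    # The #PBS directives come right after the #! line, before any command.
    printf '#! /bin/bash\n'
    printf '#PBS -q %s\n' "$queue"
    printf '#PBS -l nodes=1:ppn=%s\n' "$ppn"
    # Assumed output file names, matching qsub.out and qsub.err in the text.
    printf '#PBS -o qsub.out\n'
    printf '#PBS -e qsub.err\n'
    # Single quotes keep $PBS_O_WORKDIR literal, so it is expanded when the job runs.
    printf 'cd $PBS_O_WORKDIR\n'
    # "$*" joins the remaining arguments with single spaces.
    printf '%s\n' "$*"
}
```

For example, write_serial_qsub high_priority 4 ./my_program input.dat > big.qsub followed by qsub big.qsub would request a whole machine in the high_priority queue; my_program, input.dat and big.qsub are placeholder names. Since "$*" flattens the arguments into one line, arguments that themselves contain spaces lose their grouping, so edit the generated file by hand in that case.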
While printing simple text messages to an output file is enough for some people, you might need to do more. You can, of course, replace that "echo" line with a line that executes your own serial program. If you want to use IDL, Matlab, R or SAS, you have to do a few extra things:
Document generated by Confluence on Mar 31, 2011 15:37