UMBC High Performance Computing Facility
Please note that this page is under construction. We are documenting the
240-node cluster maya that will be available after Summer 2014.
Currently, the 84-node cluster tara still operates independently,
until it becomes part of maya at the end of Summer 2014.
Please see the 2013 Resources Pages under the Resources tab for tara information.
This facility is a shared resource for research at UMBC that requires a high-performance computer, particularly a parallel one.
The following policies are intended to make this facility effective for users and to ensure the maintenance of the facility.
For the long-term benefit of everybody, it is vital that all users comply with all aspects of the policies.
These policies are subject to active development at this time, in response to issues that come to our attention and in response
to usage patterns. This webpage always shows the current usage policies in effect.
There are several aspects to usage policies on a large computer that is shared by many users, and additional aspects for a facility
that relies on active support from its users for its maintenance. Therefore, the following items are grouped by their purpose.
If you have any questions or concerns, do not hesitate to contact the chair of the user committee; see the contact information.
Good User Behaviors
On a day-to-day basis, it is imperative that users run their code in a responsible fashion, so as not to hinder or damage other
users' code. To this end, the following rules must be followed at all times. Complying with many of these common-sense rules may
require you to understand something about parallel computing or about the setup of the hardware and software of the machine. Do not
hesitate to contact the chair of the user committee as point-of-contact to ask questions or to report potential problems.
All users must use the batch submission system of the scheduler that is running on the user node to reserve compute nodes for
their use. You are not allowed to log in to the compute nodes for the purpose of running jobs directly there. For certain purposes,
interactive use of a node may be necessary; if you need this, please contact the chair of the user committee. Only users who need
this and who have received appropriate training are allowed interactive use of any compute node.
Users will be notified by e-mail about issues related to the system, such as scheduled downtime, upgrades, etc. Such mail may
also include requests for information and feedback. Users are required to monitor the e-mail address on file with the chair of the
user committee and are required to respond to contacts. This is part of the active communication necessary for a shared resource such
as this to be used effectively by all users. Currently, the time slot for scheduled downtime is every Tuesday evening. This downtime
window may not be used every week, but users should plan their work accordingly. An effort will be made to send out an announcement
of each downtime that is actually scheduled, but users should not rely on such notices.
IMPORTANT NOTE: If users are observed to violate any of the above rules or are behaving in any way that impacts other users' ability to
use the resource, the chair of the user committee has the right to terminate the user's jobs and/or to suspend the user's account.
Ordinarily, we will try to make contact with the user first to discuss what is going on and to try to work with the user, but if other
users are impacted, the account can be suspended first. Decisions by the chair of the user committee are subject to review by the user
committee; see the contact information for a list of the members of the user committee.
Access to the Facility
This facility is a shared resource for research at UMBC that requires a high-performance parallel computer. To get an account to this
facility, please submit a completely filled-out account request form. To maintain access, users must follow all policies outlined
below at all times. To ensure the success of this facility in the long run, it is vital that demonstrated research
results be created on this machine; hence the initial users should have an ongoing program of high-performance computing research. As
usage patterns develop and the machine is fully set up and grows in number of nodes, we anticipate that the number of accounts can be
increased. Do not hesitate to contact the chair of the user committee at any time to get an understanding of the current usage levels
of the machine and whether space is available or not.
Users are invited to contribute direct funding to the facility at $5,000 per node or a multiple thereof. Contributions from faculty in
this way will be bundled and used for a hardware purchase ordinarily once a year. Contributing this money gives these users priority
access over other users to that number of nodes, as explained below.
Users' access to compute nodes will be managed by job scheduling software, called the scheduler, that reserves compute nodes for
users. The scheduler reserves compute nodes based on the availability of resources in combination with a user's priority. The following
principles will guide the setup of the scheduler:
The scheduler schedules jobs on a first-come, first-served basis among users with equal priority, assuming availability of requested
resources (number of nodes requested, nodes with certain features requested, etc.). Users who contribute funding to the cluster enjoy
an increased priority for scheduling their jobs, up to the node-hours as explained below.
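The ordering principle above can be illustrated with a small Python model. This is only a sketch under simplifying assumptions, not the scheduler's actual configuration: the `Job` class and its `has_priority` flag are hypothetical, and resource availability (number of nodes, node features) is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submit_time: float  # time of submission, e.g. seconds since some reference point
    has_priority: bool  # hypothetical flag: owner contributed funding and still has node-hours left


def scheduling_order(jobs):
    """Order jobs as a simple priority-then-first-come, first-served policy would.

    Jobs with priority are considered first; within each priority class,
    earlier submissions come first. Availability of requested resources
    is not modeled in this sketch.
    """
    return sorted(jobs, key=lambda job: (not job.has_priority, job.submit_time))


jobs = [Job("a", 10.0, False), Job("b", 20.0, True), Job("c", 5.0, False)]
print([job.name for job in scheduling_order(jobs)])
```

Here job `b` is considered first despite being submitted last, because its owner has priority; jobs `c` and `a` then follow in submission order.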
Users' jobs are generally limited to no more than 23 node-hours. This means that the product of the number of nodes times the
number of hours to run the job may not exceed 23. Additionally, if current usage patterns on the machine allow for it, we are happy
to let users run longer or larger jobs by arrangement; contact the chair of the user committee. Example 1: This means that a serial
job (1 node used) must complete within 23 hours. Example 2: If 23 nodes are used for the job, the length of the job will be limited
to 1 hour.
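The node-hour rule is simple arithmetic: nodes multiplied by hours may not exceed 23. The following Python sketch checks a planned job against that rule; the function names are hypothetical, and the `limit` parameter stands in for a larger limit arranged with the chair of the user committee.

```python
NODE_HOUR_LIMIT = 23  # default per-job limit stated in the policy


def max_hours(num_nodes, limit=NODE_HOUR_LIMIT):
    """Longest wall-clock time (in hours) allowed for a job on num_nodes nodes."""
    if num_nodes < 1:
        raise ValueError("a job needs at least one node")
    return limit / num_nodes


def within_limit(num_nodes, hours, limit=NODE_HOUR_LIMIT):
    """True if the product nodes x hours does not exceed the node-hour limit."""
    return num_nodes * hours <= limit


print(max_hours(1))          # Example 1: serial job
print(max_hours(23))         # Example 2: 23 nodes
print(within_limit(4, 5))    # 20 node-hours
print(within_limit(4, 6))    # 24 node-hours
```

This reproduces Example 1 (23.0 hours on one node) and Example 2 (1.0 hour on 23 nodes); a 4-node job fits for 5 hours but not for 6.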
The guiding principle is that users who contribute funding to the cluster have the right to use their number of nodes for 23
hours per day without a time limitation. The remaining hour of each day is reserved for running jobs that require a larger number of
nodes than is otherwise available, if there are such requests in the queue. The scheduler will be set up to pause all jobs and restart
them after one hour or after all requests for large numbers of nodes are satisfied, whichever comes first.
In practice, the right to use their number of nodes for 23 hours per day for users who contribute funding to the cluster is
implemented by giving them an allotment of node-hours each month that is equal to their number of nodes multiplied by 23 times the
number of days in a month (using for simplicity 30 days for all months). This means that such users can either request their number
of nodes for 23 hours on all days of the month, or can choose to request more nodes than theirs and use them for the time available
until the allotment runs out. Naturally, users can continue to submit jobs beyond the node-hours allotted. The point is that these
users enjoy a priority in scheduling up to the point of using up their node-hours in each month, after which they do not enjoy a priority.
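The allotment combines the contribution rate of $5,000 per node with the rule of 23 hours per day over a 30-day month. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and treating priority as ending exactly when used node-hours reach the allotment is an assumption about how "using up" is counted.

```python
COST_PER_NODE = 5000  # dollars per contributed node, from the policy
HOURS_PER_DAY = 23    # priority hours per contributed node per day
DAYS_PER_MONTH = 30   # simplification used in the policy for all months


def contributed_nodes(contribution_dollars):
    """Number of whole nodes paid for by a contribution (a multiple of $5,000)."""
    return contribution_dollars // COST_PER_NODE


def monthly_allotment(nodes):
    """Node-hours per month during which the contributor's jobs have priority."""
    return nodes * HOURS_PER_DAY * DAYS_PER_MONTH


def has_priority(nodes, node_hours_used_this_month):
    """Assumed rule: priority applies only while the monthly allotment is not used up."""
    return node_hours_used_this_month < monthly_allotment(nodes)


nodes = contributed_nodes(10000)
print(nodes, monthly_allotment(nodes))
```

For a $10,000 contribution this gives two nodes and 1,380 node-hours per month. Those node-hours can be spent as two nodes for 23 hours on each of 30 days, or, for example, spread over jobs on ten nodes for a total of 138 hours; after that, the contributor's jobs are scheduled without priority.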
The above rules do not apply to system administration and testing of the machine, including select users running jobs for the
purpose of testing, debugging, or benchmarking the system. For instance, users with existing code may be specifically running large
jobs to test the new system; that is, it is not just the actual system administrator running such jobs. Such efforts will be coordinated
by the chair of the user committee in collaboration with OIT and the user committee. We anticipate that such activity is limited
to the initial phase of the machine or after significant changes in, e.g., hardware or software.
It is noted explicitly that the above usage policies are subject to modification, if it turns out that it is impossible to implement
certain features in the scheduling software. In any case, it is hoped that the number of users will initially be reasonably small
and that users will be in frequent communication with each other to coordinate the scheduling of jobs in a cooperative way, until usage patterns
become clear and the setup of the scheduler is improved. This is stated in the spirit that setting up the scheduler is in fact not
the initial priority for the administration of the cluster; rather, proper testing, debugging, benchmarking, optimizing performance
of the hardware and software, and initial results are more important priorities for the first months of the machine.
Obligation of All Users to Help Maintain the Facility
This machine has been created with both financial and non-financial support from faculty and from UMBC. To ensure the long-term existence of
this facility, all users have an obligation to help actively to sustain it. This obligation has financial and scientific (non-financial)
aspects, and support for both aspects is required from all users to maintain their accounts on the systems. The requirements include
the following methods of support:
Each user must provide a title and abstract for all research projects conducted on the facility's machines. Each project
should have its own title and abstract. This information will be posted on the facility's webpage to document how the facility is used.
Each user is required to provide information on outcomes of the research conducted on the facility's machines. This includes both
information on papers submitted and published and on presentations given. We are happy to post PDF files of papers or presentations
on the facility's webpage or link to another webpage.
Each user must acknowledge the use of this facility, for instance, in papers and presentations. Proper acknowledgement may use
the following sentence: "The computational resources used for this work were provided by the UMBC High Performance Computing Facility
at the University of Maryland, Baltimore County (UMBC); see www.umbc.edu/hpcf for information on the facility and its uses."
Each user (or the sponsoring PI, if the user is not a faculty member) must be willing to participate as co-PI or co-investigator
in future grant proposals. This implies a willingness to supply short descriptions of the research and its results and to provide the
necessary information for grant proposals (bio sketch, current/pending support, and similar), when requested.
Each user is required to include budget requests for computational resources in individual grant proposals. The support requested
should be commensurate with the amount of resource typically used; the cost per node for contributing users above is a guide for the
cost. To support such efforts, we are ready to help with your proposal, including drafting text, acting as co-PI/co-investigator,
supplying a support letter, or in any other suitable way. Contact the chair of the user committee early enough before your proposal
due date to work out details.
All users including principal investigators must confirm when requested that they and their research group still require the
account on the facility's machines. Specifically, at the beginning of every Fall semester, all accounts will be reviewed to determine
if they should be continued. The purpose is to avoid large numbers of inactive accounts. This facility is not suitable for long-term
data storage; users are required to move their data off the machine at the completion of projects. An account cannot be kept open
solely for the purpose of access to data on the machine.
Users who wish to continue their account on the system are required to supply evidence of outcomes of the usage of the machine,
including for instance publications, presentations, preprints, and grant proposals, including those with funding requests for nodes on the machine.
Users are required to submit such evidence continuously throughout the year, but also specifically at the time of account review at
the beginning of the Fall semester. If no information is received upon request or there was no effort to help maintain the facility,
the user's account including all accounts sponsored by the faculty member will be suspended and/or their priority of usage reduced.
To help with the documentation of research results, we provide as part of this webpage a preprint server where technical reports of
results can be posted as well as webpages for each project, where publications and presentations of the research can be posted
throughout the year.
The philosophy adopted here is one of granting an account on this facility first and then requiring help in maintaining it, as opposed
to requiring up-front payment to use the facility. This approach allows researchers to start using the facility immediately at any
point in the year and to obtain initial research results using it. In turn, using these results, users must then actively
demonstrate results and search for funding to sustain the facility.