| Partition | Description | Walltime limits |
|---|---|---|
| develop | There are two nodes in the develop partition, n1 and n2. This partition is dedicated to code under development. Jobs using up to 16 cores may be tested here, but run times should be kept very short. | 5 min default, 30 min max |
| batch | Most of the compute nodes on tara are allocated to this partition: 82 nodes, n3 through n84. Jobs running on these nodes are considered "production" runs, so users should be confident that bugs have been worked out before submitting them. | --- |
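
As a sketch of how the develop limits translate into a job script, the bash function below prints the `#SBATCH` lines for a develop test job and refuses requests beyond the limits in the table: two nodes, 16 cores in total, 30 minutes. The name `develop_header` is hypothetical, and the cap of 8 cores per node is an assumption taken from the CPU time table for tara's nodes.

```shell
# Hypothetical helper: print the #SBATCH lines for a test job on the develop
# partition, refusing requests beyond its limits (2 nodes, 30 minutes).
# Assumption: 8 cores per node, so 2 nodes x 8 cores = 16 cores at most.
develop_header() {
    local nodes=$1 ppn=$2 minutes=$3
    if (( nodes < 1 || nodes > 2 )); then
        echo "develop has only two nodes (n1 and n2)" >&2
        return 1
    fi
    if (( ppn < 1 || ppn > 8 )); then
        echo "a node has at most 8 cores" >&2
        return 1
    fi
    if (( minutes < 1 || minutes > 30 )); then
        echo "develop jobs are limited to 30 minutes" >&2
        return 1
    fi
    # A plain integer given to --time is read as minutes.
    cat <<EOF
#SBATCH --partition=develop
#SBATCH --nodes=$nodes
#SBATCH --ntasks-per-node=$ppn
#SBATCH --time=$minutes
EOF
}
```

For example, the output of `develop_header 2 8 10` could be pasted below the `#!/bin/bash` line of a batch script such as `run.slurm` for a 10-minute, 16-core test.
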

| QOS | Wall time limit per job | CPU time limit per job | Total node limit for the QOS | Node limit per user |
|---|---|---|---|---|
| short | 1 hour | 512 hours | --- | --- |
| normal (default) | 4 hours | 512 hours | --- | --- |
| medium | 24 hours | --- | 30 | --- |
| long | 5 days | --- | 30 | 2 |
| long_contrib | 5 days | --- | 30 | --- |
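
The QOS table can be read as a lookup: choose the QOS with the shortest wall time limit that still covers the job. Below is a minimal bash sketch of that lookup, assuming whole-hour requests and treating the node limits as per-job caps (in practice the 30-node limit is shared by all jobs in the QOS, and the 2-node limit by all of one user's long jobs). `pick_qos` is a hypothetical name; long_contrib is omitted, and the 512 CPU-hour cap on short and normal is not checked here.

```shell
# Hypothetical helper: suggest a QOS for a job of the given length
# (whole hours) and node count, using the per-job limits from the QOS table.
# Not checked: long_contrib, and the 512 CPU-hour cap on short and normal.
pick_qos() {
    local hours=$1 nodes=$2
    if (( hours <= 1 )); then
        echo short
    elif (( hours <= 4 )); then
        echo normal
    elif (( hours <= 24 && nodes <= 30 )); then
        echo medium
    elif (( hours <= 120 && nodes <= 2 )); then
        # long allows 5 days, but at most 2 nodes per user
        echo long
    else
        echo "no QOS fits ${hours} h on ${nodes} nodes" >&2
        return 1
    fi
}
```

The suggested name would then go into the batch script, for instance as `#SBATCH --qos=medium`.
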

| Number of nodes | Cores per node | Total number of cores | Wall time (hours) | CPU time (hours) |
|---|---|---|---|---|
| 64 | 8 | 512 | 1 | 512 |
| 32 | 8 | 256 | 2 | 512 |
| 16 | 8 | 128 | 4 | 512 |
| 8 | 8 | 64 | 8 | 512 |
| 4 | 8 | 32 | 16 | 512 |
| 2 | 8 | 16 | 32 | 512 |
| 1 | 8 | 8 | 64 | 512 |
| 1 | 4 | 4 | 128 | 512 |
| 1 | 2 | 2 | 256 | 512 |
| 1 | 1 | 1 | 512 | 512 |
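
Every row of the table above follows from one product: CPU time = nodes × cores per node × wall time, so halving the number of cores doubles the wall time that fits under 512 CPU hours. A small bash sketch of that arithmetic follows; both function names are hypothetical.

```shell
# Hypothetical helpers for the CPU time arithmetic in the table:
# CPU hours = nodes x cores per node x wall time in hours.
cpu_hours() {
    local nodes=$1 ppn=$2 hours=$3
    echo $(( nodes * ppn * hours ))
}

# Succeeds (exit status 0) when a request stays within the 512 CPU-hour
# per-job limit of the short and normal QOS.
within_cpu_limit() {
    local total
    total=$(cpu_hours "$1" "$2" "$3")
    (( total <= 512 ))
}
```

For instance, `within_cpu_limit 16 8 4` succeeds (exactly 512 CPU hours), while `within_cpu_limit 32 8 4` fails because 32 × 8 × 4 = 1024 CPU hours exceeds the limit of the normal QOS.
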

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job has invalid qos
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job violates accounting policy (job submit limit, user's size and/or time limits)
[araim1@tara-fe1 ~]$

[araim1@slurm-dev ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job violates accounting policy (job submit limit, user's size and/or time limits)
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ squeue
JOBID PARTITION     NAME     USER ST   TIME NODES QOS    NODELIST(REASON)
 4278     batch  users01   araim1 PD   0:00    30 normal (AssociationResourceLimit)
 4277     batch  users01   araim1  R   2:54     2 normal n[7-8]
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ cat slurm.err
slurmd[n1]: error: *** JOB 59545 CANCELLED AT 2011-05-20T08:10:52 DUE TO TIME LIMIT ***
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ cat slurm.err
slurmd[n3]: *** JOB 4254 CANCELLED AT 2011-05-27T19:42:14 DUE TO TIME LIMIT ***
slurmd[n3]: *** STEP 4254.0 CANCELLED AT 2011-05-27T19:42:14 DUE TO TIME LIMIT ***
[araim1@tara-fe1 ~]$
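
A job is cancelled with `DUE TO TIME LIMIT` once it runs longer than its time limit, so it helps to request the wall time explicitly. The `--time` option of sbatch accepts several formats, among them `days-hours:minutes:seconds`; the hypothetical bash helper below converts a number of minutes into that form.

```shell
# Hypothetical helper: format a wall time given in minutes as the
# days-hours:minutes:seconds form accepted by sbatch --time.
to_slurm_time() {
    local total=$1
    local days=$(( total / 1440 ))          # 1440 minutes per day
    local hours=$(( (total % 1440) / 60 ))
    local minutes=$(( total % 60 ))
    printf '%d-%02d:%02d:00\n' "$days" "$hours" "$minutes"
}
```

For example, `to_slurm_time 7200` gives `5-00:00:00`, the 5-day maximum of the long QOS, which could be used as `#SBATCH --time=5-00:00:00`.
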

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job violates accounting policy (job submit limit, user's size and/or time limits)
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Invalid account specified
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Job has invalid qos
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Invalid partition name specified
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Requested node configuration is not available
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Node count specification invalid
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: unrecognized option `--ndoes=2'
sbatch: error: Try "sbatch --help" for more information
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ cat slurm.err
slurmd[n1]: error: Job 60204 exceeded 10240 KB memory limit, being killed
slurmd[n1]: error: *** JOB 60204 CANCELLED AT 2011-05-27T19:34:34 ***
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ sbatch run.slurm
sbatch: error: Batch job submission failed: Requested node configuration is not available
[araim1@tara-fe1 ~]$

[araim1@tara-fe1 ~]$ squeue
JOBID PARTITION  NAME   USER ST  TIME NODES QOS    NODELIST(REASON)
62280   develop  SNOW araim1 PD  0:00     1 normal (PartitionTimeLimit)
[araim1@tara-fe1 ~]$
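
A job listed with the reason `PartitionTimeLimit` requested more wall time than its partition allows; in the develop partition that maximum is 30 minutes. The bash sketch below, with hypothetical function names, converts an sbatch `--time` string to minutes and checks it against that limit before submitting. Leftover seconds are rounded up to a whole minute, matching Slurm's one-minute time resolution.

```shell
# Hypothetical helper: convert an sbatch --time string to whole minutes.
# Accepts minutes, minutes:seconds, hours:minutes:seconds, days-hours,
# days-hours:minutes and days-hours:minutes:seconds; seconds round up.
slurm_time_to_minutes() {
    local spec=$1 days=0 h=0 m=0 s=0
    local -a parts
    if [[ $spec == *-* ]]; then
        days=${spec%%-*}
        IFS=: read -r -a parts <<< "${spec#*-}"
        h=${parts[0]}
        m=${parts[1]:-0}
        s=${parts[2]:-0}
    else
        IFS=: read -r -a parts <<< "$spec"
        case ${#parts[@]} in
            1) m=${parts[0]} ;;
            2) m=${parts[0]}; s=${parts[1]} ;;
            3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
            *) return 1 ;;
        esac
    fi
    # 10# forces base 10, so fields such as "08" are not read as octal.
    echo $(( 10#$days * 1440 + 10#$h * 60 + 10#$m + (10#$s + 59) / 60 ))
}

# Succeeds when the request fits the 30-minute maximum of develop.
fits_develop() {
    local minutes
    minutes=$(slurm_time_to_minutes "$1") || return 1
    (( minutes <= 30 ))
}
```

Here `fits_develop 1:00:00` fails, so a one-hour job would need to be shortened or submitted to the batch partition instead.
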