Submitting a Job
Once you have a job submission file, you may submit this script to SLURM using the sbatch command. SLURM will find, or wait for, available resources matching your request and run your job there.
On Bell, in order to submit jobs, you need to specify the partition, account, and Quality of Service (QoS) name to which you want to submit your jobs. To familiarize yourself with the partitions and QoS available on Bell, visit Bell Queues and Partitions. To check the available partitions on Bell, you can use the showpartitions command, and to check your available accounts you can use the slist command. Slurm uses the term "Account" with the option -A or --account= to specify different batch accounts, the option -p or --partition= to select a specific partition for job submission, and the option -q or --qos= to select a QoS.
CPU Partition
The CPU partition on Bell has two Quality of Service (QoS) levels: normal and standby. To submit your job to one compute node on the cpu partition with the 'normal' QoS, which has "high priority":
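A minimal sketch of such a submission, built from the options described above. The names `accountname` and `myjobsubmissionfile` are placeholders (assumptions, not real names); substitute an account reported by slist and your own job submission file. The snippet only prints the commands, so it can be tried on any machine; run one of the printed lines on Bell to actually submit.

```shell
# Placeholders (assumptions, not real names): use an account shown by slist
# and the path to your own job submission file.
account="accountname"
jobfile="myjobsubmissionfile"

# Long-option form: one node, one task, cpu partition, normal QoS.
long_cmd="sbatch --nodes=1 --ntasks=1 --partition=cpu --account=$account --qos=normal $jobfile"

# Equivalent short-option form.
short_cmd="sbatch -N1 -n1 -p cpu -A $account -q normal $jobfile"

# Dry run: print the commands; run one of the printed lines on Bell to submit.
echo "$long_cmd"
echo "$short_cmd"
```

The long and short forms request the same resources; which one to use is a matter of preference.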
To submit your job to one compute node on the cpu partition with the 'standby' QoS, which has "low priority":
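A standby submission might look like the sketch below; only the QoS value differs from a normal-QoS request. As before, `accountname` and `myjobsubmissionfile` are placeholder names, and the snippet prints the command rather than submitting it.

```shell
# Placeholders (assumptions): replace with your own account and job file.
account="accountname"
jobfile="myjobsubmissionfile"

# One node, one task, cpu partition, standby (low priority) QoS.
standby_cmd="sbatch --nodes=1 --ntasks=1 --partition=cpu --account=$account --qos=standby $jobfile"

# Dry run: print the command; run the printed line on Bell to submit.
echo "$standby_cmd"
```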
GPU Partition
On the GPU partition on Bell you don’t need to specify the QoS name because only one QoS exists for this partition, and the default is normal. To submit your job to one compute node requesting one GPU on the gpu partition under the 'normal' QoS which has "high priority":
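A sketch of a one-node, one-GPU request follows. The GPU flag is an assumption: Slurm provides `--gpus-per-node`, but Bell may expect a different form of GPU request, so check the Bell documentation before relying on it. `accountname` and `myjobsubmissionfile` are placeholder names, and no `--qos` option is passed because the partition default applies.

```shell
# Placeholders (assumptions): replace with your own account and job file.
account="accountname"
jobfile="myjobsubmissionfile"

# One node, one task, one GPU on the gpu partition.
# --gpus-per-node is a standard Slurm option; whether Bell prefers this
# or another GPU request syntax is an assumption to verify.
gpu_cmd="sbatch --nodes=1 --ntasks=1 --gpus-per-node=1 --partition=gpu --account=$account $jobfile"

# Dry run: print the command; run the printed line on Bell to submit.
echo "$gpu_cmd"
```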
Highmem Partition
To submit your job to a compute node in the highmem partition, you don’t need to specify the QoS name because only one QoS exists for this partition, and the default is normal. However, the highmem partition is only suitable for jobs with memory requirements that exceed the capacity of a standard node, so the number of requested tasks should be appropriately high.
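A highmem submission could be sketched as follows. The task count of 64 is purely illustrative, not a documented Bell requirement; pick a value that matches your job and any minimum the partition enforces. `accountname` and `myjobsubmissionfile` are placeholder names.

```shell
# Placeholders (assumptions): replace with your own account and job file.
account="accountname"
jobfile="myjobsubmissionfile"

# Illustrative task count only; the right value depends on your job
# and on the limits of the highmem partition.
ntasks=64

# One node on the highmem partition; no --qos needed (default is normal).
highmem_cmd="sbatch --nodes=1 --ntasks=$ntasks --partition=highmem --account=$account $jobfile"

# Dry run: print the command; run the printed line on Bell to submit.
echo "$highmem_cmd"
```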
General Information
By default, each job receives 30 minutes of wall time, or clock time. If you know that your job will not need more than a certain amount of time to run, request less than the maximum wall time, as this may allow your job to run sooner. To request 1 hour and 30 minutes of wall time:
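Wall time is requested with the sbatch option `--time` (short form `-t`), which accepts the format hours:minutes:seconds. A sketch of the 1 hour 30 minute request, with `accountname` and `myjobsubmissionfile` as placeholder names and the cpu partition chosen only for illustration:

```shell
# Placeholders (assumptions): replace with your own account and job file.
account="accountname"
jobfile="myjobsubmissionfile"

# 1 hour, 30 minutes, 0 seconds of wall time.
walltime="1:30:00"

time_cmd="sbatch --time=$walltime --nodes=1 --ntasks=1 --partition=cpu --account=$account --qos=normal $jobfile"

# Dry run: print the command; run the printed line on Bell to submit.
echo "$time_cmd"
```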
The --nodes= or -N value indicates how many compute nodes you would like for your job, and the --ntasks= or -n value indicates the number of tasks you want to run.
In some cases, you may want to request multiple nodes. To use multiple nodes, your program must be specifically written to run across them, for example with MPI. Simply requesting more nodes will not make your work go faster; your code must support this ability.
To request 2 compute nodes:
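A two-node request might be sketched like this. The partition and QoS are chosen for illustration, and `accountname` and `myjobsubmissionfile` are placeholder names. The job file itself is assumed to launch a program that can actually span nodes.

```shell
# Placeholders (assumptions): replace with your own account and job file.
account="accountname"
jobfile="myjobsubmissionfile"

# Request two compute nodes on the cpu partition.
multi_cmd="sbatch --nodes=2 --partition=cpu --account=$account --qos=normal $jobfile"

# Dry run: print the command; run the printed line on Bell to submit.
echo "$multi_cmd"
```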
By default, jobs on Bell will share nodes with other jobs.
If more convenient, you may also specify any command line options to sbatch from within your job submission file, using a special form of comment:
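The special comments are lines beginning with `#SBATCH`, placed before the first executable command in the file. The job submission file below is a sketch only: `accountname` is a placeholder, the resource values are illustrative, and the final lines stand in for the real work of the job.

```shell
#!/bin/bash
# Example job submission file. "accountname" is a placeholder account.
#SBATCH --account=accountname
#SBATCH --partition=cpu
#SBATCH --qos=normal
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=1:30:00

# The job's actual work goes below; these lines are a stand-in.
# SLURM_JOB_ID is set by Slurm inside a running job; "none" is used elsewhere.
msg="Running job ${SLURM_JOB_ID:-none}"
echo "$msg"
```

With the options embedded this way, the file could be submitted with just `sbatch` followed by the file name.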
If an option is present in both your job submission file and on the command line, the option on the command line will take precedence.
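As an illustration of this precedence, suppose a job submission file (placeholder name `myjobsubmissionfile`) contains the line `#SBATCH --time=0:30:00`. Passing a different `--time` on the command line, as sketched below, would give the job the command-line value. The snippet only prints the command.

```shell
# Value assumed to be set inside the job file via "#SBATCH --time=0:30:00".
file_time="0:30:00"

# Value given on the command line; this one takes precedence.
cli_time="1:30:00"

# "myjobsubmissionfile" is a placeholder name.
override_cmd="sbatch --time=$cli_time myjobsubmissionfile"

# Dry run: print the command and the wall time the job would receive.
echo "$override_cmd"
echo "Effective wall time: $cli_time (overrides $file_time from the file)"
```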
After you submit your job with sbatch, it may wait in the queue for minutes, hours, or even weeks. How long it takes for a job to start depends on the specific queue, the resources and time requested, and the resources requested by other jobs already waiting in that queue. It is impossible to say for sure when any given job will start. For best results, request no more resources than your job requires.
Once your job is submitted, you can monitor the job status, wait for the job to complete, and check the job output.