Slurm

Checkpointing in a Preemptible Environment

What is Preemption?

In computing, preemption is the act of stopping or pausing one process so that another process can run. We say that a task X preempts another task Y when X pauses Y to allow itself to run. In an HPC environment, preemption can help maximize the utilization of cluster resources: because low-priority jobs can be preempted by higher-priority jobs, we can be more lenient with the resource limits placed on those low-priority jobs.
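The definition above can be illustrated with a toy model. The following is a minimal sketch, not a description of how Slurm implements preemption: the `Job` and `OneSlotCluster` names, the single-slot cluster, and the "larger number means higher priority" convention are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # assumption of this sketch: larger value = higher priority
    state: str = "pending"


class OneSlotCluster:
    """Toy cluster with a single slot; hypothetical, not Slurm's scheduler."""

    def __init__(self):
        self.running = None
        self.waiting = []

    def submit(self, job):
        if self.running is None:
            # The slot is free, so the job starts immediately.
            job.state = "running"
            self.running = job
        elif job.priority > self.running.priority:
            # The new job preempts the running one: pause it and take the slot.
            self.running.state = "preempted"
            self.waiting.append(self.running)
            job.state = "running"
            self.running = job
        else:
            # Not important enough to preempt; wait for the slot.
            job.state = "pending"
            self.waiting.append(job)

    def finish_running(self):
        """Complete the running job, then resume the highest-priority waiter."""
        if self.running is not None:
            self.running.state = "done"
        self.running = None
        if self.waiting:
            nxt = max(self.waiting, key=lambda j: j.priority)
            self.waiting.remove(nxt)
            nxt.state = "running"
            self.running = nxt
```

In this sketch, submitting a low-priority job Y and then a high-priority job X leaves X running and Y marked as preempted; once X finishes, Y resumes. A real preempted job may be requeued or cancelled rather than paused, which is why checkpointing matters in a preemptible environment.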

About Slurm Fairshare on RCAC clusters

This article provides a deep dive into how Slurm assigns priority to the jobs it schedules. The design space for such a scheduler is very large, so Slurm provides many options to accommodate a variety of clusters and policies.