Resource Allocation
Resource allocation in HPC refers to the computing resources you are permitted to use for your jobs. In this context, resource usage is measured as CPU or GPU time, in seconds.
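As a worked example, a job's usage is the number of CPUs or GPUs it was allocated multiplied by how many seconds it ran. The sketch below assumes usage is charged on the allocation for the job's full elapsed time and ignores any per-resource billing weights a site may configure, so Monsoon's actual figures may differ. `resource_seconds` is a hypothetical helper.

```python
def resource_seconds(units, hours):
    """Usage charged to a job: allocated CPUs (or GPUs) times elapsed seconds.

    Assumption: usage is counted on what the job was allocated for its whole
    run, and no per-resource billing weights apply.
    """
    return units * hours * 3600


print(resource_seconds(4, 2))    # 4 CPUs for 2 hours   -> 28800 CPU-seconds
print(resource_seconds(1, 0.5))  # 1 GPU for 30 minutes -> 1800.0 GPU-seconds
```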
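Common approaches to sharing resources fairly include:
- First In, First Out (FIFO), which runs jobs in the order they were submitted
- Backfill, which lets lower-priority jobs (typically small or short ones) start early on idle resources, as long as they do not delay the start of higher-priority jobs
- Fairshare, which dynamically adjusts priority based on users' recent usage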
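To make these approaches concrete, the sketch below orders a small hypothetical queue two ways and checks whether a job could be backfilled. The `Job` class, the helper functions, and all names and numbers are illustrative assumptions. The backfill check is a simplified single-reservation rule (a job may start now only if it fits in the idle nodes and finishes before the top job's reserved start time), not Slurm's actual backfill scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    user: str
    submit_time: int  # seconds since an arbitrary reference point
    nodes: int        # nodes requested
    runtime: int      # requested wall time in seconds


def fifo_order(jobs):
    """First In, First Out: the earliest submission runs first."""
    return sorted(jobs, key=lambda j: j.submit_time)


def fairshare_order(jobs, fairshare):
    """Fairshare: jobs from users with a higher fairshare value run first.

    Ties are broken by submit time so the ordering stays deterministic.
    """
    return sorted(jobs, key=lambda j: (-fairshare[j.user], j.submit_time))


def can_backfill(job, free_nodes, now, reserved_start):
    """Simplified backfill: start `job` now only if it fits in the idle nodes
    and finishes before the highest-priority job's reserved start time."""
    return job.nodes <= free_nodes and now + job.runtime <= reserved_start


# Hypothetical queue: alice has used a lot recently, bob has not.
jobs = [
    Job("big", "alice", submit_time=0, nodes=8, runtime=7200),
    Job("small", "bob", submit_time=10, nodes=1, runtime=600),
]
fairshare = {"alice": 0.2, "bob": 0.9}

print([j.name for j in fifo_order(jobs)])                  # ['big', 'small']
print([j.name for j in fairshare_order(jobs, fairshare)])  # ['small', 'big']
# Suppose 'big' must wait until t=3600 for 8 nodes, while 2 nodes are idle now.
print(can_backfill(jobs[1], free_nodes=2, now=0, reserved_start=3600))  # True
```

The backfilled job finishes at t=600, well before the reservation at t=3600, so it uses otherwise idle nodes without delaying the larger job.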
Slurm
Monsoon uses SLURM (Simple Linux Utility for Resource Management) for resource allocation.
Fairshare
Fairshare is a value between 0 and 1 that is calculated from a user's recent resource usage relative to their share of the cluster. The higher the fairshare value, the higher the priority for scheduling that user's jobs. Because heavier recent usage lowers the value, users who have used fewer resources recently get higher priority than users who have used more.
Details on the fairshare algorithm can be found in the SLURM documentation.
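As a rough illustration of how recent usage can become a value between 0 and 1, the sketch below implements two pieces described in Slurm's multifactor priority documentation: past usage decays with a half-life (the `PriorityDecayHalfLife` setting, 7 days by default), and the classic fair-share factor is F = 2^(-U/S), where U is effective usage and S is normalized shares. Recent Slurm versions default to the Fair Tree algorithm, which ranks associations instead, and this page does not say which algorithm or half-life Monsoon configures, so treat this as a sketch of the idea, not Monsoon's exact calculation. The function names are hypothetical, and usage is treated as a plain fraction of cluster-wide usage with no account hierarchy.

```python
def decayed_usage(usage_by_day_ago, half_life_days=7.0):
    """Sum past usage, weighting each entry by 0.5 ** (age / half-life).

    usage_by_day_ago maps "days ago" -> CPU-seconds charged on that day.
    The 7-day default mirrors Slurm's default PriorityDecayHalfLife; a site
    may configure a different value.
    """
    return sum(u * 0.5 ** (age / half_life_days)
               for age, u in usage_by_day_ago.items())


def classic_fairshare(effective_usage, norm_shares):
    """Classic Slurm fair-share factor: F = 2 ** (-U / S).

    Both arguments are fractions of the cluster-wide total, so F is 1.0 with
    no usage, 0.5 when usage equals the share, and approaches 0 as usage grows.
    """
    if norm_shares <= 0:
        raise ValueError("norm_shares must be positive")
    return 2 ** (-effective_usage / norm_shares)


# Usage from today counts fully, from 7 days ago half, from 14 days ago a quarter.
print(decayed_usage({0: 1000, 7: 1000, 14: 1000}))  # 1750.0

# A user entitled to 10% of the cluster:
print(classic_fairshare(0.0, 0.1))  # 1.0  (no recent usage)
print(classic_fairshare(0.1, 0.1))  # 0.5  (used exactly their share)
print(classic_fairshare(0.2, 0.1))  # 0.25 (used twice their share)
```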
LevelFS
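LevelFS measures how well an association (for example, a user within an account) is being served compared with its siblings. Slurm calculates it as the association's normalized shares divided by its effective usage, so the value rises as recent usage falls. The ranges of LevelFS and their meanings are:
- LevelFS < 1: over-served; the association has used more than its share, which lowers its priority
- LevelFS = 1: the association has used exactly its share
- LevelFS > 1: under-served; the association has used less than its share, which raises its priority
- LevelFS = inf: the association has no recent usage, the highest possible value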
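The sshare man page defines LevelFS as an association's normalized shares divided by its effective usage, both measured relative to its siblings. The sketch below applies that ratio to a flat group of hypothetical users in one account (the names, shares, and usage figures are made up). It leaves out usage decay and the account levels above the users, and, as sshare does, gives an association with no usage the value infinity.

```python
import math


def level_fs(raw_shares, raw_usage):
    """Compute LevelFS for each sibling association (e.g. users in one account).

    raw_shares and raw_usage map an association name to its configured shares
    and its raw usage. Following Slurm's Fair Tree description:
        S = own shares / total shares of self and siblings
        U = own usage  / total usage of self and siblings
        LevelFS = S / U, or infinity when the association has no usage.
    """
    total_shares = sum(raw_shares.values())
    total_usage = sum(raw_usage.values())
    result = {}
    for name, shares in raw_shares.items():
        s = shares / total_shares
        usage = raw_usage.get(name, 0)
        if usage == 0:
            result[name] = math.inf  # no usage: highest possible value
        else:
            result[name] = s / (usage / total_usage)
    return result


# Three hypothetical users with equal shares but different recent usage.
shares = {"alice": 1, "bob": 1, "carol": 1}
usage = {"alice": 600, "bob": 300, "carol": 0}
print(level_fs(shares, usage))  # {'alice': 0.5, 'bob': 1.0, 'carol': inf}
```

Here bob's usage matches his one-third share exactly, alice has used twice her share, and carol has used nothing.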
Checking your Fairshare and LevelFS
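The sshare command reports the resource usage and fair-share priority of users and accounts in SLURM, including your fairshare and LevelFS values.
In the command below, -P prints '|'-delimited output, -o selects the columns to display, and grep keeps only the lines that contain your username.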
$ sshare -Po user,fairshare,levelfs | grep $USER
abc123|0.456209|4.588063
In this example, abc123 has a fairshare value of 0.456209 and a LevelFS value of 4.588063.
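A fairshare value of 0.456209 sits near the middle of the 0 to 1 range, so abc123's recent usage, relative to their share, is moderate compared with other users.
A LevelFS value of 4.588063 is greater than 1, so abc123 has used less than their share relative to the other users in the same account.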
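If you want to read these values from a script, the sketch below parses the '|'-separated output of `sshare -Po user,fairshare,levelfs`. Matching the User field exactly avoids the false hits `grep $USER` can produce when another username contains yours. The sample text is hypothetical apart from the abc123 row from the example above, the header and account rows are assumptions about the output layout, and an infinite LevelFS is assumed to be printed as `inf`. `parse_sshare` is a hypothetical helper.

```python
def parse_sshare(output, user):
    """Return (fairshare, levelfs) for `user`, or None if no row matches.

    `output` is the text printed by `sshare -Po user,fairshare,levelfs`:
    one '|'-separated row per association, with an empty User field on
    account-level rows.
    """
    for line in output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 3 or fields[0] != user:
            continue  # header, account row, or a different user
        # float() also accepts "inf", assumed here to be how an infinite
        # LevelFS (no recent usage) is printed.
        return float(fields[1]), float(fields[2])
    return None


# Hypothetical output; only the abc123 row comes from the example above.
SAMPLE = """User|FairShare|LevelFS
|0.500000|1.250000
abc123|0.456209|4.588063
abc1234|0.912000|inf
"""

print(parse_sshare(SAMPLE, "abc123"))   # (0.456209, 4.588063)
print(parse_sshare(SAMPLE, "abc1234"))  # (0.912, inf)
```

On the cluster itself, the text could come from `subprocess.run(["sshare", "-Po", "user,fairshare,levelfs"], capture_output=True, text=True, check=True).stdout`, with `getpass.getuser()` supplying the username.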