Coordinating Accounts with Slurm

NAU’s Monsoon cluster hosts several research groups. To balance the demands of these groups, Slurm schedules jobs in a way that maximizes fairness. Slurm also provides a convenient layer for launching large compute jobs.

Managing Jobs

Listing Jobs

squeue -A professor # List by account
squeue -u abc123    # List by user

Cancelling Jobs

scancel 12345678                   # Cancel by job ID
scancel -u abc123                  # Cancel all of a user's jobs
scancel -u abc123 --state=running  # Cancel all of a user's RUNNING jobs
scancel -u abc123 --state=pending  # Cancel all of a user's PENDING jobs
scancel -A professor               # Cancel an entire account's jobs

Holding and Releasing Jobs

scontrol hold 12345678      # Hold by job ID
scontrol release 12345678   # Release the hold
scontrol uhold 12345678     # Hold job 12345678 but allow the job's owner to
                            # release it

Limiting Users

Check the Current Limits

sacctmgr list assoc account=professor
sacctmgr list assoc user=abc123 format=account,user,grpcpurunmins
sacctmgr list assoc user=abc123

Limiting CPU Time

sacctmgr modify user abc123 set GrpCPURunMins=1440  # Limit the CPU time that a
                                                    # user's running jobs can
                                                    # hold at once to 1440 minutes
                                                    # (e.g., 24 hours on 1 core,
                                                    #  12 hours on 2 cores, etc.)
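
The arithmetic behind that limit can be sketched in plain shell. This is only an illustration: max_walltime_minutes is a hypothetical helper, not a Slurm command, and it assumes the user has a single job running, so the whole CPU-minute budget is available to that one job.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): longest time limit, in minutes,
# that one job on CORES cores can request under a budget of LIMIT CPU-minutes.
max_walltime_minutes() {
    local limit=$1 cores=$2
    echo $(( limit / cores ))   # integer division, rounds down
}

# Show how the 1440-minute budget shrinks as the core count grows.
for cores in 1 2 4 8; do
    printf '%d core(s): %d minutes\n' "$cores" "$(max_walltime_minutes 1440 "$cores")"
done
```

With a budget of 1440, this prints 1440, 720, 360, and 180 minutes for 1, 2, 4, and 8 cores, matching the 24-hour and 12-hour figures in the comment above.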

Limiting Usable CPUs

sacctmgr modify user abc123 set GrpCPUs=2 # The user can only have 2 CPUs 
                                          # allocated at a time

Checking the Current Settings and Status

Adding a Student to a Slurm Account

sacctmgr add user name=abc123 account=professor                       # Add user to account
sacctmgr modify user where name=abc123 set defaultaccount=professor   # Set user's default account
sacctmgr modify user where name=abc123 set defaultqos=professor       # Set user's default Quality of Service (QoS)
sacctmgr update user name=abc123 account=professor set fairshare=128  # Set user's fairshare value
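
When adding several students, the four commands above can be wrapped in a small function. The following is a minimal sketch, not an official tool: add_student and the DRY_RUN variable are hypothetical names introduced here, and the function assumes the QoS has the same name as the account, as in the example above. By default it only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the four sacctmgr commands shown above.
# Prints the commands by default; set DRY_RUN=0 to actually run them.
add_student() {
    local user=$1 account=$2 fairshare=${3:-128}
    local run=echo                      # dry run: echo each command
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=                            # real run: execute each command
    fi
    $run sacctmgr add user name="$user" account="$account"
    $run sacctmgr modify user where name="$user" set defaultaccount="$account"
    $run sacctmgr modify user where name="$user" set defaultqos="$account"
    $run sacctmgr update user name="$user" account="$account" set fairshare="$fairshare"
}

# Preview the commands for user abc123 in the professor account.
add_student abc123 professor
```

Running it with DRY_RUN=0 requires Slurm administrator privileges, and sacctmgr may ask for confirmation before committing each change.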

Check Account Limits and Fairshare

sacctmgr list assoc account=professor

Show Historical Fairshare and Usage Information

sshare -a -l -A professor

Adjusting Priority

Slurm calculates a job's priority as a weighted sum of several factors. Each factor is a number in the range 0.0 to 1.0, and it is multiplied by an integer weight before the results are added together. Some available factors include:

  • Job size
  • Queue time
  • Fairshare
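
The weighted sum can be sketched as follows. The weights here are made-up values for illustration only; on a real cluster they come from the PriorityWeight* settings in slurm.conf, and more factors than these three may be in use. job_priority is a hypothetical helper, not a Slurm command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: weighted sum of three priority factors.
# Arguments: queue-time (age) factor, fairshare factor, job-size factor,
# each in the range 0.0 to 1.0. The weights are illustrative assumptions.
job_priority() {
    awk -v age="$1" -v fs="$2" -v size="$3" \
        -v w_age=1000 -v w_fs=10000 -v w_size=800 \
        'BEGIN { printf "%d\n", w_age * age + w_fs * fs + w_size * size }'
}

# 1000*0.5 + 10000*0.25 + 800*0.125 = 500 + 2500 + 100
job_priority 0.5 0.25 0.125
```

With these assumed weights, fairshare dominates the result: halving the fairshare factor costs the job far more priority than halving either of the other two factors.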

Calculating Fairshare

Fairshare is calculated from the values reported by the following command:

sshare -laA youraccount

From that data, perform the following calculations:

Norm Shares = Raw Shares / sum(self + siblings' Raw Shares)
Effectv Usage = Raw Usage / account's Raw Usage
FairShare = Norm Shares / Effectv Usage
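
As a worked example, the three calculations above can be scripted with awk. The numbers are invented for illustration and do not come from a real sshare report; fairshare is a hypothetical helper name. The sketch assumes a user holding 128 of 512 total shares who is responsible for 4000 of the account's 8000 units of raw usage.

```shell
#!/usr/bin/env bash
# Hypothetical helper implementing the three formulas above.
# Arguments: raw_shares total_shares raw_usage account_raw_usage
#   total_shares = the user's Raw Shares plus all siblings' Raw Shares
fairshare() {
    awk -v rs="$1" -v ts="$2" -v ru="$3" -v au="$4" 'BEGIN {
        # With no recorded usage the ratio is unbounded; report that
        # instead of dividing by zero.
        if (ru == 0 || au == 0) { print "inf"; exit }
        norm = rs / ts              # Norm Shares
        eff  = ru / au              # Effectv Usage
        printf "%.2f\n", norm / eff # FairShare
    }'
}

# Norm Shares = 128/512 = 0.25, Effectv Usage = 4000/8000 = 0.5
fairshare 128 512 4000 8000
```

This prints 0.50: the user has consumed twice their allotted share of the account's usage, so the value falls below 1. A value above 1 would mean the user has consumed less than their share.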

Modifying User Fairshare

sacctmgr modify user abc123 set fairshare=64