Usage

To use ViewClust in a project:

import viewclust

  • All of the following examples use vc as an alias:

    import viewclust as vc
    
  • Functions can then be called with vc:

    vc.job_use(jobs, d_from, target)
    

To use the ViewClust/Slurm sub-module:

from viewclust import slurm

  • Functions can then be called with slurm:

    slurm.sacct_jobs(account, d_from, d_to=d_to)
    

ViewClust has the following collection of functions:

Investigating Core-year Usage

The following pattern can be used to process jobs dataframes and get instantaneous core-year usage:

import viewclust as vc
from viewclust import slurm

# Query parameters
account = 'def-tk11br_cpu'
d_from = '2021-02-14T00:00:00'
d_to = '2021-03-16T00:00:00'

# DataFrame query
jobs_df = slurm.sacct_jobs(account, d_from, d_to=d_to)

# ViewClust processing
target = 50
clust_target, queued, running, dist = vc.job_use(jobs_df, d_from, target, d_to=d_to, use_unit='cpu-eqv')

Things to note about this example:

  • The functions accept optional arguments, and every function has a docstring.
  • To generate the input jobs dataframe, this example uses the included Slurm helper function slurm.sacct_jobs.
    • The main job query from Slurm can take a long time if your query period is large!
  • The input can come from any source, as long as it is a pandas DataFrame with the expected columns:
    • submit (datetime64[ns]): submit time
    • start (datetime64[ns]): start time
    • end (datetime64[ns]): end time
    • state (String or object): job state
    • timelimit (timedelta64[ns]): reserved run time
    • reqcpus (int64): total number of CPU cores
    • mem (int64): total reserved memory
    • reqtres (String or object): all requested resources, including GPUs
  • A plot can be generated with ViewClust-Vis.
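
Because the input only needs the columns listed above, a jobs frame can be built by hand, for example for testing. The sketch below is a minimal illustration and not part of ViewClust: the job records are made up, the units of mem are an assumption (the list above does not state them), and column_problems is a hypothetical helper that checks a frame against the expected column names and dtypes.

```python
import pandas as pd

# Expected columns and dtype kinds, taken from the list above.
# "M" = datetime64, "m" = timedelta64, "i" = integer, None = not checked.
EXPECTED = {
    "submit": "M", "start": "M", "end": "M", "state": None,
    "timelimit": "m", "reqcpus": "i", "mem": "i", "reqtres": None,
}

# Hypothetical job records; every value here is made up for illustration.
jobs_df = pd.DataFrame({
    "submit": ["2021-02-14T00:00:00", "2021-02-14T00:30:00"],
    "start": ["2021-02-14T00:05:00", "2021-02-14T01:00:00"],
    "end": ["2021-02-14T02:05:00", "2021-02-14T01:45:00"],
    "state": ["COMPLETED", "TIMEOUT"],
    "timelimit": ["03:00:00", "00:45:00"],
    "reqcpus": [4, 2],
    "mem": [16000, 8000],  # units assumed; not stated in the list above
    "reqtres": ["cpu=4,mem=16000M", "cpu=2,mem=8000M,gres/gpu=1"],
})

# Convert the text columns to the dtypes named in the list above.
for col in ("submit", "start", "end"):
    jobs_df[col] = pd.to_datetime(jobs_df[col]).astype("datetime64[ns]")
jobs_df["timelimit"] = pd.to_timedelta(jobs_df["timelimit"]).astype("timedelta64[ns]")
jobs_df = jobs_df.astype({"reqcpus": "int64", "mem": "int64"})


def column_problems(df):
    """Hypothetical helper: list columns that are missing or mistyped."""
    problems = []
    for name, kind in EXPECTED.items():
        if name not in df.columns:
            problems.append(f"{name}: missing")
        elif kind is not None and df[name].dtype.kind != kind:
            problems.append(f"{name}: unexpected dtype {df[name].dtype}")
    return problems


print(column_problems(jobs_df))
```

An empty list means the frame has every expected column with a matching dtype. Such a frame could then be passed to vc.job_use in place of the sacct query result.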

Investigating Memory Usage

In an effort to help with tickets, the following pattern can be used to investigate memory efficiency:

import viewclust as vc
import pandas as pd
from viewclust import slurm

# Define some accounts to look at
accounts = ['def-tk11br_cpu','def-jdesjard_cpu']

# Pick a time to start looking from
d_from = '2020-01-10T00:00:00'

# Loop over your queries if you want
for account in accounts:
    print('Working on:', account)  # Progress message to show things are running
    # Query Slurm and use ViewClust to build memory usage plots from the
    # MaxRSS field reported by sacct. The figure is saved as an HTML file.
    slurm.mem_info(d_from, account, fig_out=account + '.html')

Things to note about this example:

  • The functions accept optional arguments, and every function has a docstring.
  • To generate the input job frame, this example uses an included Slurm helper function.
  • The data is based on the MaxRSS field from Slurm, which isn’t always a clear indicator of actual memory use.
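
To make the idea of memory efficiency concrete, the sketch below compares peak memory use against reserved memory for a few jobs. It is a plain pandas illustration under stated assumptions, not what slurm.mem_info does internally: the maxrss and jobid columns, the sample values, the shared unit (megabytes), and the 0.5 threshold are all hypothetical.

```python
import pandas as pd

# Hypothetical per-job values; both memory columns are assumed to be in MB.
jobs = pd.DataFrame({
    "jobid": [101, 102, 103],
    "mem": [16000, 8000, 4000],    # reserved memory
    "maxrss": [4000, 7600, 1000],  # peak resident memory (MaxRSS)
})

# Efficiency here means the fraction of the reservation that was used at peak.
jobs["mem_efficiency"] = jobs["maxrss"] / jobs["mem"]

# Flag jobs that used less than half of their reservation (arbitrary cutoff).
low = jobs[jobs["mem_efficiency"] < 0.5]
print(low[["jobid", "mem_efficiency"]])
```

Jobs flagged this way are candidates for a smaller memory request, keeping in mind the caveat above about MaxRSS.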