viewclust package

Submodules

viewclust.cumu_plot module

viewclust.cumu_plot.cumu_plot(clust_info, cores_queued, cores_running, resample_str='', fig_out='', query_bounds=True, submit_run=[], user_run=[], plot_queued=False)

Cumulative usage plot.

This function is deprecated as of v0.3.0.

Support continues in the ViewClust-Vis package.

Parameters:
  • clust_info (DataFrame) – Frame which represents the cluster state at given time intervals. See job_use.
  • cores_queued (array_like of DataFrame) – Series displaying queued resources at a particular time. See job_use.
  • cores_running (array_like of DataFrame) – Series displaying running resources at a particular time. See job_use.
  • resample_str (pandas freq str, optional) – Defaults to empty, meaning no resampling. The string is not sanity-checked; it is only used in a resampling call of the form cores_queued = cores_queued.resample('1D').sum().
  • fig_out (str, optional) – Writes the generated figure to file as the given name. If empty, skips writing. Defaults to empty.
  • query_bounds (bool, optional) – Draws red lines on the figure to mark where the query is valid. Defaults to True.
  • submit_run (DataFrame, optional) – Draws a red line representing what usage would have looked like if jobs had started instantly. Allows for easier interpretation of the queued series. Defaults to not plotting.
  • plot_queued (bool, optional) – Draws the light blue line indicating the cumulative queued resources. Defaults to False.
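The resampling step named in the resample_str description can be tried on its own with plain pandas. The sketch below is only an illustration: the hourly series and its values are made up, and it does not call cumu_plot itself.

```python
import pandas as pd

# Hypothetical hourly series of queued cores covering two days (48 samples).
idx = pd.date_range("2019-04-01", periods=48, freq=pd.Timedelta(hours=1))
cores_queued = pd.Series(10, index=idx)

# The resampling call quoted in the resample_str description, with '1D'.
daily = cores_queued.resample("1D").sum()
print(daily)
```

Because the frequency string is not checked, an invalid value would surface as a pandas error from the resample call rather than from the plotting function.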

See also

job_use()
Generates the input frames for this function.

viewclust.get_users_run module

viewclust.get_users_run.get_users_run(jobs, d_from, target, d_to='', use_unit='cpu', serialize_running='')

Takes a DataFrame full of job information and returns usage for each unique “user”, measured in the specified unit.

This function operates as a stepping stone for plotting usage figures and returns various series and frames for several different uses.

Parameters:
  • jobs (DataFrame) – Job DataFrame typically generated by slurm/sacct_jobs or the ccmnt package.
  • use_unit (str, optional) – Usage unit to examine. One of: {‘cpu’, ‘cpu-eqv’, ‘gpu’, ‘gpu-eqv’}. Defaults to ‘cpu’.
  • d_from (date str) – Beginning of the query period, e.g. ‘2019-04-01T00:00:00’.
  • target (int-like) – Typically a cpu allocation or core eqv value for a particular account. Often 50.
  • d_to (date str, optional) – End of the query period, e.g. ‘2020-01-01T00:00:00’. Defaults to now if empty.
  • serialize_running (str, optional) – Pickle given structure with argument as a name. If left empty, pickle procedure is skipped. Defaults to empty.
Returns:

  • user_running_cat – Frame of running resources for each of the unique “users” in the jobs data frame.

viewclust.insta_plot module

viewclust.insta_plot.insta_plot(clust_info, cores_queued, cores_running, resample_str='', fig_out='', y_label='Usage', fig_title='', query_bounds=True, running=[], queued=[], submit_run=[], submit_req=[], user_run=[])

Instantaneous usage plot.

This function is deprecated as of v0.3.0.

Support continues in the ViewClust-Vis package.

Parameters:
  • clust_info (DataFrame) – Frame which represents the cluster state at given time intervals. See job_use.
  • cores_queued (array_like of DataFrame) – Series displaying queued resources at a particular time. See job_use.
  • cores_running (array_like of DataFrame) – Series displaying running resources at a particular time. See job_use.
  • resample_str (pandas freq str, optional) – Defaults to empty, meaning no resampling. The string is not sanity-checked; it is only used in a resampling call of the form cores_queued = cores_queued.resample('1D').sum().
  • fig_out (str, optional) – Writes the generated figure to file as the given name. If empty, skips writing. Defaults to empty.
  • y_label (str, optional) – Makes the passed string the y-axis label.
  • fig_title (str, optional) – Appends the given string to the title.
  • query_bounds (bool, optional) – Draws red lines on the figure to mark where the query is valid. Defaults to True.
  • running (DataFrame, optional) – Draws a green line representing the usage of jobs currently in RUNNING state if they run for the requested duration.
  • queued (DataFrame, optional) – Draws a gray line representing the usage of jobs currently in PENDING state if they were to start at query time and run for their requested duration.
  • submit_run (DataFrame, optional) – Draws a red line representing what usage would have looked like if jobs had started instantly and run for their elapsed duration. Allows for easier interpretation of the queued series. Defaults to not plotting.
  • submit_req (DataFrame, optional) – Draws an orange line representing what usage would have looked like if jobs had started instantly and run for their requested duration. Allows for easier interpretation of the queued series. Defaults to not plotting.
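The submit_run and submit_req series differ only in which duration each job is assumed to run for after starting at its submit time. The sketch below illustrates that difference with plain pandas; the column names (submit, start, elapsed, requested) are assumptions made for this example and are not confirmed to match the frames viewclust works with.

```python
import pandas as pd

# Hypothetical job table; column names are illustrative assumptions only.
jobs = pd.DataFrame({
    "submit": pd.to_datetime(["2019-04-01 00:00", "2019-04-01 01:00"]),
    "start": pd.to_datetime(["2019-04-01 02:00", "2019-04-01 03:00"]),
    "elapsed": pd.to_timedelta([1, 2], unit="h"),
    "requested": pd.to_timedelta([4, 4], unit="h"),
})

# submit_run: the job starts at submit time and runs for its elapsed duration.
submit_run_end = jobs["submit"] + jobs["elapsed"]

# submit_req: the job starts at submit time and runs for its requested duration.
submit_req_end = jobs["submit"] + jobs["requested"]

print(pd.DataFrame({"submit_run_end": submit_run_end,
                    "submit_req_end": submit_req_end}))
```

Comparing either shifted interval with the real start time shows how much of the difference in the plot comes from queue wait rather than from the job itself.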

See also

job_use()
Generates the input frames for this function.

viewclust.job_use module

viewclust.job_use.job_use(jobs, d_from, target, d_to='', use_unit='cpu', job_state='all', time_ref='', grouper_interval='S', usage_interval='H', serialize_queued='', serialize_running='', serialize_dist='')

Takes a DataFrame full of job information and returns usage based on specified unit.

This function operates as a stepping stone for plotting usage figures and returns various series and frames for several different uses.

Parameters:
  • jobs (DataFrame) – Job DataFrame typically generated by the ccmnt package.
  • use_unit (str, optional) – Usage unit to examine. One of: {‘cpu’, ‘cpu-eqv’, ‘gpu’, ‘gpu-eqv’, ‘gpu-eqv-cdr’}. Defaults to ‘cpu’.
  • job_state (str, optional) – The job state to include in measurement. One of: {‘all’, ‘complete’, ‘running’, ‘queued’}. Defaults to ‘all’.
  • time_ref (str, one of: {sub, req, sub+req, horizon+req}) –
    sub: Jobs run as if they had started at submit time.
    req: Jobs run their full requested time from their start time.
    sub+req: Jobs run their full requested time from their submit time.
    horizon+req: Jobs run their full requested time from the horizon (e.g. d_to).
  • insta_dur (str, optional) – The job duration used to calculate time_ref. One of: {‘run’, ‘req’}. Defaults to ‘run’.
  • d_from (date str) – Beginning of the query period, e.g. ‘2019-04-01T00:00:00’.
  • target (int-like) – Typically a cpu allocation or core eqv value for a particular account. Often 50.
  • d_to (date str, optional) – End of the query period, e.g. ‘2020-01-01T00:00:00’. Defaults to now if empty.
  • grouper_interval (str, optional) – The interval by which to calculate the start and end time cumulative sum difference. Start and end steps within this interval will be ignored. Job record times are recorded to the second. One of: {‘S’, ‘min’, ‘H’}. Defaults to ‘S’.
  • usage_interval (str, optional) – The interval by which to store the usage series. Job record times are recorded to the second. One of: {‘S’, ‘min’, ‘H’}. Defaults to ‘H’.
  • debugging (boolean, optional) – Boolean for reporting progress to stdout. Default False.
  • serialize_queued, serialize_running, serialize_dist (str, optional) – Pickles given structure with argument as a name. If left empty, pickle procedure is skipped. Defaults to empty.
Returns:

  • clust – Frame of system info at given time intervals. Typically referenced by other functions for plotting information.
  • queued – Frame of queued resources
  • running – Frame of running resources
  • dist_from_target – Series for delta plots
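The grouper_interval description refers to a start and end time cumulative sum difference. One common way to compute a running-resources series along those lines is sketched below with plain pandas. This is an assumption about the general technique, not a copy of what job_use does internally, and the column names (start, end, cores) are hypothetical.

```python
import pandas as pd

# Hypothetical job records; column names are illustrative assumptions only.
jobs = pd.DataFrame({
    "start": pd.to_datetime(["2019-04-01 00:00", "2019-04-01 01:00"]),
    "end": pd.to_datetime(["2019-04-01 03:00", "2019-04-01 02:00"]),
    "cores": [4, 2],
})

# Cores claimed at each start time and released at each end time.
started = jobs.groupby("start")["cores"].sum()
ended = jobs.groupby("end")["cores"].sum()

# Align both on the union of event times; times with no event count as zero.
events = started.index.union(ended.index)
started = started.reindex(events, fill_value=0)
ended = ended.reindex(events, fill_value=0)

# Running cores = cumulative cores started minus cumulative cores ended.
running = started.cumsum() - ended.cumsum()
print(running)
```

Grouping the event times into coarser bins before the cumulative sums would hide starts and ends that fall inside the same bin, which matches the note that steps within the grouper interval are ignored.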

viewclust.node_use module

viewclust.node_use.node_use(node_states, debugging=False)

Calculate node usage statistics based on polling database.

Consider resampling data to get into hour format for easier plotting.

Returns:
  • cores_total – Series, total numbers of cores the scheduler is seeing at any point.
  • cores_perc – Series, weighted average (cpus in node) of alloc cpu over cfg cpu. Read as percentage of cpus in use.
  • mem_perc – Series, weighted average (cpus in node) of alloc mem over cfg mem. Read as percentage of memory in use.
  • max_perc – Series, weighted average (cpus in node)
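The weighted averages described above can be illustrated for a single polling time as follows. The column names (cfg_cpu, alloc_cpu) and the scaling to a 0 to 100 range are assumptions for this sketch; the layout of node_states is not documented here.

```python
import numpy as np
import pandas as pd

# Hypothetical snapshot of three nodes at one polling time.
nodes = pd.DataFrame({
    "cfg_cpu": [32, 32, 64],    # CPUs configured on each node
    "alloc_cpu": [16, 32, 0],   # CPUs currently allocated on each node
})

# Per-node fraction of CPUs in use.
frac = nodes["alloc_cpu"] / nodes["cfg_cpu"]

# Average of those fractions, weighted by the CPUs in each node.
# Multiplying by 100 to read it as a percentage is an assumption.
cores_perc = np.average(frac, weights=nodes["cfg_cpu"]) * 100
print(cores_perc)
```

Weighting by CPU count means a large idle node pulls the figure down more than a small one, so the result equals total allocated CPUs over total configured CPUs.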

viewclust.target_series module

viewclust.target_series.target_series(time_frames)

Takes a list of tuples and builds a target based time series.

Parameters:
  • time_frames (list, tuples of 3) – List should be something of the following form:

    # Q4:
    d_from = '2019-10-01T00:00:00'
    d_dec = '2019-12-01T00:00:00'
    d_to = '2019-12-31T00:00:00'

    time_frames = [(d_from, d_dec, 100), (d_dec, d_to, 500)]

Returns:

  • tar_frame – Time series by hour of the time_frames list. Based on the above example:

    2019-10-01 00:00:00    100
    2019-10-01 01:00:00    100
    2019-10-01 02:00:00    100
    2019-10-01 03:00:00    100
    2019-10-01 04:00:00    100
    …
    2019-12-30 20:00:00    500
    2019-12-30 21:00:00    500
    2019-12-30 22:00:00    500
    2019-12-30 23:00:00    500
    2019-12-31 00:00:00    500

Return type: Pandas Series
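To make the shape of the result concrete, here is a minimal pandas sketch that builds an hourly series from the same Q4 example. The helper build_target is a hypothetical stand-in written for this illustration, not the package's implementation, and the choice of which target applies at a shared boundary timestamp is an assumption.

```python
import pandas as pd

d_from = "2019-10-01T00:00:00"
d_dec = "2019-12-01T00:00:00"
d_to = "2019-12-31T00:00:00"
time_frames = [(d_from, d_dec, 100), (d_dec, d_to, 500)]


def build_target(frames):
    """Hypothetical stand-in: hourly series of targets from 3-tuples."""
    pieces = []
    for start, end, target in frames:
        # Hourly timestamps from start to end, both inclusive.
        idx = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
        pieces.append(pd.Series(target, index=idx))
    combined = pd.concat(pieces)
    # Assumption: where two frames share a timestamp, the later frame wins.
    return combined[~combined.index.duplicated(keep="last")]


tar_frame = build_target(time_frames)
print(tar_frame.head())
print(tar_frame.tail())
```

The first and last rows match the documented example: 100 from 2019-10-01 00:00:00 and 500 up to 2019-12-31 00:00:00.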

See also

job_use()
Generates the input frame for this function.

viewclust.to_terminal module

viewclust.to_terminal.to_terminal(series: Union[pandas.core.series.Series, List[pandas.core.series.Series]], title: str = 'resource usage', pu: str = 'cpu', labels: Optional[list] = None)

Plot a datetime series (or a list of them) to the terminal.

Parameters:
  • series – A datetime series or a list of series to be plotted.
  • title – Title for the plot.
  • pu – Processing unit (GPU or CPU), used for the y axis.
  • labels – If multiple series are given, the label of each one.

Module contents