Bug 869414 - Reduced quantization resolution in scheduler stats causes sawtoothing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: 2.3
Target Release: ---
Assignee: Erik Erlandson
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On: 867989
Blocks: 845292 876304
Reported: 2012-10-23 19:54 UTC by Erik Erlandson
Modified: 2013-03-19 16:39 UTC

Fixed In Version: condor-7.8.7-0.5
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-19 16:39:32 UTC
Target Upstream Version:
Embargoed:


Attachments
a screen shot of sawtoothing in cumin scheduler perf stats (31.44 KB, image/png)
2012-10-23 19:56 UTC, Erik Erlandson
Screenshot from reproduction sawtoothing in old version (27.77 KB, image/png)
2013-01-14 12:40 UTC, Daniel Horák
Screenshot from testing on new version (21.15 KB, image/png)
2013-01-14 12:48 UTC, Daniel Horák


Links
System ID Private Priority Status Summary Last Updated
Condor 3288 0 None None None 2012-10-23 20:13:10 UTC

Description Erik Erlandson 2012-10-23 19:54:25 UTC
Description of problem:
Upstream modifications to statistics collection decreased the quantization resolution of the data structures used to compute 'recent' statistics.  The decreased resolution produces visible 'sawtoothing' in recent statistics, e.g. in cumin schedd performance graphs.


How reproducible:
100%


Steps to Reproduce:
1. Run cumin with the new stats in effect and view the scheduler performance stats.

Actual results:
Recent (non-cumulative) stats show saw-toothing.

Expected results:
Saw-toothing should not be evident.

Additional info:
I will attach a screenshot of the saw-toothing as an example.

Comment 1 Erik Erlandson 2012-10-23 19:56:38 UTC
Created attachment 632366 [details]
a screen shot of sawtoothing in cumin scheduler perf stats

Comment 2 Erik Erlandson 2012-10-23 20:00:16 UTC
Requesting 'no errata', as this was identified internally prior to release.

Comment 4 Erik Erlandson 2012-10-24 14:52:17 UTC
Additional steps for repro:  you should point a cumin instance at a condor pool and maintain a steady state of submissions for 5-10 minutes.  The steady state of job submissions will produce a nice sawtooth in the repro, and the sawtooth should *not* appear when the fix for this ticket is in place.

Comment 5 Luigi Toscano 2012-10-30 12:42:01 UTC
(In reply to comment #4)
> Additional steps for repro:  you should point a cumin instance at a condor
> pool and maintain a steady state of submissions for 5-10 minutes.  The
> steady state of job submissions will produce a nice sawtooth in the repro,
> and the sawtooth should *not* appear when the fix for this ticket is in
> place.

In other words, does it mean that we should see the same behavior as before (apart from the way the graphs are implemented)?

Comment 6 Erik Erlandson 2012-11-01 23:26:51 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Additional steps for repro:  you should point a cumin instance at a condor
> > pool and maintain a steady state of submissions for 5-10 minutes.  The
> > steady state of job submissions will produce a nice sawtooth in the repro,
> > and the sawtooth should *not* appear when the fix for this ticket is in
> > place.
> 
> In other words, does it mean that we should see the same behavior as before
> (apart from the way the graphs are implemented)?

Once this bug is fixed, behavior should be the same as in the previous release.

Comment 8 Erik Erlandson 2012-11-13 18:39:33 UTC
pulled downstream:
UPSTREAM-7.9.3-BZ869414-stats-window-quantum

Comment 9 Erik Erlandson 2012-11-13 18:48:47 UTC
The following is a 'non-cumin' command line test.

You can test the new feature, with a visible 'saw-tooth' every 20 seconds, using this configuration:

STATISTICS_TO_PUBLISH = SCHEDD:2 DC:2
# one minute window for Recent* stats
STATISTICS_WINDOW_SECONDS = 60
# 20-second ring buffer quantization - recent-stats will sawtooth every 20 secs
STATISTICS_WINDOW_QUANTUM = 20

Kick off a script that submits one job per second (or every couple of seconds, as long as it's a regular interval much shorter than 20 seconds).
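A minimal sketch of such a submission script follows. It is not the script used for the original test: the submit file name `sleep.sub` and the helper `submit_at_interval` are hypothetical, and the only thing assumed about the pool is that `condor_submit` is on the PATH. The `run` and `sleep` parameters exist only so the loop can be exercised without a pool.

```python
import subprocess
import time


def submit_at_interval(submit_file, count, interval=1.0,
                       run=subprocess.run, sleep=time.sleep):
    """Call `condor_submit <submit_file>` `count` times, pausing
    `interval` seconds between submissions.

    `submit_file` is a hypothetical submit description file; `run` and
    `sleep` can be swapped for stand-ins when no pool is available.
    """
    for i in range(count):
        # check=True raises CalledProcessError if a submission fails,
        # so a broken steady state is noticed right away.
        run(["condor_submit", submit_file], check=True)
        if i < count - 1:
            sleep(interval)


# Example (assumes a hypothetical sleep.sub exists):
# submit_at_interval("sleep.sub", count=600, interval=1.0)
```

Ten minutes at one job per second (count=600) covers the 5-10 minute steady state suggested in comment 4.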

Saw-toothing is visible in tools like cumin. I also tested using 'watch' on one 'recent' stat from the "SCHEDD" collection and one from the "DC" collection:

watch -n 5 'condor_status -l -schedd | grep -e RecentJobsSubmitted -e RecentDCSelectWaittime'

In the above, both of the statistics drop off every 20 seconds, then begin to grow again.
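If eyeballing the 'watch' output is inconvenient, the drop-offs can also be detected from captured samples. The sketch below assumes the `condor_status -l` output consists of `Attribute = value` lines and that the two attributes named above are numeric; the helpers `parse_classad_numbers` and `drop_offs` are hypothetical names, not part of any condor tool.

```python
def parse_classad_numbers(text, names):
    """Extract numeric attributes from `Attribute = value` lines.

    `text` is assumed to be captured `condor_status -l -schedd` output;
    only attributes listed in `names` are returned.
    """
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in names:
            values[key] = float(value.strip())
    return values


def drop_offs(samples, min_drop=1):
    """Return the indices where a sample fell by at least `min_drop`
    relative to the previous sample."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] - samples[i] >= min_drop]
```

With samples taken every 5 seconds and a 20-second quantum, a drop-off index would be expected roughly every fourth sample.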

If you change STATISTICS_WINDOW_QUANTUM to '1', and restart the scheduler, then you will see both statistics reach steady-state values, with no major drop-offs or visible saw-toothing.
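The effect of the quantum can be illustrated with a toy model. The sketch below is not the condor implementation; it assumes a ring buffer with window/quantum slots, where each slot accumulates events for one quantum and the oldest slot is discarded in one step when the buffer advances. The function name `simulate_recent` and its parameters are hypothetical.

```python
def simulate_recent(window, quantum, duration, rate=1):
    """Simulate a 'recent' counter under a steady event rate.

    window   -- length of the recent window in seconds
    quantum  -- seconds covered by each ring buffer slot
    duration -- number of one-second steps to simulate
    rate     -- events per second (steady state)

    Returns the recent value sampled once per second.
    """
    slots = [0] * (window // quantum)
    head = 0
    samples = []
    for t in range(duration):
        if t > 0 and t % quantum == 0:
            # Advance the ring: the oldest slot is dropped all at once,
            # which is what produces the visible drop-off.
            head = (head + 1) % len(slots)
            slots[head] = 0
        slots[head] += rate
        samples.append(sum(slots))
    return samples


coarse = simulate_recent(window=60, quantum=20, duration=300)[120:]
fine = simulate_recent(window=60, quantum=1, duration=300)[120:]
print(min(coarse), max(coarse))  # 41 60
print(min(fine), max(fine))      # 60 60
```

In this model a 20-second quantum makes the steady-state value swing between roughly two thirds of the window and the full window, while a 1-second quantum keeps it flat, which matches the behavior described for the two settings.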

Comment 10 Daniel Horák 2013-01-14 12:40:54 UTC
Created attachment 678213 [details]
Screenshot from reproduction sawtoothing in old version

Reproduced on RHEL6 x86_64 with the following packages:
# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
  condor-7.8.7-0.4.el6.x86_64
  condor-aviary-7.8.7-0.4.el6.x86_64
  condor-classads-7.8.7-0.4.el6.x86_64
  condor-qmf-7.8.7-0.4.el6.x86_64
  cumin-0.1.5540-1.el6.noarch
  python-qpid-0.18-4.el6.noarch
  python-qpid-qmf-0.18-7.el6.x86_64
  qpid-cpp-client-0.18-9.el6.x86_64
  qpid-cpp-server-0.18-9.el6.x86_64
  qpid-qmf-0.18-7.el6.x86_64
  qpid-tools-0.18-6.el6.noarch

Comment 11 Daniel Horák 2013-01-14 12:48:24 UTC
Created attachment 678214 [details]
Screenshot from testing on new version

Tested and verified on RHEL 5.9/6.4 i386/x86_64 with the following packages:
# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
  condor-7.8.8-0.3.el5.i386
  condor-aviary-7.8.8-0.3.el5.i386
  condor-classads-7.8.8-0.3.el5.i386
  condor-qmf-7.8.8-0.3.el5.i386
  cumin-0.1.5648-1.el5.noarch
  python-qpid-0.18-4.el5.noarch
  python-qpid-qmf-0.18-13.el5.i386
  qpid-cpp-client-0.18-13.el5.i386
  qpid-cpp-server-0.18-13.el5.i386
  qpid-qmf-0.18-13.el5.i386
  qpid-tools-0.18-7.el5.noarch

Also checked the scenario from comment 9.

>>> VERIFIED

