Bug 869414 - Reduced quantization resolution in scheduler stats causes sawtoothing
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: All All
Priority: high Severity: high
Target Milestone: 2.3
Target Release: ---
Assigned To: Erik Erlandson
QA Contact: Daniel Horák
Keywords: Rebase, Regression
Depends On: 867989
Blocks: 845292 876304
Reported: 2012-10-23 15:54 EDT by Erik Erlandson
Modified: 2013-03-19 12:39 EDT
Fixed In Version: condor-7.8.7-0.5
Doc Type: Rebase: Bug Fixes and Enhancements
Last Closed: 2013-03-19 12:39:32 EDT
Type: Bug

Attachments
a screen shot of sawtoothing in cumin scheduler perf stats (31.44 KB, image/png)
2012-10-23 15:56 EDT, Erik Erlandson
Screenshot from reproduction sawtoothing in old version (27.77 KB, image/png)
2013-01-14 07:40 EST, Daniel Horák
Screenshot from testing on new version (21.15 KB, image/png)
2013-01-14 07:48 EST, Daniel Horák


External Trackers
Tracker ID Priority Status Summary Last Updated
Condor 3288 None None None 2012-10-23 16:13:10 EDT

Description Erik Erlandson 2012-10-23 15:54:25 EDT
Description of problem:
Upstream modifications to statistics collection decreased the quantization resolution of the data structures used to compute 'recent' statistics.  The decreased resolution produces visible 'sawtoothing' in recent statistics, e.g. in cumin schedd performance graphs.


How reproducible:
100%


Steps to Reproduce:
1. Run cumin with the new stats in effect and view the scheduler performance stats
  
Actual results:
Recent (non-cumulative) stats show saw-toothing

Expected results:
Saw-toothing should not be evident

Additional info:
I will attach a screen shot of saw-toothing as an example
Comment 1 Erik Erlandson 2012-10-23 15:56:38 EDT
Created attachment 632366
a screen shot of sawtoothing in cumin scheduler perf stats
Comment 2 Erik Erlandson 2012-10-23 16:00:16 EDT
Requesting 'no errata' as this was identified internally prior to release
Comment 4 Erik Erlandson 2012-10-24 10:52:17 EDT
Additional steps for repro:  you should point a cumin instance at a condor pool and maintain a steady state of submissions for 5-10 minutes.  The steady state of job submissions will produce a nice sawtooth in the repro, and the sawtooth should *not* appear when the fix for this ticket is in place.
Comment 5 Luigi Toscano 2012-10-30 08:42:01 EDT
(In reply to comment #4)
> Additional steps for repro:  you should point a cumin instance at a condor
> pool and maintain a steady state of submissions for 5-10 minutes.  The
> steady state of job submissions will produce a nice sawtooth in the repro,
> and the sawtooth should *not* appear when the fix for this ticket is in
> place.

In other words, does it mean that we should see the same behavior as before (apart from the way the graphs are implemented)?
Comment 6 Erik Erlandson 2012-11-01 19:26:51 EDT
(In reply to comment #5)
> (In reply to comment #4)
> > Additional steps for repro:  you should point a cumin instance at a condor
> > pool and maintain a steady state of submissions for 5-10 minutes.  The
> > steady state of job submissions will produce a nice sawtooth in the repro,
> > and the sawtooth should *not* appear when the fix for this ticket is in
> > place.
> 
> In other words, does it mean that we should see the same behavior as before
> (apart from the way the graphs are implemented)?

Once this bug is fixed, behavior should match the previous release.
Comment 8 Erik Erlandson 2012-11-13 13:39:33 EST
pulled downstream:
UPSTREAM-7.9.3-BZ869414-stats-window-quantum
Comment 9 Erik Erlandson 2012-11-13 13:48:47 EST
The following is a 'non-cumin' command line test.

You can test the new feature, with visible 'saw-tooth' every 20 seconds using this configuration:

STATISTICS_TO_PUBLISH = SCHEDD:2 DC:2
# one minute window for Recent* stats
STATISTICS_WINDOW_SECONDS = 60
# 20-second ring buffer quantization - recent-stats will sawtooth every 20 secs
STATISTICS_WINDOW_QUANTUM = 20

Kick off a script that submits one job per second (or every couple seconds, as long as it's a regular interval << 20 seconds).
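
A minimal sketch of such a submit loop, for anyone who wants to script it. The submit file name `sleep.sub` and its contents are hypothetical placeholders, and `submit_steadily` is an illustrative helper, not part of condor. The interval does not compensate for the time `condor_submit` itself takes, which should be fine as long as the result stays well under the 20-second quantum.

```python
import subprocess
import time

# Hypothetical submit description file (sleep.sub), assumed for illustration:
#   executable = /bin/sleep
#   arguments  = 1
#   queue


def submit_steadily(submit_file, count, interval=1.0,
                    run=subprocess.check_call, sleep=time.sleep):
    """Submit `count` jobs, pausing `interval` seconds between submissions.

    `run` and `sleep` are injectable so the loop can be exercised without
    a condor pool; by default they call condor_submit and time.sleep.
    """
    for i in range(count):
        run(["condor_submit", submit_file])
        # No pause is needed after the final submission.
        if i < count - 1:
            sleep(interval)
```

Calling `submit_steadily("sleep.sub", 600)` would keep roughly one submission per second going for about ten minutes, which covers the 5-10 minute steady state suggested in comment 4.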

Saw-toothing is visible in tools like cumin. I also tested using 'watch' on a 'recent' stat from the "SCHEDD" collection and one from the "DC" collection:

watch -n 5 'condor_status -l -schedd | grep -e RecentJobsSubmitted -e RecentDCSelectWaittime'

In the above, both of the statistics drop off every 20 seconds, then begin to grow again.
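
If eyeballing 'watch' output is not enough, the drop-offs can be counted from periodic samples. The sketch below assumes the usual `Attr = value` lines of `condor_status -l` output; the helper names and the 25% drop threshold are arbitrary choices for illustration, not anything condor defines.

```python
import re


def parse_attr(classad_text, name):
    """Return the numeric value of a `name = value` line, or None if absent."""
    pattern = rf"^{re.escape(name)}\s*=\s*([-+0-9.eE]+)\s*$"
    match = re.search(pattern, classad_text, re.MULTILINE)
    return float(match.group(1)) if match else None


def count_dropoffs(samples, fraction=0.25):
    """Count samples that fall by more than `fraction` of the previous sample.

    A steady workload with sawtoothing shows repeated large drops; a
    steady-state series should show none.
    """
    drops = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev > 0 and (prev - cur) > fraction * prev:
            drops += 1
    return drops
```

Feeding it one sample of RecentJobsSubmitted every 5 seconds (as the 'watch' command does) should report a drop roughly every fourth sample while the 20-second quantum is in effect.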

If you change STATISTICS_WINDOW_QUANTUM to '1', and restart the scheduler, then you will see both statistics reach steady-state values, with no major drop-offs or visible saw-toothing.
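
To illustrate why the quantum matters, here is a toy model of a windowed 'recent' counter. It is a simplified sketch under stated assumptions, not the condor implementation: the window is modeled as window/quantum slots, events land in the newest slot, and every quantum seconds the oldest slot is discarded in one step. `simulate_recent` and its parameters are hypothetical names.

```python
from collections import deque


def simulate_recent(window_seconds, quantum, duration, rate=1):
    """Return the 'recent' total seen at each second of a steady workload.

    Assumed model: a ring of window_seconds // quantum slots. Events are
    added to the newest slot; every `quantum` seconds the oldest slot is
    dropped and a fresh empty slot becomes the newest.
    """
    n_slots = window_seconds // quantum
    slots = deque([0] * n_slots, maxlen=n_slots)
    reported = []
    for t in range(duration):
        if t > 0 and t % quantum == 0:
            # Appending to a full deque discards the oldest slot.
            slots.append(0)
        slots[-1] += rate
        reported.append(sum(slots))
    return reported


coarse = simulate_recent(60, 20, 300)  # STATISTICS_WINDOW_QUANTUM = 20
fine = simulate_recent(60, 1, 300)     # STATISTICS_WINDOW_QUANTUM = 1
```

Under this model, with one event per second, the 20-second quantum settles into a cycle between 41 and 60, while the 1-second quantum holds steady at 60 once the window has filled.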
Comment 10 Daniel Horák 2013-01-14 07:40:54 EST
Created attachment 678213
Screenshot from reproduction sawtoothing in old version

Reproduced on RHEL6 x86_64 with the following packages:
# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
  condor-7.8.7-0.4.el6.x86_64
  condor-aviary-7.8.7-0.4.el6.x86_64
  condor-classads-7.8.7-0.4.el6.x86_64
  condor-qmf-7.8.7-0.4.el6.x86_64
  cumin-0.1.5540-1.el6.noarch
  python-qpid-0.18-4.el6.noarch
  python-qpid-qmf-0.18-7.el6.x86_64
  qpid-cpp-client-0.18-9.el6.x86_64
  qpid-cpp-server-0.18-9.el6.x86_64
  qpid-qmf-0.18-7.el6.x86_64
  qpid-tools-0.18-6.el6.noarch
Comment 11 Daniel Horák 2013-01-14 07:48:24 EST
Created attachment 678214
Screenshot from testing on new version

Tested and verified on RHEL 5.9/6.4 i386/x86_64 with the following packages:
# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
  condor-7.8.8-0.3.el5.i386
  condor-aviary-7.8.8-0.3.el5.i386
  condor-classads-7.8.8-0.3.el5.i386
  condor-qmf-7.8.8-0.3.el5.i386
  cumin-0.1.5648-1.el5.noarch
  python-qpid-0.18-4.el5.noarch
  python-qpid-qmf-0.18-13.el5.i386
  qpid-cpp-client-0.18-13.el5.i386
  qpid-cpp-server-0.18-13.el5.i386
  qpid-qmf-0.18-13.el5.i386
  qpid-tools-0.18-7.el5.noarch

Also checked the scenario from comment 9.

>>> VERIFIED
