Bug 869414

Summary: Reduced quantization resolution in scheduler stats causes sawtoothing
Product: Red Hat Enterprise MRG
Component: condor
Version: Development
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Erik Erlandson <eerlands>
Assignee: Erik Erlandson <eerlands>
QA Contact: Daniel Horák <dahorak>
CC: dahorak, iboverma, ltoscano, matt, mkudlej, pmackinn, sgraf, tmckay, tstclair
Target Milestone: 2.3
Keywords: Rebase, Regression
Hardware: All
OS: All
Fixed In Version: condor-7.8.7-0.5
Doc Type: Rebase: Bug Fixes and Enhancements
Last Closed: 2013-03-19 12:39:32 EDT
Type: Bug
Bug Depends On: 867989
Bug Blocks: 845292, 876304
Attachments:
- a screen shot of sawtoothing in cumin scheduler perf stats (flags: none)
- Screenshot from reproduction sawtoothing in old version (flags: none)
- Screenshot from testing on new version (flags: none)

Description Erik Erlandson 2012-10-23 15:54:25 EDT
Description of problem:
Upstream modifications to statistics collection decreased the quantization resolution of the data structures used to compute 'recent' statistics. The decreased resolution produces visible 'sawtoothing' in recent statistics, e.g. in the cumin schedd performance graphs.


How reproducible:
100%


Steps to Reproduce:
1. Run cumin with the new stats in effect and view the scheduler performance stats.
Actual results:
Recent (non-cumulative) stats show saw-toothing

Expected results:
Saw-toothing should not be evident

Additional info:
I will attach a screen shot of saw-toothing as an example
Comment 1 Erik Erlandson 2012-10-23 15:56:38 EDT
Created attachment 632366 [details]
a screen shot of sawtoothing in cumin scheduler perf stats
Comment 2 Erik Erlandson 2012-10-23 16:00:16 EDT
Requesting 'no errata' as this was identified internally prior to release
Comment 4 Erik Erlandson 2012-10-24 10:52:17 EDT
Additional steps for repro: point a cumin instance at a condor pool and maintain a steady state of submissions for 5-10 minutes. The steady state of job submissions will produce a clear sawtooth in the repro, and the sawtooth should *not* appear when the fix for this ticket is in place.
Comment 5 Luigi Toscano 2012-10-30 08:42:01 EDT
(In reply to comment #4)
> Additional steps for repro:  you should point a cumin instance at a condor
> pool and maintain a steady state of submissions for 5-10 minutes.  The
> steady state of job submissions will produce a nice sawtooth in the repro,
> and the sawtooth should *not* appear when the fix for this ticket is in
> place.

In other words, does it mean that we should see the same behavior as before (apart from the way the graphs are implemented)?
Comment 6 Erik Erlandson 2012-11-01 19:26:51 EDT
(In reply to comment #5)
> (In reply to comment #4)
> > Additional steps for repro:  you should point a cumin instance at a condor
> > pool and maintain a steady state of submissions for 5-10 minutes.  The
> > steady state of job submissions will produce a nice sawtooth in the repro,
> > and the sawtooth should *not* appear when the fix for this ticket is in
> > place.
> 
> In other words, does it mean that we should see the same behavior as before
> (apart from the way the graphs are implemented)?

Once this bug is fixed, the behavior should match the previous release.
Comment 8 Erik Erlandson 2012-11-13 13:39:33 EST
pulled downstream:
UPSTREAM-7.9.3-BZ869414-stats-window-quantum
Comment 9 Erik Erlandson 2012-11-13 13:48:47 EST
The following is a 'non-cumin' command line test.

You can test the new feature, with a visible 'saw-tooth' every 20 seconds, using this configuration:

STATISTICS_TO_PUBLISH = SCHEDD:2 DC:2
# one minute window for Recent* stats
STATISTICS_WINDOW_SECONDS = 60
# 20-second ring buffer quantization - recent-stats will sawtooth every 20 secs
STATISTICS_WINDOW_QUANTUM = 20

Kick off a script that submits one job per second (or every couple of seconds, as long as it is a regular interval << 20 seconds).

Saw-toothing is visible in tools like cumin; I also tested using 'watch' on a 'recent' stat from the "SCHEDD" collection and one from the "DC" collection:

watch -n 5 'condor_status -l -schedd | grep -e RecentJobsSubmitted -e RecentDCSelectWaittime'

In the above, both of the statistics drop off every 20 seconds, then begin to grow again.

If you change STATISTICS_WINDOW_QUANTUM to '1' and restart the scheduler, you will see both statistics reach steady-state values, with no major drop-offs or visible saw-toothing.
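The drop-off behavior described in this comment can be sketched with a small ring-buffer simulation. This is not Condor's actual implementation, just a minimal model of the mechanism, assuming one job submission per second: each slot accumulates STATISTICS_WINDOW_QUANTUM seconds of counts, the oldest slot is discarded whole at each rotation, and the published Recent* value is the sum over all slots. The function name and rate are illustrative assumptions.

```python
def simulate_recent(window_seconds, quantum, rate=1, duration=300):
    """Model a 'Recent*' statistic kept in a ring buffer of
    window_seconds / quantum slots, fed `rate` events per second."""
    slots = [0] * (window_seconds // quantum)
    values = []
    for t in range(duration):
        if t > 0 and t % quantum == 0:
            slots.pop()         # discard the oldest slot wholesale...
            slots.insert(0, 0)  # ...and start a fresh one
        slots[0] += rate        # this second's events land in the current slot
        values.append(sum(slots))
    return values

# STATISTICS_WINDOW_SECONDS = 60, STATISTICS_WINDOW_QUANTUM = 20
coarse = simulate_recent(60, 20)
# same window with STATISTICS_WINDOW_QUANTUM = 1
fine = simulate_recent(60, 1)
```

With the coarse quantum, the published value sheds an entire 20-count slot at each rotation and then climbs back up, which is exactly the sawtooth; with a quantum of 1 the value settles at a flat steady state once the window fills.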
Comment 10 Daniel Horák 2013-01-14 07:40:54 EST
Created attachment 678213 [details]
Screenshot from reproduction sawtoothing in old version

Reproduced on RHEL6 x86_64 with the following packages:
# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
  condor-7.8.7-0.4.el6.x86_64
  condor-aviary-7.8.7-0.4.el6.x86_64
  condor-classads-7.8.7-0.4.el6.x86_64
  condor-qmf-7.8.7-0.4.el6.x86_64
  cumin-0.1.5540-1.el6.noarch
  python-qpid-0.18-4.el6.noarch
  python-qpid-qmf-0.18-7.el6.x86_64
  qpid-cpp-client-0.18-9.el6.x86_64
  qpid-cpp-server-0.18-9.el6.x86_64
  qpid-qmf-0.18-7.el6.x86_64
  qpid-tools-0.18-6.el6.noarch
Comment 11 Daniel Horák 2013-01-14 07:48:24 EST
Created attachment 678214 [details]
Screenshot from testing on new version

Tested and verified on RHEL 5.9/6.4 i386/x86_64 with the following packages:
# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
  condor-7.8.8-0.3.el5.i386
  condor-aviary-7.8.8-0.3.el5.i386
  condor-classads-7.8.8-0.3.el5.i386
  condor-qmf-7.8.8-0.3.el5.i386
  cumin-0.1.5648-1.el5.noarch
  python-qpid-0.18-4.el5.noarch
  python-qpid-qmf-0.18-13.el5.i386
  qpid-cpp-client-0.18-13.el5.i386
  qpid-cpp-server-0.18-13.el5.i386
  qpid-qmf-0.18-13.el5.i386
  qpid-tools-0.18-7.el5.noarch

Also checked the scenario from comment 9.

>>> VERIFIED