Description of problem:
Upstream modifications to statistics collection decreased the resolution of quantization in the data structures used to compute 'recent' statistics. The result of the decreased resolution is visible 'sawtoothing' in recent statistics, e.g. in cumin schedd performance graphs.

How reproducible:
100%

Steps to Reproduce:
1. Run cumin with the new stats in effect and view scheduler performance stats.

Actual results:
Recent (non-cumulative) stats show saw-toothing.

Expected results:
Saw-toothing should not be evident.

Additional info:
I will attach a screen shot of saw-toothing as an example.
Created attachment 632366 [details] a screen shot of sawtoothing in cumin scheduler perf stats
Requesting 'no errata' as this was identified internally prior to release
Additional steps for repro: you should point a cumin instance at a condor pool and maintain a steady state of submissions for 5-10 minutes. The steady state of job submissions will produce a nice sawtooth in the repro, and the sawtooth should *not* appear when the fix for this ticket is in place.
(In reply to comment #4)
> Additional steps for repro: you should point a cumin instance at a condor
> pool and maintain a steady state of submissions for 5-10 minutes. The
> steady state of job submissions will produce a nice sawtooth in the repro,
> and the sawtooth should *not* appear when the fix for this ticket is in
> place.

In other words, does that mean we should see the same behavior as before (apart from the way the graphs are implemented)?
(In reply to comment #5)
> (In reply to comment #4)
> > Additional steps for repro: you should point a cumin instance at a condor
> > pool and maintain a steady state of submissions for 5-10 minutes. The
> > steady state of job submissions will produce a nice sawtooth in the repro,
> > and the sawtooth should *not* appear when the fix for this ticket is in
> > place.
>
> In other words, does it mean that we should see the same behavior as before
> (apart from the way the graphs are implemented)?

Once this bug is fixed, behavior should be as in the previous release.
pulled downstream: UPSTREAM-7.9.3-BZ869414-stats-window-quantum
The following is a 'non-cumin' command line test. You can test the new feature, with a visible 'saw-tooth' every 20 seconds, using this configuration:

STATISTICS_TO_PUBLISH = SCHEDD:2 DC:2
# one minute window for Recent* stats
STATISTICS_WINDOW_SECONDS = 60
# 20-second ring buffer quantization - recent stats will sawtooth every 20 secs
STATISTICS_WINDOW_QUANTUM = 20

Kick off a script that submits one job per second (or every couple of seconds, as long as it's a regular interval << 20 seconds).

Saw-toothing is visible in tools like cumin. I also tested using 'watch' on a 'recent' stat from the "SCHEDD" collection and one from the "DC" collection:

watch -n 5 'condor_status -l -schedd | grep -e RecentJobsSubmitted -e RecentDCSelectWaittime'

In the above, both statistics drop off every 20 seconds, then begin to grow again. If you change STATISTICS_WINDOW_QUANTUM to '1' and restart the scheduler, you will see both statistics reach steady-state values, with no major drop-offs or visible saw-toothing.
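For anyone unfamiliar with the mechanism, the sawtooth above can be sketched with a minimal simulation (plain Python, not HTCondor code; the function name `recent_series` and the one-event-per-second rate are illustrative assumptions). A 'Recent*' statistic is modeled as a ring of window/quantum slots whose sum is published; when a quantum boundary passes, the oldest slot is dropped whole, so a coarse quantum drops a large chunk of counts at once:

```python
def recent_series(window, quantum, duration, rate=1):
    """Simulate a 'Recent' statistic over a ring of window/quantum slots.

    Adds `rate` events per second and returns the published value
    (sum of all slots) at each second.
    """
    slots = [0] * (window // quantum)
    series = []
    for t in range(duration):
        if t > 0 and t % quantum == 0:
            slots.pop(0)     # expire the oldest quantum's counts all at once
            slots.append(0)  # open a fresh quantum
        slots[-1] += rate    # count this second's events
        series.append(sum(slots))
    return series

# Coarse quantum, as in the repro config above: 60s window, 20s quantum.
coarse = recent_series(window=60, quantum=20, duration=120)
# Fine quantum, as with STATISTICS_WINDOW_QUANTUM = 1.
fine = recent_series(window=60, quantum=1, duration=120)
```

After the first full window, `coarse` repeatedly climbs to 60 and drops back to 41 every 20 seconds (a ~20-count sawtooth, one whole slot expiring at once), while `fine` holds steady at 60: the same total count, only the expiry granularity differs.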
Created attachment 678213 [details]
Screenshot from reproduction: sawtoothing in old version

Reproduced on RHEL6 x86_64 with the following packages:

# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
condor-7.8.7-0.4.el6.x86_64
condor-aviary-7.8.7-0.4.el6.x86_64
condor-classads-7.8.7-0.4.el6.x86_64
condor-qmf-7.8.7-0.4.el6.x86_64
cumin-0.1.5540-1.el6.noarch
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-7.el6.x86_64
qpid-cpp-client-0.18-9.el6.x86_64
qpid-cpp-server-0.18-9.el6.x86_64
qpid-qmf-0.18-7.el6.x86_64
qpid-tools-0.18-6.el6.noarch
Created attachment 678214 [details]
Screenshot from testing on new version

Tested and verified on RHEL 5.9/6.4 i386/x86_64 with the following packages:

# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
condor-7.8.8-0.3.el5.i386
condor-aviary-7.8.8-0.3.el5.i386
condor-classads-7.8.8-0.3.el5.i386
condor-qmf-7.8.8-0.3.el5.i386
cumin-0.1.5648-1.el5.noarch
python-qpid-0.18-4.el5.noarch
python-qpid-qmf-0.18-13.el5.i386
qpid-cpp-client-0.18-13.el5.i386
qpid-cpp-server-0.18-13.el5.i386
qpid-qmf-0.18-13.el5.i386
qpid-tools-0.18-7.el5.noarch

Checked also the scenario from comment 9.

>>> VERIFIED