Bug 869414
| Summary: | Reduced quantization resolution in scheduler stats causes sawtoothing | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Erik Erlandson <eerlands> |
| Component: | condor | Assignee: | Erik Erlandson <eerlands> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Daniel Horák <dahorak> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | Development | CC: | dahorak, iboverma, ltoscano, matt, mkudlej, pmackinn, sgraf, tmckay, tstclair |
| Target Milestone: | 2.3 | Keywords: | Rebase, Regression |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | condor-7.8.7-0.5 | Doc Type: | Rebase: Bug Fixes and Enhancements |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-03-19 16:39:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 867989 | | |
| Bug Blocks: | 845292, 876304 | | |
| Attachments: | | | |
Description
Erik Erlandson
2012-10-23 19:54:25 UTC
Created attachment 632366 [details]
a screen shot of sawtoothing in cumin scheduler perf stats
Requesting 'no errata' as this was identified internally prior to release

Additional steps for repro: you should point a cumin instance at a condor pool and maintain a steady state of submissions for 5-10 minutes. The steady state of job submissions will produce a nice sawtooth in the repro, and the sawtooth should *not* appear when the fix for this ticket is in place.

(In reply to comment #4)
> Additional steps for repro: you should point a cumin instance at a condor
> pool and maintain a steady state of submissions for 5-10 minutes. The
> steady state of job submissions will produce a nice sawtooth in the repro,
> and the sawtooth should *not* appear when the fix for this ticket is in
> place.

In other words, does it mean that we should see the same behavior as before (apart from the way the graphs are implemented)?

(In reply to comment #5)
> (In reply to comment #4)
> > Additional steps for repro: you should point a cumin instance at a condor
> > pool and maintain a steady state of submissions for 5-10 minutes. The
> > steady state of job submissions will produce a nice sawtooth in the repro,
> > and the sawtooth should *not* appear when the fix for this ticket is in
> > place.
>
> In other words, does it mean that we should see the same behavior as before
> (apart from the way the graphs are implemented)?

once this bug is fixed, then behavior should be as previous release

pulled downstream: UPSTREAM-7.9.3-BZ869414-stats-window-quantum

The following is a 'non-cumin' command line test. You can test the new feature, with visible 'saw-tooth' every 20 seconds using this configuration:

STATISTICS_TO_PUBLISH = SCHEDD:2 DC:2
# one minute window for Recent* stats
STATISTICS_WINDOW_SECONDS = 60
# 20-second ring buffer quantization - recent-stats will sawtooth every 20 secs
STATISTICS_WINDOW_QUANTUM = 20

Kick off a script that submits one job per second (or every couple seconds, as long as it's a regular interval << 20 seconds).

Saw-toothing is visible in tools like cumin, or I also tested using 'watch' on a 'recent' stat from "SCHEDD" collection and one from "DC" collection:

watch -n 5 'condor_status -l -schedd | grep -e RecentJobsSubmitted -e RecentDCSelectWaittime'

In the above, both of the statistics drop off every 20 seconds, then begin to grow again.

If you change STATISTICS_WINDOW_QUANTUM to '1', and restart the scheduler, then you will see both statistics reach steady-state values, with no major drop-offs or visible saw-toothing.

Created attachment 678213 [details]
Screenshot from reproduction sawtoothing in old version
Reproduced on RHEL6 x86_64 with following packages:
# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
condor-7.8.7-0.4.el6.x86_64
condor-aviary-7.8.7-0.4.el6.x86_64
condor-classads-7.8.7-0.4.el6.x86_64
condor-qmf-7.8.7-0.4.el6.x86_64
cumin-0.1.5540-1.el6.noarch
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-7.el6.x86_64
qpid-cpp-client-0.18-9.el6.x86_64
qpid-cpp-server-0.18-9.el6.x86_64
qpid-qmf-0.18-7.el6.x86_64
qpid-tools-0.18-6.el6.noarch
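
Note on the mechanism: the following is a minimal Python sketch of why a coarse quantum makes the Recent* values sawtooth in the command-line test above. It is not Condor's implementation. The names `RecentCounter` and `simulate` are hypothetical, and the model only assumes that a "recent" value is the sum of a ring buffer with window/quantum slots, where the oldest slot is dropped each time one quantum elapses.

```python
from collections import deque


class RecentCounter:
    """Toy model of a windowed 'recent' statistic backed by a ring buffer.

    Assumption (not taken from the Condor sources): the buffer has
    window_seconds // quantum_seconds slots, and every quantum_seconds the
    oldest slot is discarded and a fresh empty slot is started.
    """

    def __init__(self, window_seconds, quantum_seconds):
        slots = max(1, window_seconds // quantum_seconds)
        self.quantum = quantum_seconds
        # maxlen makes append() drop the oldest slot automatically
        self.buf = deque([0] * slots, maxlen=slots)
        self.elapsed = 0  # seconds since the buffer last advanced

    def tick(self, events=0):
        """Advance simulated time by one second, recording `events`."""
        self.buf[-1] += events
        self.elapsed += 1
        if self.elapsed == self.quantum:
            self.buf.append(0)  # start a new slot, oldest one falls off
            self.elapsed = 0

    @property
    def recent(self):
        """Current value of the 'recent' statistic."""
        return sum(self.buf)


def simulate(window_seconds, quantum_seconds, seconds, rate=1):
    """Return the 'recent' value sampled once per second at a steady rate."""
    counter = RecentCounter(window_seconds, quantum_seconds)
    samples = []
    for _ in range(seconds):
        counter.tick(rate)
        samples.append(counter.recent)
    return samples


if __name__ == "__main__":
    # Skip the first two minutes so only steady-state samples are compared.
    coarse = simulate(60, 20, 300)[120:]
    fine = simulate(60, 1, 300)[120:]
    print("quantum=20: min", min(coarse), "max", max(coarse))
    print("quantum=1:  min", min(fine), "max", max(fine))
```

In this toy model, one submission per second with a 60-second window and a 20-second quantum makes the value climb and then drop by a full slot every 20 seconds, while a quantum of 1 settles to a flat value. That matches the shape of the behavior described in the test, under the stated assumptions only.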
Created attachment 678214 [details]
Screenshot from testing on new version
Tested and verified on RHEL 5.9/6.4 i386/x86_64 with following packages:
# rpm -qa | grep -e condor -e qpid -e cumin -e qmf | sort
condor-7.8.8-0.3.el5.i386
condor-aviary-7.8.8-0.3.el5.i386
condor-classads-7.8.8-0.3.el5.i386
condor-qmf-7.8.8-0.3.el5.i386
cumin-0.1.5648-1.el5.noarch
python-qpid-0.18-4.el5.noarch
python-qpid-qmf-0.18-13.el5.i386
qpid-cpp-client-0.18-13.el5.i386
qpid-cpp-server-0.18-13.el5.i386
qpid-qmf-0.18-13.el5.i386
qpid-tools-0.18-7.el5.noarch

Checked also scenario from comment 9.

>>> VERIFIED