Bug 1018342

Summary: Throttler is incorrectly throttling
Product: OpenShift Online
Reporter: Mike McGrath <mmcgrath>
Component: Containers
Assignee: Jhon Honce <jhonce>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: medium
Version: 2.x
CC: bmeng, dmace, dmcphers, gideon, jkeck, nduong
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2014-02-26 19:07:58 UTC
Type: Bug
Bug Depends On: 1062573

Description Mike McGrath 2013-10-11 17:55:59 UTC
Report came in from a thread:

https://www.openshift.com/forums/openshift/poor-performance-throughput-response-time-apdex-score-in-comparison-to-heroku#comment-34960

The user's gear is throttled, which makes sense if they were load testing. I ran some load tests on this gear as well. The problem is that it never gets un-throttled:

Oct 11 13:49:46 ex-std-nodeXX rhc-watchman[24473]: Throttler: REFUSED restore => XXXXXXXXXXXXXXXXXXXXXXXX (still over threshold (104.731))

The value reported against the threshold goes up and down. During that time the gear had a Rails app deployed but wasn't serving requests (in fact, the most CPU usage detected came from running top).

After I set the gear to a default template, cgroups continued to throw messages like that, though there was also this odd sequence:

Oct 11 13:50:48 ex-std-nodeXXX rhc-watchman[24473]: Throttler: REFUSED restore => XXXXXXXXXXXXXXXXXXXXXXXXX (still over threshold (84.362))
Oct 11 13:51:51 ex-std-nodeXXX rhc-watchman[24473]: Throttler: restore => XXXXXXXXXXXXXXXXXXXXXXXX (44.199)
Oct 11 13:51:51 ex-std-nodeXXX rhc-watchman[24473]: Throttler: throttle => XXXXXXXXXXXXXXXXXXXXX (44.199)
Oct 11 13:52:53 ex-std-nodeXXX rhc-watchman[24473]: Throttler: REFUSED restore => XXXXXXXXXXXXXXXXXX (still over threshold (70.621))

Notice that the restore and the throttle at 13:51:51 were logged in the same second, with the same value (44.199).
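
For anyone trying to spot the same pattern in their own logs, here is a rough sketch that scans Throttler lines and reports timestamps where both a restore and a throttle were logged on the same host. The regular expression is inferred only from the log lines quoted in this report, and the function name `find_flaps` is made up for illustration. Because gear IDs are redacted here, the sketch groups by timestamp and host rather than by gear.

```python
import re
from collections import defaultdict

# Pattern inferred from the syslog lines quoted in this bug, e.g.:
#   Oct 11 13:51:51 host rhc-watchman[24473]: Throttler: restore => <gear> (44.199)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+\S+:\s+"
    r"Throttler:\s+(?P<action>REFUSED restore|restore|throttle)\s+=>\s+(?P<gear>\S+)"
)


def find_flaps(lines):
    """Return sorted (timestamp, host) pairs that logged both restore and throttle."""
    actions = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            actions[(m.group("ts"), m.group("host"))].add(m.group("action"))
    # "REFUSED restore" is a distinct action and does not count as a restore.
    return sorted(key for key, seen in actions.items()
                  if {"restore", "throttle"} <= seen)
```

Feeding it the four lines above would flag only the 13:51:51 entry.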

Comment 1 Dan Mace 2013-12-16 16:53:56 UTC
Reassigning this to the node team.

Comment 2 openshift-github-bot 2014-01-31 20:56:50 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/94d0f7f85cb675eb152b53c1e3833c7a4aa92a0c
Bug 1018342 - Stop restore/throttle flapping
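
The commit's actual change is not reproduced in this report. As a general illustration only, one common way to avoid restore/throttle flapping is hysteresis: use a higher threshold to throttle than to restore, and emit at most one action per usage sample. The class name and both threshold values below are hypothetical and are not taken from origin-server.

```python
from dataclasses import dataclass


@dataclass
class HysteresisThrottler:
    # Hypothetical thresholds, chosen only for this sketch.
    throttle_at: float = 90.0  # throttle when usage rises to or above this
    restore_at: float = 50.0   # restore only when usage falls to or below this
    throttled: bool = False

    def update(self, usage):
        """Return 'throttle', 'restore', or None for a single usage sample."""
        if not self.throttled and usage >= self.throttle_at:
            self.throttled = True
            return "throttle"
        if self.throttled and usage <= self.restore_at:
            self.throttled = False
            return "restore"
        # Between the two thresholds the current state is kept.
        return None
```

With these assumed thresholds, a sample of 44.199 can trigger a restore but cannot trigger a throttle in the same evaluation, since the gap between the two thresholds separates the two decisions.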

Comment 3 Meng Bo 2014-02-11 06:25:24 UTC
Tested on devenv_4357 several times. A gear that has been restored is no longer throttled again once its CPU usage is down.

Feb 11 01:06:28 ip-10-16-155-161 watchman[1969]: Throttler: restore => 353849463692895929237504 (48.113)

Feb 11 01:19:10 ip-10-16-155-161 watchman[1969]: Throttler: restore => 52f99837c7ca5d728c00003e (58.398)

Feb 11 01:22:10 ip-10-16-155-161 watchman[1969]: Throttler: restore => 52f99837c7ca5d728c00003e (39.226)

Moving bug to VERIFIED.