Bug 1100766
| Summary: | watchman throttler's math is wrong | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Brenton Leanhardt <bleanhar> |
| Component: | Containers | Assignee: | Brenton Leanhardt <bleanhar> |
| Status: | CLOSED ERRATA | QA Contact: | libra bugs <libra-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.1.0 | CC: | adellape, agrimm, anli, jialiu, jkeck, jokerman, libra-onpremise-devel, mmccomas, xjia |
| Target Milestone: | --- | Keywords: | Upstream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openshift-origin-node-util-1.22.11.1-1.el6op | Doc Type: | Bug Fix |
| Doc Text: | In certain scenarios, gears were not properly throttled due to an issue in Watchman's ThrottlerPlugin. This bug fix addresses the issue in the plug-in, and CPU usage is now more accurately reflected as a result. A restart of the openshift-watchman service is required after applying this fix. | ||
| Story Points: | --- | ||
| Clone Of: | 1100518 | Environment: | |
| Last Closed: | 2014-08-04 13:27:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| Bug Depends On: | 1100518 | ||
| Bug Blocks: | |||
Description
Brenton Leanhardt
2014-05-23 12:02:51 UTC
Upstream commits:
commit e214410228adb20034abbed0c9bedf7a99baa917
Author: Andy Grimm <agrimm>
Date: Thu May 22 22:44:07 2014 -0400
Bug 1100518 - Correct throttler's CPU usage math
commit 337d8994ec90f1fd82507f014e79d29e0205ece3
Author: Andy Grimm <agrimm>
Date: Sun May 25 14:53:08 2014 -0400
Move cgroup sample timestamp insertion and fix unit test
commit 32b245340f2eb8d58b3cc9ea47a012e3c200986c
Author: Andy Grimm <agrimm>
Date: Wed May 28 21:45:17 2014 -0400
Fix throttler math in monitored_gear_test
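The commits above concern how the throttler derives a CPU usage figure from cgroup samples. As a rough illustration only (this is not Watchman's code), the sketch below computes a usage percentage from two cumulative CPU-time samples in nanoseconds, the unit used by the cgroup v1 cpuacct.usage counter. The helper name cpu_percent and its parameters are hypothetical. The second call simply shows how sensitive the result is to the elapsed time used as the divisor; it is not a claim about the actual defect.

```ruby
# Hypothetical helper for illustration; not taken from Watchman.
# usage_start_ns / usage_end_ns: cumulative CPU time in nanoseconds.
# elapsed_s: wall-clock seconds between the two samples.
def cpu_percent(usage_start_ns, usage_end_ns, elapsed_s)
  raise ArgumentError, 'elapsed_s must be positive' unless elapsed_s > 0

  delta_ns = usage_end_ns - usage_start_ns
  # CPU seconds consumed divided by wall-clock seconds, as a percentage.
  (delta_ns / (elapsed_s * 1_000_000_000.0)) * 100.0
end

# One core fully busy for 20 seconds of wall-clock time.
busy = cpu_percent(0, 20_000_000_000, 20)

# The same CPU time divided by a mismatched 2-second interval.
skewed = cpu_percent(0, 20_000_000_000, 2)

puts busy
puts skewed
```

With a correct interval, a single saturated core comes out at about 100, which is the order of magnitude seen in the post-fix logs in this report.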
Verified and passed on puddle-2-1-2014-07-15.
The bug can be reproduced on the OSE2.1Z GA build.
Steps to reproduce Problem 1:
1. Run a compute-intensive process in a gear. Here's one:
ruby -e 'z=0; 1.upto(1_000_000_000_000) { |x| 1.upto(x) { |y| z += y } }'
2. Watch the gear get throttled, and note that the logs say things like:
Jul 16 01:29:37 node watchman[1421]: Throttler: REFUSED restore => 53c4a1a64cfeffdd11000023 (still over threshold (980.512))
Jul 16 01:29:37 node watchman[1421]: Throttler: REFUSED restore => 53c5fe864cfeffdd11000038 (still over threshold (1016.848))
Jul 16 01:30:00 node watchman[1421]: Throttler: REFUSED restore => 53c4a1a64cfeffdd11000023 (still over threshold (979.731))
Jul 16 01:30:00 node watchman[1421]: Throttler: REFUSED restore => 53c5fe864cfeffdd11000038 (still over threshold (902.147))
Verification steps: update to puddle-2-1-2014-07-15 (hint: oo-cgroup-disable/enable all containers).
1. Run a compute-intensive process in a gear. Here's one:
ruby -e 'z=0; 1.upto(1_000_000_000_000) { |x| 1.upto(x) { |y| z += y } }'
2. Watch the gear get throttled, and note that the reported values are now in a normal range:
Jul 16 05:31:30 node watchman[24846]: Throttler: REFUSED restore => 53c5fe864cfeffdd11000038 (still over threshold (77.901))
Jul 16 05:31:51 node watchman[24846]: Throttler: REFUSED restore => 53c5fe864cfeffdd11000038 (still over threshold (95.201))
<snip--->
Jul 16 05:32:11 node watchman[24846]: Throttler: REFUSED restore => 53c5fe864cfeffdd11000038 (still over threshold (100.571))
Jul 16 05:32:13 node dhclient[1016]: bound to 192.168.55.38 -- renewal in 51 seconds.
Jul 16 05:32:31 node watchman[24846]: Throttler: REFUSED restore => 53c5fe864cfeffdd11000038 (still over threshold (100.556))
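The before/after comparison of these log lines could be scripted. The following is a minimal sketch, assuming the message format is exactly as shown in the logs in this report; the constant REFUSED_RE and the method refused_values are hypothetical names, not part of Watchman.

```ruby
# Matches lines such as:
#   ... Throttler: REFUSED restore => <gear uuid> (still over threshold (980.512))
REFUSED_RE = /Throttler: REFUSED restore => (\h+) \(still over threshold \(([\d.]+)\)\)/

# Returns a hash mapping gear uuid => array of reported values (Floats).
def refused_values(lines)
  result = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    match = REFUSED_RE.match(line)
    next unless match # skip unrelated syslog lines

    result[match[1]] << match[2].to_f
  end
  result
end

# Sample lines copied from this report.
sample = [
  'Jul 16 01:29:37 node watchman[1421]: Throttler: REFUSED restore => 53c4a1a64cfeffdd11000023 (still over threshold (980.512))',
  'Jul 16 05:31:30 node watchman[24846]: Throttler: REFUSED restore => 53c5fe864cfeffdd11000038 (still over threshold (77.901))',
  'Jul 16 05:32:13 node dhclient[1016]: bound to 192.168.55.38 -- renewal in 51 seconds.'
]

refused_values(sample).each do |gear, values|
  puts "#{gear}: max #{values.max}"
end
```

On a node, the same method could be fed lines read from the system log; the log location depends on the node's syslog configuration and is not stated in this report.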
Steps to reproduce Problem 2:
1. On a mostly idle node, run a process in a gear that is compute-intensive, but in bursts. For example:
while true; do time ruby -e 'z=0; 1.upto(1000) { |x| 1.upto(x) { |y| z += y } }'; sleep 5; done
2. Note that the gear gets throttled, and the logs say things like:
Jul 16 01:47:54 node watchman[28143]: Throttler: throttle => 53c5fe964cfeffdd1100004c (42.545)
Jul 16 01:47:54 node watchman[28143]: Throttler: throttle => 53c5fe964cfeffdd1100004c (42.545)
<snip--->
Jul 16 01:48:14 node watchman[28143]: Throttler: REFUSED restore => 53c5fe964cfeffdd1100004c (still over threshold (125.138))
Jul 16 01:48:35 node watchman[28143]: Throttler: REFUSED restore => 53c5fe964cfeffdd1100004c (still over threshold (207.461))
Verified with the same steps after updating to puddle-2-1-2014-07-15: no "Throttler: throttle" or "Throttler: REFUSED restore" messages are reported against gear 53c5fe964cfeffdd1100004c.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0999.html