Bug 999837 - Throttler will keep trying to restore the throttled gears even if they had been deleted from node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Fotios Lindiakos
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-08-22 08:53 UTC by Meng Bo
Modified: 2015-05-14 23:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-29 12:54:36 UTC
Target Upstream Version:
Embargoed:



Description Meng Bo 2013-08-22 08:53:22 UTC
Description of problem:
Put a gear into the throttled cgroup state, then delete the app with the rhc client. Watching /var/log/messages shows that rhc-watchman keeps trying to restore the gear until libra-watchman is restarted.


Version-Release number of selected component (if applicable):
devenv-stage_448

How reproducible:
always

Steps to Reproduce:
1. Create an app.
2. Push the app into the throttled cgroup setting with a CPU-bound shell command, for example:
> dd if=/dev/zero of=/dev/null &
3. Delete the app with the rhc client.
4. Check the rhc-watchman log.


Actual results:
It keeps trying to restore the throttled gear even though the gear has already been deleted.

Expected results:
It should give up on a deleted gear after a limited number of restore attempts.

Additional info:
The log shows that it kept trying to restore the nonexistent gear for over 10 minutes.

Aug 22 04:08:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:08:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:09:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:09:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:09:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:09:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:09:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:09:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:10:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:10:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:10:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:10:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:10:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:10:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:11:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:11:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:11:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:11:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:11:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:11:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:12:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:12:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:12:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:12:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:12:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:12:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:13:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:13:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:13:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:13:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:13:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:13:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:14:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:14:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:14:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:14:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:14:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:14:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:15:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:15:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:15:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:15:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:15:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:15:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:16:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:16:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:16:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:16:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:16:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:16:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:17:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:17:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:17:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:17:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:17:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:17:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
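
The repeated refusals in the excerpt above can be summarized with a short script. This is only a sketch for triage, assuming the syslog line format shown in this report; the function name refused_spans and the year default are hypothetical, since syslog timestamps carry no year.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the "Throttler: REFUSED restore" lines as they appear in this report.
REFUSED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ rhc-watchman\[\d+\]: "
    r"Throttler: REFUSED restore => (?P<uuid>\w+) \((?P<reason>.*)\)$"
)


def refused_spans(lines, year=2013):
    """Return {gear uuid: (refusal count, seconds from first to last refusal)}.

    Only refusals with the reason "unknown utilization" are counted, because
    that is the reason logged for the deleted gear in this report.
    """
    seen = defaultdict(list)
    for line in lines:
        m = REFUSED.match(line.strip())
        if m and m.group("reason") == "unknown utilization":
            # The year is an assumption supplied by the caller.
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            seen[m.group("uuid")].append(ts)
    return {
        uuid: (len(stamps), int((max(stamps) - min(stamps)).total_seconds()))
        for uuid, stamps in seen.items()
    }
```

Feeding it the lines of /var/log/messages gives, per gear, how many times a restore was refused and over what time span, which makes a gear that is stuck in the retry loop easy to spot.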

Comment 1 Michal Fojtik 2013-08-22 13:15:23 UTC
This looks like a caching issue to me. Will check.

Comment 2 Fotios Lindiakos 2013-08-22 19:13:53 UTC
We were only removing values from the running_apps hash, but we needed to also remove them from the previously throttled hash.

Updated in this PR: https://github.com/openshift/origin-server/pull/3474
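
The kind of fix described in this comment can be illustrated with a toy model. This is a minimal sketch, not the code from the PR: the real throttler lives in origin-server and is not reproduced here, and the class, attribute, and method names below (Throttler, running_apps, throttled, sync, restore_candidates) are hypothetical.

```python
class Throttler:
    """Toy model of the throttler bookkeeping; all names are hypothetical."""

    def __init__(self):
        self.running_apps = {}  # gear uuid -> last known utilization
        self.throttled = {}     # gear uuid -> utilization when it was throttled

    def sync(self, gears_on_node):
        """Drop gears that are no longer on the node from BOTH hashes.

        Pruning only running_apps would leave deleted gears in throttled,
        so they would keep showing up as restore candidates.
        """
        present = set(gears_on_node)
        for table in (self.running_apps, self.throttled):
            # Iterate over a copy of the keys so deletion is safe.
            for uuid in list(table):
                if uuid not in present:
                    del table[uuid]

    def restore_candidates(self):
        """Gears the throttler would still try to restore."""
        return sorted(self.throttled)
```

With both hashes pruned on every pass, a deleted gear drops out of the restore candidates on the next run instead of being retried until the daemon is restarted.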

Comment 3 openshift-github-bot 2013-08-23 00:16:44 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/b9d0e377ed03c4b686b9c285008923203a2a2c71
Merge pull request #3474 from fotioslindiakos/Bug999837

Merged by openshift-bot

Comment 4 Meng Bo 2013-08-23 09:26:00 UTC
Checked on devenv-stage_452, issue has been fixed.

Aug 23 05:22:01 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:22:01 ip-10-118-14-174 rhc-watchman[1927]: Throttler: throttle => a323e9300bd411e3a65012313d080160 (127.325)
Aug 23 05:22:21 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:22:21 ip-10-118-14-174 rhc-watchman[1927]: Throttler: REFUSED restore => a323e9300bd411e3a65012313d080160 (still over threshold (378.562))
Aug 23 05:22:39 ip-10-118-14-174 CGRE[1016]: Reloading rules configuration
Aug 23 05:22:41 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:23:01 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:23:21 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:23:41 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10


The cgroup rules configuration is reloaded after the app is deleted, and the throttler no longer tries to restore the deleted gear.

Moving bug to VERIFIED.

