Description of problem:
Put a gear into the throttled cgroup state, then delete the app from the rhc client. /var/log/messages shows rhc-watchman repeatedly trying to restore the gear until libra-watchman is restarted.

Version-Release number of selected component (if applicable):
devenv-stage_448

How reproducible:
always

Steps to Reproduce:
1. Create an app.
2. Push the app into the throttled cgroup state with a CPU-intensive shell command, for example:
   > dd if=/dev/zero of=/dev/null &
3. Delete the app from the rhc client.
4. Check the rhc-watchman log.

Actual results:
rhc-watchman keeps trying to restore the throttled gear even though the gear has already been deleted.

Expected results:
It should stop trying to restore a deleted gear after a few attempts.

Additional info:
The log shows rhc-watchman trying to restore the non-existent gear for over 10 minutes:

Aug 22 04:08:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:08:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:09:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:09:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:09:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:09:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:09:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:09:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:10:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:10:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:10:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:10:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:10:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:10:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:11:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:11:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:11:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:11:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:11:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:11:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:12:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:12:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:12:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:12:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:12:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:12:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:13:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:13:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:13:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:13:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:13:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:13:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:14:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:14:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:14:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:14:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:14:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:14:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:15:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:15:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:15:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:15:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:15:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:15:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:16:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:16:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:16:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:16:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:16:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:16:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:17:09 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:17:09 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:17:29 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:17:29 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
Aug 22 04:17:49 ip-10-196-51-239 rhc-watchman[27526]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 04:17:49 ip-10-196-51-239 rhc-watchman[27526]: Throttler: REFUSED restore => 25989dea0afc11e3bbd012313b083001 (unknown utilization)
This looks like a caching issue to me. Will check.
We were only removing deleted gears from the running_apps hash, but they also needed to be removed from the hash of previously throttled gears. Fixed in this PR: https://github.com/openshift/origin-server/pull/3474
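A minimal sketch of the fix described above, assuming the watchman keeps two caches keyed by gear UUID: one for running apps and one for previously throttled gears. The class and the names `Throttler`, `throttled`, `prune` and `restore_candidates` are hypothetical; rhc-watchman itself is written in Ruby and its internals may differ. The point it illustrates is that a deleted gear must be dropped from both caches, otherwise it remains a restore candidate on every cycle.

```python
# Hypothetical model of the watchman throttler's bookkeeping (not the real
# rhc-watchman code). Both dicts are keyed by gear UUID.


class Throttler:
    def __init__(self):
        # UUID -> latest utilization sample for gears seen on the node
        self.running_apps = {}
        # UUID -> utilization recorded when the gear was throttled
        self.throttled = {}

    def throttle(self, uuid, utilization):
        """Record a gear that has been moved into the throttled cgroup."""
        self.running_apps[uuid] = utilization
        self.throttled[uuid] = utilization

    def prune(self, live_uuids):
        """Drop gears that no longer exist from BOTH caches.

        Pruning only running_apps leaves a deleted gear in self.throttled,
        so a restore would be attempted (and refused) on every cycle.
        """
        for cache in (self.running_apps, self.throttled):
            for uuid in list(cache):  # copy keys: we delete while iterating
                if uuid not in live_uuids:
                    del cache[uuid]

    def restore_candidates(self):
        """Gears that would be considered for restore on the next cycle."""
        return sorted(self.throttled)


if __name__ == "__main__":
    t = Throttler()
    t.throttle("25989dea0afc11e3bbd012313b083001", 150.0)
    t.prune(live_uuids=set())  # the app was deleted, so no gears remain
    print(t.restore_candidates())  # []
```

Iterating over a copy of the keys (`list(cache)`) matters here: deleting from a Python dict while iterating over it directly raises a `RuntimeError`.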
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/b9d0e377ed03c4b686b9c285008923203a2a2c71
Merge pull request #3474 from fotioslindiakos/Bug999837

Merged by openshift-bot
Checked on devenv-stage_452; the issue has been fixed.

Aug 23 05:22:01 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:22:01 ip-10-118-14-174 rhc-watchman[1927]: Throttler: throttle => a323e9300bd411e3a65012313d080160 (127.325)
Aug 23 05:22:21 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:22:21 ip-10-118-14-174 rhc-watchman[1927]: Throttler: REFUSED restore => a323e9300bd411e3a65012313d080160 (still over threshold (378.562))
Aug 23 05:22:39 ip-10-118-14-174 CGRE[1016]: Reloading rules configuration
Aug 23 05:22:41 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:23:01 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:23:21 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 23 05:23:41 ip-10-118-14-174 rhc-watchman[1927]: Running rhc-watchman => delay: 20s, exception threshold: 10

The cgroup rules configuration is reloaded after the app is deleted, and rhc-watchman no longer tries to restore the deleted gear. Moving the bug to VERIFIED.