Description of problem:
We often see watchman logs like this:

Jan 24 08:04:13 ex-std-node43 rhc-watchman[186754]: Throttler: REFUSED restore => <UUID> (still over threshold (NaN))

The problem is that when a gear has no CPU utilization at all, nr_periods in cpu.stat does not change between samples. This leads to a division by zero in MonitoredGear.elapsed_usage.

Version-Release number of selected component (if applicable):
rhc-node-1.18.4-1.el6oso.x86_64 (but the oo-watchman rewrite also appears to have this problem)

How reproducible:
Always

Steps to Reproduce:
1. Run a load test against a gear so that it gets throttled.
2. Kill the processes in the gear.
3. Observe messages like the one above in the log.

Actual results:
"still over threshold (NaN)" indicates a division by zero.

Expected results:
The gear should be unthrottled.

Additional info:
This is a minor issue, because when this message appears in the logs the gear usually has no processes running. However, it could be very confusing to a systems administrator, so it should be fixed.
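The NaN in the log message can be reproduced with a minimal Ruby sketch. It assumes the usage figure is a floating-point ratio of two counter deltas. The sample hashes, the `usage` key, and the numbers are hypothetical and are not taken from MonitoredGear.

```ruby
# Hypothetical counter samples taken one interval apart.
# With no running processes in the gear, neither counter advances.
previous = { usage: 1_500_000_000, nr_periods: 4200 }
current  = { usage: 1_500_000_000, nr_periods: 4200 }

delta_usage   = (current[:usage] - previous[:usage]).to_f
delta_periods = current[:nr_periods] - previous[:nr_periods]

# Float divided by zero does not raise in Ruby; 0.0 / 0 yields NaN.
usage_per_period = delta_usage / delta_periods

puts usage_per_period.nan?       # true
# Every ordered comparison against NaN is false, so a threshold
# check on this value can never report "under threshold".
puts usage_per_period > 100.0    # false
puts usage_per_period <= 100.0   # false
```

Because no exception is raised, the bad value only becomes visible when it is printed, which matches the "(NaN)" seen in the log.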
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/8a3056e19daa1678617648ff4f217e34a1598023
Bug 1057734 - Protect against divide by zero
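One way to guard a ratio like this is sketched below. This is not the code from the commit above: the helper name `safe_usage_per_period` and the choice to report 0.0 when no periods elapsed are assumptions made for illustration.

```ruby
# Hypothetical helper, not the code from the referenced commit.
# Returns usage per elapsed period, or 0.0 when no periods elapsed
# (an idle gear), instead of letting the division produce NaN.
def safe_usage_per_period(delta_usage, delta_periods)
  return 0.0 if delta_periods.zero?

  delta_usage.to_f / delta_periods
end

puts safe_usage_per_period(0, 0)       # idle gear: 0.0 instead of NaN
puts safe_usage_per_period(500, 10)    # normal case: 50.0
```

Under this assumption an idle gear reports zero usage, which is below any positive threshold, so a restore would no longer be refused on account of NaN.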
Checked on devenv_4357. After killing the process that was consuming the CPU, watchman unthrottles the gear after a short while:

Feb 11 00:43:45 ip-10-16-155-161 watchman[1969]: Throttler: throttle => 52f99837c7ca5d728c00003e (158.478)
Feb 11 00:44:05 ip-10-16-155-161 watchman[1969]: Throttler: REFUSED restore => 52f99837c7ca5d728c00003e (still over threshold (392.087))
Feb 11 00:44:25 ip-10-16-155-161 watchman[1969]: Throttler: REFUSED restore => 52f99837c7ca5d728c00003e (still over threshold (392.888))
Feb 11 00:44:45 ip-10-16-155-161 watchman[1969]: Throttler: REFUSED restore => 52f99837c7ca5d728c00003e (still over threshold (393.526))
Feb 11 00:45:05 ip-10-16-155-161 watchman[1969]: Throttler: REFUSED restore => 52f99837c7ca5d728c00003e (still over threshold (388.051))
Feb 11 00:45:25 ip-10-16-155-161 watchman[1969]: Throttler: REFUSED restore => 52f99837c7ca5d728c00003e (still over threshold (368.415))
Feb 11 00:45:45 ip-10-16-155-161 watchman[1969]: Throttler: REFUSED restore => 52f99837c7ca5d728c00003e (still over threshold (118.894))
Feb 11 00:46:05 ip-10-16-155-161 watchman[1969]: Throttler: restore => 52f99837c7ca5d728c00003e (9.476)

Moving bug to VERIFIED.