On systems with many gears (in the 2,000-3,000 range), watchman uses a lot of CPU trying to crunch through them all. We should maybe add a sleep to slow the loop down, or keep a list of the idle gears and only check their health every hour or so.
* Minimal processing of idled or stopped applications
* Delay between loops default changed to 20 secs
* If > 50% of applications idled on node, delay has 3x multiplier

Pull Request li#20
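The delay rule in the list above can be sketched as follows. This is a minimal Python illustration, not the rhc-watchman code itself: `next_delay`, `BASE_DELAY`, and `IDLE_MULTIPLIER` are hypothetical names, and the sketch assumes the multiplier is always applied to the fixed base delay rather than to the previous delay.

```python
BASE_DELAY = 20       # seconds between loops (default from the change list)
IDLE_MULTIPLIER = 3   # applied when more than half the applications are idled


def next_delay(idled: int, total: int, base: int = BASE_DELAY) -> int:
    """Return the number of seconds to sleep before the next loop.

    Hypothetical helper: `idled` is the count of idled applications on
    the node and `total` is the count of all applications on the node.
    """
    # Strictly more than 50% idled triggers the multiplier; an empty
    # node (total == 0) falls back to the base delay.
    if total > 0 and idled / total > 0.5:
        return base * IDLE_MULTIPLIER
    return base
```

Because the result depends only on the base delay and the current idle ratio, calling it on every loop can only ever yield 20 or 60 seconds with the defaults.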
Verified this bug on devenv-stage_223, PASS. Got the following messages in syslog:
Jul 12 05:30:40 ip-10-195-169-69 rhc-watchman[17135]: Starting rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 12 06:08:42 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 12 06:10:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->
Jul 12 06:12:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 180s, exception threshold: 10
<--snip-->
Jul 12 06:15:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 540s, exception threshold: 10
<--snip-->
Delay should not keep growing.
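The logged sequence (20s, 60s, 180s, 540s) is what results if the 3x multiplier is applied to the previous delay on every pass instead of to the base delay. The sketch below reproduces that arithmetic and contrasts it with a delay derived from the base each time. It is an assumption about the cause, not taken from the rhc-watchman source. Both function names are hypothetical, and the sketch assumes the node is mostly idle on every pass after the first.

```python
from typing import List


def compounding_delays(base: int, multiplier: int, passes: int) -> List[int]:
    """Suspected pattern: each pass multiplies the previous delay."""
    delays = []
    delay = base
    for _ in range(passes):
        delays.append(delay)
        delay *= multiplier  # the delay feeds back into itself
    return delays


def bounded_delays(base: int, multiplier: int, passes: int) -> List[int]:
    """Expected pattern: the multiplier is applied to the base only."""
    # First pass uses the base delay; later passes use base * multiplier
    # and never grow beyond that.
    return [base if i == 0 else base * multiplier for i in range(passes)]
```

With `base=20`, `multiplier=3`, and four passes, the first function yields the growing values seen in the log, while the second settles at 60.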
Waiting on https://github.com/openshift/li/pull/61
(In reply to comment #4)
> Waiting on https://github.com/openshift/li/pull/61

Logging comment was not updated https://github.com/openshift/li/pull/62
Verified this bug with devenv_1894, PASS. Got the following messages in syslog:
<--snip-->
Jul 16 08:24:39 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 16 08:24:59 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 16 08:26:19 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->
Jul 16 08:27:19 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->