Bug 837066

Summary: watchman inefficient
Product: OKD Reporter: Mike McGrath <mmcgrath>
Component: Containers Assignee: Jhon Honce <jhonce>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 2.x CC: jialiu, mpatel, rmillner
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-08-07 20:42:40 UTC Type: Bug

Description Mike McGrath 2012-07-02 16:16:17 UTC
On systems with lots of gears (think in the 2,000-3,000 range), watchman uses a lot of CPU crunching through them all. We should maybe add a sleep between passes to throttle it, or keep a list of the idle gears and only check their health every hour or so.
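
The second idea above (track idle gears and recheck them only about once an hour) could be sketched roughly as follows. This is a minimal illustration, not the watchman code: `IdleTracker`, `should_check`, and the `gear_id` / `is_idle` arguments are hypothetical names, and the one-hour interval is taken from the "every hour or so" wording.

```python
import time

IDLE_RECHECK_SECS = 3600  # assumption: "every hour or so" from the description


class IdleTracker:
    """Hypothetical helper: remembers when each idle gear was last checked."""

    def __init__(self, recheck_secs=IDLE_RECHECK_SECS, clock=time.monotonic):
        self._recheck_secs = recheck_secs
        self._clock = clock          # injectable so the logic can be tested
        self._last_checked = {}      # gear_id -> time of last health check

    def should_check(self, gear_id, is_idle):
        """Return True if this gear should get a health check on this pass."""
        if not is_idle:
            # Active gears are checked on every pass; forget any idle history.
            self._last_checked.pop(gear_id, None)
            return True
        now = self._clock()
        last = self._last_checked.get(gear_id)
        if last is None or now - last >= self._recheck_secs:
            self._last_checked[gear_id] = now
            return True
        return False
```

With something like this, each pass would still visit every gear, but the expensive health check would be skipped for idle gears that were checked within the last hour.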

Comment 1 Jhon Honce 2012-07-02 23:31:28 UTC
 * Minimal processing of idled or stopped applications
 * Default delay between loops changed to 20 secs
 * If more than 50% of the applications on the node are idled, the delay gets a 3x multiplier

Pull Request li#20
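
The delay rule listed in this comment can be sketched as below. This is only an illustration under stated assumptions, not the code from the pull request: `loop_delay` and its parameters are hypothetical names, while the 20-second default, the 50% threshold, and the 3x multiplier come from the list above.

```python
BASE_DELAY_SECS = 20   # default delay between loops, per the comment
IDLE_THRESHOLD = 0.5   # "> 50% of applications idled"
IDLE_MULTIPLIER = 3    # 3x multiplier, per the comment


def loop_delay(total_apps, idled_apps, base=BASE_DELAY_SECS):
    """Hypothetical helper: seconds to sleep before the next pass."""
    # Guard against an empty node so we never divide by zero.
    if total_apps > 0 and idled_apps / total_apps > IDLE_THRESHOLD:
        return base * IDLE_MULTIPLIER
    return base
```

For example, `loop_delay(100, 51)` returns 60 and `loop_delay(100, 50)` returns 20, since exactly 50% idled does not exceed the threshold.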

Comment 2 Johnny Liu 2012-07-12 12:08:11 UTC
Verified this bug on devenv-stage_223: PASS.

Got the following messages in syslog:

Jul 12 05:30:40 ip-10-195-169-69 rhc-watchman[17135]: Starting rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 12 06:08:42 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 12 06:10:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->
Jul 12 06:12:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 180s, exception threshold: 10
<--snip-->
Jul 12 06:15:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 540s, exception threshold: 10
<--snip-->

Comment 3 Jhon Honce 2012-07-12 14:42:20 UTC
Delay should not keep growing.
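
The syslog excerpt in comment 2 shows the delay going 20s, 60s, 180s, 540s. One plausible reading (an assumption, since the root cause is not stated in this report) is that the 3x multiplier was applied to the previous delay on every pass instead of to the base delay. The sketch below contrasts the two behaviours; both function names are hypothetical.

```python
def delays_compounding(base, multiplier, passes):
    """Assumed faulty behaviour: multiply the *previous* delay each pass."""
    delay = base
    result = []
    for _ in range(passes):
        result.append(delay)
        delay = delay * multiplier  # the delay grows without bound
    return result


def delays_from_base(base, multiplier, passes):
    """Intended behaviour: always derive the idle delay from the base."""
    return [base * multiplier for _ in range(passes)]
```

Here `delays_compounding(20, 3, 4)` gives `[20, 60, 180, 540]`, matching the growth seen in comment 2, while `delays_from_base(20, 3, 3)` stays at `[60, 60, 60]`.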

Comment 4 Jhon Honce 2012-07-12 14:53:37 UTC
Waiting on https://github.com/openshift/li/pull/61

Comment 5 Jhon Honce 2012-07-12 15:25:54 UTC
(In reply to comment #4)
> Waiting on https://github.com/openshift/li/pull/61

Logging comment was not updated
https://github.com/openshift/li/pull/62

Comment 6 Johnny Liu 2012-07-16 12:27:59 UTC
Verified this bug with devenv_1894: PASS.

Got the following messages in syslog:
<--snip-->
Jul 16 08:24:39 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 16 08:24:59 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 16 08:26:19 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->
Jul 16 08:27:19 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->