837066 – watchman inefficient

Summary: watchman inefficient

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OKD
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	2.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jhon Honce
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-07-02 16:16 UTC by Mike McGrath
Modified:	2015-05-14 22:56 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-08-07 20:42:40 UTC
Target Upstream Version:
Embargoed:

Description Mike McGrath 2012-07-02 16:16:17 UTC

On systems with lots of gears (think in the 2,000-3,000 range), watchman uses a lot of CPU crunching through them all. We could add a sleep to slow its loop down, or keep a list of the idle gears and only check their health every hour or so.
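
The second suggestion (re-checking idle gears only about once an hour) could look roughly like the sketch below. This is an illustration under assumptions, not watchman's actual code: `IdleGearTracker`, `should_check`, and the injectable `clock` are hypothetical names, and the one-hour interval is taken from the "every hour or so" wording above.

```python
import time

IDLE_RECHECK_SECS = 3600  # "every hour or so", per the description


class IdleGearTracker:
    """Remembers when each idle gear was last checked (hypothetical helper)."""

    def __init__(self, recheck_secs=IDLE_RECHECK_SECS, clock=time.monotonic):
        self._recheck_secs = recheck_secs
        self._clock = clock          # injectable so the logic can be tested
        self._last_checked = {}      # gear id -> time of last health check

    def should_check(self, gear_id, is_idle):
        """Return True if this gear should get a health check on this pass."""
        now = self._clock()
        if not is_idle:
            # Active gears are checked on every pass and leave the idle list.
            self._last_checked.pop(gear_id, None)
            return True
        last = self._last_checked.get(gear_id)
        if last is None or now - last >= self._recheck_secs:
            # First sighting as idle, or the re-check interval has elapsed.
            self._last_checked[gear_id] = now
            return True
        return False
```

With 2,000-3,000 gears, most passes would then touch only the active gears plus whichever idle ones have come due.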

Comment 1 Jhon Honce 2012-07-02 23:31:28 UTC

 * Minimal processing of idled or stopped applications
 * Default delay between loops changed to 20 secs
 * If > 50% of the applications on a node are idled, the delay gets a 3x multiplier
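
The delay rule in the list above can be sketched as follows. The 20-second base, the 50% threshold, and the 3x multiplier come from this comment; the function name `loop_delay`, its parameters, and the handling of a node with no applications are assumptions made for illustration, not the code from the pull request.

```python
BASE_DELAY_SECS = 20   # default delay between loops
IDLE_MULTIPLIER = 3    # applied when most applications are idled
IDLE_THRESHOLD = 0.5   # "> 50% of applications idled"


def loop_delay(total_apps, idled_apps, base=BASE_DELAY_SECS):
    """Return the number of seconds to sleep before the next pass."""
    if total_apps <= 0:
        # Assumption: an empty node just uses the base delay.
        return base
    if idled_apps / total_apps > IDLE_THRESHOLD:
        return base * IDLE_MULTIPLIER
    return base
```

Under these assumptions a node with 100 applications, 51 of them idled, sleeps 60 seconds between passes; with exactly 50 idled it stays at 20 seconds.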

Pull Request li#20

Comment 2 Johnny Liu 2012-07-12 12:08:11 UTC

Verified this bug on devenv-stage_223, PASS.


Got the following messages in syslog:

Jul 12 05:30:40 ip-10-195-169-69 rhc-watchman[17135]: Starting rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 12 06:08:42 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 12 06:10:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->
Jul 12 06:12:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 180s, exception threshold: 10
<--snip-->
Jul 12 06:15:23 ip-10-195-169-69 rhc-watchman[17135]: Running rhc-watchman => delay: 540s, exception threshold: 10
<--snip-->

Comment 3 Jhon Honce 2012-07-12 14:42:20 UTC

The delay should not keep growing from pass to pass (20s, 60s, 180s, 540s in the syslog above).
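
The thread does not say what caused the growth. One plausible pattern that would produce the 60s, 180s, 540s sequence from Comment 2 is multiplying the previous delay on every pass instead of recomputing it from the base. The sketch below contrasts the two; both function names are hypothetical and neither is taken from the actual watchman source.

```python
BASE_DELAY_SECS = 20
IDLE_MULTIPLIER = 3


def compounding_delays(passes, mostly_idle=True):
    """Assumed buggy pattern: the multiplier is applied to the previous delay."""
    delay = BASE_DELAY_SECS
    seen = []
    for _ in range(passes):
        if mostly_idle:
            delay *= IDLE_MULTIPLIER  # state leaks from one pass to the next
        seen.append(delay)
    return seen


def bounded_delays(passes, mostly_idle=True):
    """Bounded pattern: the delay is derived from the base on every pass."""
    seen = []
    for _ in range(passes):
        delay = BASE_DELAY_SECS * IDLE_MULTIPLIER if mostly_idle else BASE_DELAY_SECS
        seen.append(delay)
    return seen
```

Here `compounding_delays(3)` yields `[60, 180, 540]`, while `bounded_delays(3)` yields `[60, 60, 60]`.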

Comment 4 Jhon Honce 2012-07-12 14:53:37 UTC

Waiting on https://github.com/openshift/li/pull/61

Comment 5 Jhon Honce 2012-07-12 15:25:54 UTC

(In reply to comment #4)
> Waiting on https://github.com/openshift/li/pull/61

Logging comment was not updated
https://github.com/openshift/li/pull/62

Comment 6 Johnny Liu 2012-07-16 12:27:59 UTC

Verified this bug with devenv_1894, PASS.


Got the following messages in syslog:
<--snip-->
Jul 16 08:24:39 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 16 08:24:59 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 20s, exception threshold: 10
<--snip-->
Jul 16 08:26:19 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->
Jul 16 08:27:19 ip-10-194-26-207 rhc-watchman[1720]: Running rhc-watchman => delay: 60s, exception threshold: 10
<--snip-->
