Bug 1127714

Summary: watchman takes up gigs of memory, times out on restart
Product: OpenShift Container Platform
Reporter: Brenton Leanhardt <bleanhar>
Component: Containers
Assignee: Brenton Leanhardt <bleanhar>
Status: CLOSED ERRATA
QA Contact: libra bugs <libra-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 2.1.0
CC: adellape, anli, bmeng, jhonce, jokerman, libra-bugs, libra-onpremise-devel, lmeyer, mmccomas, pruan, sten
Target Milestone: ---
Keywords: Upstream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: rubygem-openshift-origin-node-1.23.9.16-1.el6op, openshift-origin-node-util-1.22.15.1-1.el6op
Doc Type: Bug Fix
Doc Text:
Due to a bug in the Watchman code, the Watchman Throttler plug-in had a memory leak which caused the Watchman service to consume too much memory and time out when attempting a restart. This bug fix updates the plug-in to resolve the memory leak, and these issues no longer occur as a result. After applying this fix, the openshift-watchman service must be restarted.
Story Points: ---
Clone Of: 1121217
Environment:
Last Closed: 2014-09-11 20:06:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1121217
Bug Blocks:

Comment 4 Brenton Leanhardt 2014-08-22 20:19:17 UTC
Upstream commits:

commit c4cb6ddaa13173e7c61853be98d13d20b77457c2
Author: Rajat Chopra <rchopra>
Date:   Tue Jul 29 12:15:31 2014 -0700

    add debug messages to watchman that print memory usage (bz1123935)

commit a0149a176f417aee7cc82190b90859158a38c09d
Author: Jhon Honce <jhonce>
Date:   Wed Aug 6 11:29:12 2014 -0700

    Bug 1121217 - Symbol leak in Throttler cgroup code
    
    * Enhance debugging output
    * Remove to_sym in keys

commit e00d653b764334fb5da6c2b301b5dd52629c9234
Author: Jhon Honce <jhonce>
Date:   Wed Aug 6 14:52:00 2014 -0700

    Bug 1121217 - Symbol leak in Throttler cgroup code
    
    * fix tests
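
For readers unfamiliar with this class of leak, here is a minimal Ruby sketch of the pattern that "Remove to_sym in keys" suggests. It is not taken from the Watchman source: the method names and the sample data are hypothetical, and it assumes the keys were dynamic strings such as gear UUIDs. On Ruby versions before 2.2, symbols are never garbage collected, so every distinct string passed to to_sym stays in memory for the life of the process.

```ruby
# Hypothetical illustration; none of these names come from the Watchman code.

# Leaky pattern: every distinct UUID string is interned as a new Symbol.
# On Ruby < 2.2 those Symbols are never freed.
def usage_by_symbol(samples)
  samples.each_with_object({}) { |(uuid, usage), acc| acc[uuid.to_sym] = usage }
end

# Safer pattern: keep the keys as Strings, which are collected normally
# once the hash is discarded.
def usage_by_string(samples)
  samples.each_with_object({}) { |(uuid, usage), acc| acc[uuid] = usage }
end

# Reports how many entries the global symbol table gained while the block ran.
def symbol_growth
  before = Symbol.all_symbols.size
  yield
  Symbol.all_symbols.size - before
end

samples = [["gear-0001", 10], ["gear-0002", 20]]
puts usage_by_symbol(samples).keys.map(&:class).uniq.inspect
puts usage_by_string(samples).keys.map(&:class).uniq.inspect
```

Wrapping either call in symbol_growth on a long-running process gives a rough view of whether the symbol table keeps growing between iterations.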

Comment 6 Anping Li 2014-08-25 04:20:11 UTC
Verified on puddle 2014-08-22.1.

On puddle 2014-08-15.1, the bug can be reproduced.

[root@node2 ~]# service openshift-watchman restart
Stopping Watchman........................................................Watchman operation timed out

tailf /var/log/messages | grep watchman
Aug 24 23:13:20 node2 watchman[20521]: Memory : 38540, Plugin : GearStatePlugin
Aug 24 23:13:21 node2 watchman[20521]: Memory : 38528, Plugin : ThrottlerPlugin
Aug 24 23:13:21 node2 watchman[20521]: Memory : 38528, Plugin : MetricsPlugin
Aug 24 23:13:41 node2 watchman[20521]: Memory : 38532, Plugin : JbossPlugin
Aug 24 23:13:41 node2 watchman[20521]: Memory : 38532, Plugin : OomPlugin
Aug 24 23:13:41 node2 watchman[20521]: Memory : 38532, Plugin : EnvPlugin

On puddle 2014-08-22.1, the issue no longer occurs.
[root@node2 ~]# service openshift-watchman restart
Stopping Watchman
Starting Watchman

tailf /var/log/messages | grep watchman
Aug 25 00:12:54 node2 watchman[3343]: Starting Watchman => iteration delay: 20s
Aug 25 00:12:54 node2 watchman[3343]: Watchman debug is set to true
Aug 25 00:12:56 node2 watchman[3343]: Gears: 44, Memory: 36160, Plugin: all, Symbols: 10610, Objects: 182267
Aug 25 00:13:16 node2 watchman[3343]: Gears: 44, Memory: 40316, Plugin: all, Symbols: 10636, Objects: 182267
Aug 25 00:13:37 node2 watchman[3343]: Gears: 44, Memory: 41248, Plugin: all, Symbols: 10636, Objects: 182267
Aug 25 00:13:57 node2 watchman[3343]: Gears: 44, Memory: 41804, Plugin: all, Symbols: 10636, Objects: 182267
Aug 25 00:14:17 node2 watchman[3343]: Gears: 44, Memory: 42820, Plugin: all, Symbols: 10636, Objects: 182267
Aug 25 00:14:37 node2 watchman[3343]: Gears: 44, Memory: 42944, Plugin: all, Symbols: 10636, Objects: 182267
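
The check above (symbol count holding steady across iterations) could be automated roughly as follows. This is only a sketch: the regular expression is written against the debug line format shown in this log, and the helper names and the one-sample warm-up are assumptions, not part of Watchman.

```ruby
# Matches the "Gears: ..., Memory: ..., Plugin: ..., Symbols: ..., Objects: ..."
# debug lines as they appear in the log excerpt in this comment.
DEBUG_LINE = /Gears: (\d+), Memory: (\d+), Plugin: (\w+), Symbols: (\d+), Objects: (\d+)/

# Turns raw syslog lines into hashes; lines that do not match are skipped.
def parse_watchman_samples(lines)
  lines.map { |line| DEBUG_LINE.match(line) }.compact.map do |m|
    { gears: m[1].to_i, memory: m[2].to_i, plugin: m[3],
      symbols: m[4].to_i, objects: m[5].to_i }
  end
end

# Hypothetical check: skip the first `warmup` samples, then require the
# symbol count to stay constant for the remaining iterations.
def symbols_stable?(samples, warmup = 1)
  steady = samples.drop(warmup).map { |s| s[:symbols] }
  !steady.empty? && steady.uniq.size == 1
end

log = [
  "Aug 25 00:12:56 node2 watchman[3343]: Gears: 44, Memory: 36160, Plugin: all, Symbols: 10610, Objects: 182267",
  "Aug 25 00:13:16 node2 watchman[3343]: Gears: 44, Memory: 40316, Plugin: all, Symbols: 10636, Objects: 182267",
  "Aug 25 00:13:37 node2 watchman[3343]: Gears: 44, Memory: 41248, Plugin: all, Symbols: 10636, Objects: 182267"
]
puts symbols_stable?(parse_watchman_samples(log))
```

The warm-up parameter exists because the first sample after startup (10610 here) is lower than the steady-state value (10636).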

Comment 8 errata-xmlrpc 2014-09-11 20:06:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1183.html