Bug 1121217 - watchman takes up gigs of memory, times out on restart
Summary: watchman takes up gigs of memory, times out on restart
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 2.x
Assignee: Jhon Honce
QA Contact: libra bugs
URL:
Whiteboard:
Duplicates: 1096270 (view as bug list)
Depends On:
Blocks: 1127714
 
Reported: 2014-07-18 16:29 UTC by Sten Turpin
Modified: 2015-05-14 23:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1127714 (view as bug list)
Environment:
Last Closed: 2014-10-10 00:49:11 UTC



Description Sten Turpin 2014-07-18 16:29:58 UTC
Description of problem: watchman takes up lots of memory and times out when attempting a restart


Version-Release number of selected component (if applicable): openshift-origin-node-util-1.26.3-1.el6oso.noarch


How reproducible: rarely


Steps to Reproduce:
1. $ ps aux | grep -i watchman
root      10096  4.3 12.3 2332684 930992 ?      Sl   Jun23 1599:38 watchman

2. $ sudo service openshift-watchman restart
Stopping Watchman.................................................Watchman operation timed out
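To put the ps output in step 1 into more readable units, here is a small Ruby sketch that converts the VSZ and RSS columns (which ps aux reports in KiB) to MiB. The helper name ps_memory_mib is made up for this illustration and is not part of watchman.

```ruby
# Convert the VSZ and RSS columns of one `ps aux` line from KiB to MiB.
# ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# (ps_memory_mib is a hypothetical helper, not part of watchman.)
def ps_memory_mib(line)
  fields = line.split
  {
    "vsz_mib" => (fields[4].to_i / 1024.0).round(1),
    "rss_mib" => (fields[5].to_i / 1024.0).round(1)
  }
end

line = "root      10096  4.3 12.3 2332684 930992 ?      Sl   Jun23 1599:38 watchman"
usage = ps_memory_mib(line)
puts "VSZ: #{usage['vsz_mib']} MiB, RSS: #{usage['rss_mib']} MiB"
```

For the line above this gives roughly 909 MiB resident and about 2.2 GiB of virtual size.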


Actual results:


Expected results:
Watchman should not use this much memory, and a restart should complete without timing out.

Additional info:

Comment 2 Rajat Chopra 2014-07-29 19:27:08 UTC
Put in some debug messages to print memory information after each watchman plugin is invoked. The messages go in /var/log/messages and the debug mode can be enabled by setting an env var 'WATCHMAN_DEBUG' to true.

Hopefully this lets us narrow down which plugin causes the leak.

https://github.com/openshift/origin-server/pull/5670
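As a rough illustration of the approach described above (not the code from the pull request), the sketch below runs each plugin and prints a memory reading afterwards when debugging is enabled. The names current_rss_kib, run_plugins and ExamplePlugin, the apply method, and the choice of VmRSS from /proc/self/status as the memory figure are all assumptions; the real patch writes to /var/log/messages, while this sketch writes to an IO object so it can run anywhere.

```ruby
# Resident set size of the current process in KiB, read from /proc (Linux only).
# Returns nil where /proc/self/status is unavailable.
# Assumption: the real debug output may measure memory differently.
def current_rss_kib
  status = File.read("/proc/self/status")
  match = status.match(/^VmRSS:\s+(\d+)\s+kB/)
  match && match[1].to_i
rescue SystemCallError
  nil
end

# Run each plugin once and, when debug is on, report memory after each call.
# `plugins` is a list of objects responding to `apply` (hypothetical interface).
def run_plugins(plugins, debug: ENV["WATCHMAN_DEBUG"] == "true", out: $stderr)
  plugins.each do |plugin|
    plugin.apply
    out.puts "Memory : #{current_rss_kib}, Plugin : #{plugin.class.name}" if debug
  end
end

# Stand-in plugin used only for this illustration.
class ExamplePlugin
  def apply; end
end

run_plugins([ExamplePlugin.new], debug: true, out: $stdout)
```

Logging after every plugin call means a steadily growing number can be attributed to the plugin that ran just before it.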

Comment 3 Meng Bo 2014-08-04 06:37:16 UTC
Checked on devenv-stage_946; the debug option has been added to the watchman config.

# cat /etc/sysconfig/watchman
WATCHMAN_DEBUG=true

# tail -f /var/log/messages
Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Watchman debug is set to true
Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36484, Plugin : JbossPlugin
Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36560, Plugin : OomPlugin
Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36608, Plugin : EnvPlugin
Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36608, Plugin : ThrottlerPlugin
Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36688, Plugin : GearStatePlugin
Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36688, Plugin : MetricsPlugin
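One way to read debug output like the above is to take the difference between consecutive readings, so each increase is attributed to the plugin that just ran. This is a minimal sketch; the helper names parse_memory_lines and memory_deltas are made up here, and the unit of the Memory value is not stated in the log.

```ruby
# Extract [plugin, memory] pairs from watchman debug lines; other lines are skipped.
def parse_memory_lines(lines)
  lines.map { |l| l.match(/Memory : (\d+), Plugin : (\w+)/) }
       .compact
       .map { |m| [m[2], m[1].to_i] }
end

# Growth attributed to each plugin: its reading minus the previous reading.
def memory_deltas(samples)
  samples.each_cons(2).map { |(_, before), (plugin, after)| [plugin, after - before] }
end

log = <<~LOG
  Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Watchman debug is set to true
  Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36484, Plugin : JbossPlugin
  Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36560, Plugin : OomPlugin
  Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36608, Plugin : EnvPlugin
  Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36608, Plugin : ThrottlerPlugin
  Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36688, Plugin : GearStatePlugin
  Aug 12 00:14:58 ip-10-99-163-60 watchman[21483]: Memory : 36688, Plugin : MetricsPlugin
LOG

deltas = memory_deltas(parse_memory_lines(log.lines))
deltas.each { |plugin, delta| puts "#{plugin}: +#{delta}" }
```

A single pass like this one mostly reflects startup allocation; a leak would show up as the same plugin adding memory on every pass over a longer period.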

Comment 4 Jhon Honce 2014-08-06 23:30:44 UTC
Fixed in https://github.com/openshift/origin-server/pull/5695

Comment 5 openshift-github-bot 2014-08-07 01:46:48 UTC
Commits pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/a0149a176f417aee7cc82190b90859158a38c09d
Bug 1121217 - Symbol leak in Throttler cgroup code

* Enhance debugging output
* Remove to_sym in keys

https://github.com/openshift/origin-server/commit/e00d653b764334fb5da6c2b301b5dd52629c9234
Bug 1121217 - Symbol leak in Throttler cgroup code

* fix tests
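For context on "Remove to_sym in keys": on Ruby versions before 2.2, symbols are never garbage collected, so a long-running daemon that calls to_sym on strings read from the system keeps every distinct symbol for the life of the process. The sketch below shows the general shape of keeping hash keys as strings. It is not the Throttler code from the commits; the function name and the sample input are assumptions for illustration.

```ruby
# Parse "key value" lines (the shape of a cgroup stat file) into a Hash.
# Keys stay as Strings. Calling to_sym on them would intern one Symbol per
# distinct key, which Ruby versions before 2.2 never free.
# (parse_cgroup_stat and the sample text are hypothetical.)
def parse_cgroup_stat(text)
  text.each_line.each_with_object({}) do |line, stats|
    key, value = line.split
    next if key.nil? || value.nil?
    stats[key] = value.to_i
  end
end

sample = "nr_periods 10\nnr_throttled 2\nthrottled_time 123456\n"
stats = parse_cgroup_stat(sample)
puts stats["nr_throttled"]
```

String keys can be collected once the hash is discarded, which matters when the key set is unbounded (for example, when keys are derived from gear identifiers).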

Comment 6 Andy Grimm 2014-08-07 13:04:44 UTC
*** Bug 1096270 has been marked as a duplicate of this bug. ***

Comment 7 Meng Bo 2014-08-08 10:21:32 UTC
Checked on devenv-stage_952, with about 80 gears running on a m3.medium node.

With the following config in sysconfig:

# cat /etc/sysconfig/watchman 
GEAR_RETRIES=3
RETRY_DELAY=30
RETRY_PERIOD=60
STATE_CHANGE_DELAY=10
STATE_CHECK_PERIOD=1
THROTTLER_CHECK_PERIOD=1
OOM_CHECK_PERIOD=1
WATCHMAN_DEBUG=true

Watchman runs at about 50% CPU usage, its memory usage does not exceed 10%, and it can be restarted.

Also ran regression tests for the throttler plugin, gear_state_plugin and oom_plugin. All of them work well.

Moving bug to verified.

