Bug 1173246 - watchman OOMPlugin should background pkill commands
Summary: watchman OOMPlugin should background pkill commands
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Brenton Leanhardt
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1171289
Blocks:
Reported: 2014-12-11 18:18 UTC by Brenton Leanhardt
Modified: 2019-03-22 07:28 UTC (History)

Fixed In Version: openshift-origin-node-util-1.32.4.1-1
Doc Type: Bug Fix
Doc Text:
Cause: Previously, the watchman OOMPlugin waited for pkill to exit.
Consequence: Watchman waited unnecessarily for pkill, which could take a long time and block other tasks.
Fix: Watchman now runs pkill commands in the background.
Result: Watchman continues processing other tasks while the pkill operations complete in the background.
Clone Of: 1171289
Environment:
Last Closed: 2015-01-08 15:34:55 UTC
Target Upstream Version:
Embargoed:



Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHBA-2015:0019 | 0 | normal | SHIPPED_LIVE | Red Hat OpenShift Enterprise 2.2.3 bug fix and enhancement update | 2015-01-08 20:33:24 UTC

Description Brenton Leanhardt 2014-12-11 18:18:06 UTC
+++ This bug was initially created as a clone of Bug #1171289 +++

Description of problem:
Watchman's OOMPlugin can hang indefinitely on pkill commands if an OOM gear is holding locks on kernel task objects.

Possible solutions:

* Fork the Ruby process and call app.kill_procs() in the child.
* Use Kernel.spawn to launch the pkill command(s).

In either case, if we wait for them at all, it should be after the memory limit bump.  Maybe we don't even care and should run Process.detach on the PID.
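
A minimal sketch of the two options above, assuming nothing about watchman's internals. The helper names (spawn_detached, fork_detached, background_pkill) and the pkill flags are hypothetical and only illustrate the pattern; the safe_pkill that actually shipped may look different.

```ruby
# Hypothetical helpers; not taken from the watchman source.

# Start an external command without waiting for it to finish.
# Process.spawn returns once the child is started, and Process.detach
# hands the PID to a reaper thread so the child never becomes a zombie.
def spawn_detached(*argv)
  pid = Process.spawn(*argv,
                      in: File::NULL, out: File::NULL, err: File::NULL)
  Process.detach(pid) # returns the reaper Thread
end

# Run a block of Ruby in a forked child, e.g. a call to app.kill_procs(),
# so a hang in the child cannot stall the parent.
def fork_detached(&work)
  pid = fork(&work)
  Process.detach(pid)
end

# Assumed pkill invocation: send SIGKILL to every process owned by the
# gear's user. The exact flags watchman uses are not stated in this bug.
def background_pkill(uid)
  spawn_detached('pkill', '-KILL', '-u', uid.to_s)
end
```

If the caller does want to wait, it can keep the returned thread, bump the memory limit first, and then call join with a timeout (for example reaper.join(5)) so a stuck pkill still cannot block the plugin indefinitely.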

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.31.3-1.el6oso.noarch

--- Additional comment from Andy Grimm on 2014-12-11 09:13:44 EST ---

PR for master is https://github.com/openshift/origin-server/pull/6010

It needs another [merge], as the first attempt failed tests.

The corresponding PR for stage has been merged, and should be tagged into a hotfix today.

Comment 3 Anping Li 2014-12-12 10:47:13 UTC
Verified; passes on puddle-2-2-2014-12-11.
1) Checked the code: safe_pkill was added in this puddle.
2) Created an app and increased the memory usage in the gear (perl -np -e '$x="0123456789"x1000000' < /dev/zero).
3) Waited a few minutes, then checked /var/log/messages and found that the gear was killed.

Dec 12 03:43:33 ose2 watchman[29375]: OOM Plugin: Found gear 548ac1186bb25e95a50000de under OOM.
Dec 12 03:43:33 ose2 watchman[29375]: OOM Plugin: Increasing memory for gear 548ac1186bb25e95a50000de to 705901363 and killing processe
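
For repeated verification, the log check in step 3 could be scripted. This is only a sketch: the pattern is inferred from the two log lines above, it assumes gear IDs are lowercase hex strings, and oom_gears is a hypothetical helper name.

```ruby
# Matches the "Found gear ... under OOM." message shown in the log above.
# Assumption: gear IDs consist only of lowercase hex digits.
OOM_FOUND = /watchman\[\d+\]: OOM Plugin: Found gear ([0-9a-f]+) under OOM\./

# Return the unique gear IDs that watchman reported as being under OOM.
def oom_gears(lines)
  lines.map { |line| line[OOM_FOUND, 1] }.compact.uniq
end
```

Typical use would be oom_gears(File.foreach('/var/log/messages')), run as a user who can read that file.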

Comment 5 errata-xmlrpc 2015-01-08 15:34:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0019.html

