Bug 1173246 - watchman OOMPlugin should background pkill commands
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Brenton Leanhardt
QA Contact: libra bugs
Depends On: 1171289
Blocks:
Reported: 2014-12-11 13:18 EST by Brenton Leanhardt
Modified: 2015-07-07 20:57 EDT

See Also:
Fixed In Version: openshift-origin-node-util-1.32.4.1-1
Doc Type: Bug Fix
Doc Text:
Cause: Previously, watchman OOMPlugin waited for pkill to exit.
Consequence: Watchman would unnecessarily wait for pkill to exit, which may take a long time and block other tasks.
Fix: Watchman now backgrounds pkill tasks.
Result: Watchman will now continue processing other tasks while pkill operations are processed in the background.
Story Points: ---
Clone Of: 1171289
Environment:
Last Closed: 2015-01-08 10:34:55 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2015:0019
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Enterprise 2.2.3 bug fix and enhancement update
Last Updated: 2015-01-08 15:33:24 EST

Description Brenton Leanhardt 2014-12-11 13:18:06 EST
+++ This bug was initially created as a clone of Bug #1171289 +++

Description of problem:
Watchman's OOMPlugin can hang indefinitely on pkill commands if an OOM gear is holding locks on kernel task objects.

Possible solutions:

* Fork the Ruby process and call app.kill_procs() in the child.
* Kernel.spawn the pkill command(s).

In either case, if we wait for them at all, it should be after the memory limit bump.  Maybe we don't even care and should run Process.detach on the PID.
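
The Kernel.spawn option can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the eventual fix: the helper name spawn_detached is hypothetical, and "sleep 1" stands in for a pkill call that is slow to return. It shows that the caller gets control back immediately and can choose to wait, with a timeout, only after the memory limit bump.

```ruby
# Hypothetical helper (not the actual origin-server change): start a command
# without waiting for it, and let a reaper thread collect its exit status so
# the child does not linger as a zombie.
def spawn_detached(*cmd)
  pid = Process.spawn(*cmd, in: File::NULL, out: File::NULL, err: File::NULL)
  Process.detach(pid) # returns a Thread; its value is the Process::Status
end

# "sleep 1" is a stand-in for a pkill that takes a long time to exit.
started = Time.now
waiter = spawn_detached("sleep", "1")
launch_time = Time.now - started # time spent launching, not running, the child

# ... the memory limit bump would happen here, without waiting on the child ...

# Optionally wait afterwards, but only for a bounded time.
finished = waiter.join(5) # nil if the child is still running after 5 seconds
```

If the exit status does not matter at all, the join can simply be dropped; the thread returned by Process.detach still reaps the child on its own.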

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.31.3-1.el6oso.noarch

--- Additional comment from Andy Grimm on 2014-12-11 09:13:44 EST ---

PR for master is https://github.com/openshift/origin-server/pull/6010

It needs another [merge], as the first attempt failed tests.

The corresponding PR for stage has been merged, and should be tagged into a hotfix today.
Comment 3 Anping Li 2014-12-12 05:47:13 EST
Verified; passes on puddle-2-2-2014-12-11.
1) Checked the code; safe_pkill was added in this puddle.
2) Created an app and increased the memory usage in the gear (perl -np -e '$x="0123456789"x1000000' < /dev/zero).
3) Waited a few minutes, then checked /var/log/messages, which shows that the gear was killed.

Dec 12 03:43:33 ose2 watchman[29375]: OOM Plugin: Found gear 548ac1186bb25e95a50000de under OOM.
Dec 12 03:43:33 ose2 watchman[29375]: OOM Plugin: Increasing memory for gear 548ac1186bb25e95a50000de to 705901363 and killing processe
Comment 5 errata-xmlrpc 2015-01-08 10:34:55 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0019.html
