+++ This bug was initially created as a clone of Bug #1171289 +++

Description of problem:
Watchman's OOMPlugin can hang indefinitely on pkill commands if an OOM gear is holding locks on kernel task objects.

Possible solutions:
* Fork the Ruby process and call app.kill_procs() in the child.
* Kernel.spawn the pkill command(s).

In either case, if we wait for them at all, it should be after the memory limit bump. Maybe we don't even care and should run Process.detach on the PID.

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.31.3-1.el6oso.noarch

--- Additional comment from Andy Grimm on 2014-12-11 09:13:44 EST ---

PR for master is https://github.com/openshift/origin-server/pull/6010
It needs another [merge], as the first attempt failed tests. The corresponding PR for stage has been merged, and should be tagged into a hotfix today.
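
For reference, the "Kernel.spawn plus Process.detach" option from the description could look roughly like the sketch below. This is only an illustration of the idea, not the code from the PR: the helper name spawn_detached, the gear_uuid variable, and the pkill flags in the usage comment are assumptions made for the example.

```ruby
# Hypothetical helper: runs a command without ever blocking the caller on it.
def spawn_detached(*argv)
  # Process.spawn returns as soon as the child is created; it does not wait
  # for the command to finish, so a pkill stuck in the kernel cannot stall us.
  pid = Process.spawn(*argv, :in => File::NULL, :out => File::NULL, :err => File::NULL)
  # Process.detach starts a thread that reaps the child when it exits,
  # so no zombie is left behind and no explicit wait is needed.
  Process.detach(pid)
end

# Assumed usage inside the plugin (gear_uuid and the pkill flags are illustrative):
#   1. bump the gear's memory limit first
#   2. spawn_detached('pkill', '-KILL', '-u', gear_uuid)
```

Process.detach returns the reaper thread, so a caller that does want the exit status later can still read it from that thread's value after the memory limit bump, without blocking beforehand.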
Verified and passed on puddle-2-2-2014-12-11.

1) Checked the code; safe_pkill was added in this puddle.
2) Created an app and increased the memory usage in the gear (perl -np -e '$x="0123456789"x1000000' < /dev/zero).
3) Waited a few minutes, then checked /var/log/messages; the gear was killed:

Dec 12 03:43:33 ose2 watchman[29375]: OOM Plugin: Found gear 548ac1186bb25e95a50000de under OOM.
Dec 12 03:43:33 ose2 watchman[29375]: OOM Plugin: Increasing memory for gear 548ac1186bb25e95a50000de to 705901363 and killing processe
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0019.html