Description of problem:
One of the primary cases for running "forcestop" on a gear is to deal with situations where "runuser" fails or where the unprivileged user cannot kill processes for some reason. The fallback is to pkill processes from _outside_ the gear. However, when the current version of the code fails to stop a cartridge, it raises a ShellExecutionException before the "pkill" is run. The result is that if a gear hits a ulimit (specifically nproc), an administrator must log in and clean up.

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-node-1.31.9-1.el6oso.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run a gear with 250 process threads
2. Attempt to stop, restart, or forcestop the gear

Actual results:
All attempts will fail.

Expected results:
forcestop should kill all gear processes, and allow a subsequent start/restart to work successfully.
oo-admin-ctl-gears uses ApplicationContainer#stop_gear() rather than ApplicationContainer#force_stop(), for better control of gear state. stop_gear() does not handle the ShellExecutionException properly, so a cartridge stop failure aborts the call before the fallback kill runs.
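The failure mode described above can be illustrated with a minimal Ruby sketch. This is not the actual patch: `FakeShellExecutionException`, `stop_gear_safely`, and the two injected callables are hypothetical stand-ins so the example runs without a real gear. It only shows the general idea of rescuing the cartridge stop error so that the fallback kill still runs.

```ruby
# Hypothetical stand-in for the node's ShellExecutionException.
class FakeShellExecutionException < StandardError; end

# Hypothetical helper, not the real ApplicationContainer#stop_gear.
# stop_cartridges and kill_procs are injected callables standing in for
# the in-gear cartridge stop and the out-of-gear pkill, respectively.
def stop_gear_safely(stop_cartridges:, kill_procs:, log: [])
  begin
    stop_cartridges.call
  rescue FakeShellExecutionException => e
    # Record the failure, but do not let it skip the fallback kill.
    log << "cartridge stop failed: #{e.message}"
  end
  # Runs whether or not the cartridge stop succeeded.
  kill_procs.call
  log
end

# Simulate a gear at its nproc limit: the cartridge stop cannot fork.
killed = false
log = stop_gear_safely(
  stop_cartridges: -> { raise FakeShellExecutionException, "fork failed" },
  kill_procs: -> { killed = true }
)
```

In this sketch the cartridge error is logged rather than re-raised; whether the real code should re-raise after the kill is a separate design decision that the thread does not settle.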
Fixed in https://github.com/openshift/origin-server/pull/5937
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/26e43acb8358e1d415abe185b968910cfd54c651
Bug 1160494 - Protect Ops stop_gear from cartridge errors
force-stop now works on devenv_5288
Verified as follows:

0. Check the max processes limit:
[app0-ruliu0.dev.rhcloud.com 54606c4629133999e5000012]> ulimit -u
250

1. Run as many processes as possible:
[app0-ruliu0.dev.rhcloud.com 54606c4629133999e5000012]> cat /tmp/1.sh
#!/bin/bash
for i in `seq 0 249`; do sleep 3600 & done
[app0-ruliu0.dev.rhcloud.com 54606c4629133999e5000012]> /tmp/1.sh

2. Force stop the gear:
[root@ip-10-231-32-4 ~]# oo-admin-ctl-gears forcestopgear 54606c4629133999e5000012
The gear is then stopped and all sleep processes are killed.

3. Start and restart the gear:
[root@ip-10-231-32-4 ~]# oo-admin-ctl-gears startgear 54606c4629133999e5000012
[root@ip-10-231-32-4 ~]# oo-admin-ctl-gears restartgear 54606c4629133999e5000012
Both commands succeed and the gear is now started.