Bug 1160494 - Unhandled ShellExecutionException prevents oo-admin-ctl-gears forcestopgear from pkill'ing gear processes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 2.x
Assignee: Jhon Honce
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1162192
Reported: 2014-11-05 01:36 UTC by Andy Grimm
Modified: 2016-11-08 03:47 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1162192
Environment:
Last Closed: 2015-02-18 16:52:35 UTC
Target Upstream Version:
Embargoed:


Description Andy Grimm 2014-11-05 01:36:16 UTC
Description of problem:

One of the primary use cases for running "forcestop" on a gear is to deal with situations where "runuser" fails or where the unprivileged user cannot kill processes for some reason. The fallback is to pkill processes from _outside_ the gear. However, when the current version of the code fails to stop a cartridge, it raises a ShellExecutionException before the "pkill" is run.

The result is that if a gear hits a ulimit (specifically nproc), an administrator must log in and clean up.

Version-Release number of selected component (if applicable):

rubygem-openshift-origin-node-1.31.9-1.el6oso.noarch

How reproducible:

Always

Steps to Reproduce:
1. Run a gear until it reaches its nproc limit (250 processes/threads)
2. Attempt to stop, restart, or forcestop the gear

Actual results:

All attempts will fail.

Expected results:

forcestop should kill all gear processes, and allow a subsequent start/restart to work successfully.

Comment 1 Jhon Honce 2014-11-05 01:47:33 UTC
oo-admin-ctl-gears uses ApplicationContainer#stop_gear(), not ApplicationContainer#force_stop(), for better control of gear state. This method does not handle the ShellExecutionException properly.
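The failure mode described in this comment can be illustrated with a minimal Ruby sketch of the general pattern: rescue each cartridge's stop failure, record it, and run the process-kill fallback in an `ensure` clause so it cannot be skipped. This is not the code from pull request 5937; `protected_stop_gear`, the cartridge hash, the kill callable, and the stand-in `ShellExecutionException` class are all hypothetical.

```ruby
# Hypothetical stand-in for the node's ShellExecutionException; the real
# class ships with the openshift-origin-node gem and may differ.
class ShellExecutionException < StandardError
  attr_reader :rc

  def initialize(msg, rc = 1)
    super(msg)
    @rc = rc
  end
end

# Hypothetical sketch, not the code from PR 5937: try to stop each cartridge,
# collect failures instead of letting them propagate, and always run the
# process-kill fallback afterwards.
#
# cartridges     - Hash of cartridge name => callable that stops it
# kill_processes - callable standing in for the root-side pkill of the gear
#
# Returns an Array of error strings (empty if every cartridge stopped).
def protected_stop_gear(cartridges, kill_processes)
  errors = []
  cartridges.each do |name, stopper|
    begin
      stopper.call
    rescue ShellExecutionException => e
      # Record the cartridge failure and move on to the next cartridge.
      errors << "#{name}: #{e.message} (rc=#{e.rc})"
    end
  end
  errors
ensure
  # Runs even if some other exception escapes the loop, so the
  # out-of-gear kill is never skipped.
  kill_processes.call
end

# Usage: one cartridge whose stop fails (e.g. runuser cannot fork).
killed = false
carts = {
  "php"   => -> { raise ShellExecutionException.new("runuser: cannot fork", 1) },
  "mysql" => -> { :stopped }
}
errors = protected_stop_gear(carts, -> { killed = true })
p errors
p killed
```

The design choice being illustrated is that a cartridge error becomes data returned to the caller rather than an exception that aborts the stop before the kill step.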

Comment 2 Jhon Honce 2014-11-07 22:16:09 UTC
Fixed in https://github.com/openshift/origin-server/pull/5937

Comment 3 openshift-github-bot 2014-11-07 23:08:47 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/26e43acb8358e1d415abe185b968910cfd54c651
Bug 1160494 - Protect Ops stop_gear from cartridge errors

Comment 4 Ruikai Liu 2014-11-10 02:59:43 UTC
force-stop now works on devenv_5288

Comment 5 Ruikai Liu 2014-11-10 03:25:55 UTC
Verified as follows:

0. Check the max processes limit:
[app0-ruliu0.dev.rhcloud.com 54606c4629133999e5000012]\> ulimit -u
250

1. Run as many processes as possible:
[app0-ruliu0.dev.rhcloud.com 54606c4629133999e5000012]\> cat /tmp/1.sh
#!/bin/bash

for i in `seq 0 249`; do
    sleep 3600 &
done
[app0-ruliu0.dev.rhcloud.com 54606c4629133999e5000012]\> /tmp/1.sh

2. Force stop the gear:
[root@ip-10-231-32-4 ~]# oo-admin-ctl-gears forcestopgear 54606c4629133999e5000012
The gear is stopped and all sleep processes are killed.

3. Start and restart the gear:
[root@ip-10-231-32-4 ~]# oo-admin-ctl-gears startgear 54606c4629133999e5000012
[root@ip-10-231-32-4 ~]# oo-admin-ctl-gears restartgear 54606c4629133999e5000012
Both commands succeed, and the gear is now started.
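Step 2 depends on observing that the sleep processes are gone. A minimal, Linux-only Ruby sketch of one way to check this from the node is to count the processes whose effective UID matches the gear's UID by reading /proc/<pid>/status, similar in spirit to `pgrep -c -u <uid>`. The helper name `count_processes_for_uid` is hypothetical and is not part of oo-admin-ctl-gears.

```ruby
# Hypothetical helper (Linux only): count processes whose effective UID is
# +uid+, by reading the "Uid:" line of /proc/<pid>/status. The second field
# on that line is the effective UID, which is what `pgrep -u` matches.
def count_processes_for_uid(uid)
  Dir.glob("/proc/[0-9]*/status").count do |path|
    begin
      euid = File.read(path)[/^Uid:\s+\d+\s+(\d+)/, 1]
      !euid.nil? && euid.to_i == uid
    rescue Errno::ENOENT, Errno::ESRCH, Errno::EACCES
      # The process exited between the glob and the read, or is unreadable.
      false
    end
  end
end

# Usage: processes owned by the current effective user (includes this one).
puts count_processes_for_uid(Process.euid)
```

Assuming the gear's Unix user is named after its UUID, as on OpenShift nodes, `count_processes_for_uid(Etc.getpwnam("54606c4629133999e5000012").uid)` (after `require "etc"`) would be expected to report 0 once the forcestop has succeeded.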

