Bug 1028205

Summary: 'oo-admin-ctl-gears forcestopgear' fails to stop locked gears
Product: OpenShift Online
Reporter: Stefanie Forrester <dakini>
Component: Containers
Assignee: Dan McPherson <dmcphers>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: medium
Version: 1.x
CC: bmeng, dmcphers, mpatel
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-01-30 00:49:54 UTC
Type: Bug

Description Stefanie Forrester 2013-11-07 21:58:21 UTC
Description of problem:

Sometimes processes are left running after a gear is "stopped" (specifically, we're seeing this with Java processes).

A process is running under the gear's UUID, but the gear is in a "locked" state, as if a stop had already been attempted. Attempting to terminate the process with 'oo-admin-ctl-gears forcestopgear <uuid>' fails.


Version-Release number of selected component (if applicable): 
openshift-origin-node-util-1.16.3-1.el6oso.noarch


How reproducible:

Very reproducible, provided you can find one of these locked gears (they appear somewhat frequently in production). So far, 'forcestopgear' has failed to stop the gear every time.

Steps to Reproduce:
1. Look at 'top' to identify processes consuming the most swap.
2. Try to restart the gear associated with that process. The restart fails and reports that the gear is locked.

[sedgar ~]$ sudo oo-admin-ctl-gears restartgear  2364857a1c3e443fbb31bc00107e0d5d
Gear is locked: 2364857a1c3e443fbb31bc00107e0d5d

3. Stopping or force-stopping the gear also fails. The process can only be killed with SIGKILL.

[sedgar ~]$ sudo oo-admin-ctl-gears stopgear 2364857a1c3e443fbb31bc00107e0d5d
Gear is locked: 2364857a1c3e443fbb31bc00107e0d5d

[sedgar ~]$ sudo oo-admin-ctl-gears forcestopgear 2364857a1c3e443fbb31bc00107e0d5d
Gear is locked: 2364857a1c3e443fbb31bc00107e0d5d
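Step 1 relies on eyeballing 'top'. As a rough alternative, the minimal Python 3 sketch below lists the processes using the most swap together with the Unix user they run as. Since each gear's processes run under the gear's UUID, the user column points at the gear to try. It assumes a Linux /proc whose /proc/<pid>/status files report a VmSwap line in kB; the function names are hypothetical and are not part of any OpenShift tool.

```python
import os
import pwd


def parse_status(text):
    """Return (uid, swap_kb) parsed from the contents of a /proc/<pid>/status file."""
    uid = None
    swap_kb = 0
    for line in text.splitlines():
        if line.startswith("Uid:"):
            uid = int(line.split()[1])  # first field is the real UID
        elif line.startswith("VmSwap:"):
            swap_kb = int(line.split()[1])  # value is reported in kB
    return uid, swap_kb


def top_swap_users(limit=10):
    """Return up to `limit` (swap_kb, pid, user) tuples, largest swap first."""
    if not os.path.isdir("/proc"):
        return []  # not a Linux system with procfs
    results = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{pid}/status") as f:
                uid, swap_kb = parse_status(f.read())
        except OSError:
            continue  # process exited or its status file is unreadable
        if uid is None or swap_kb == 0:
            continue
        try:
            user = pwd.getpwuid(uid).pw_name  # for a gear, the name is its UUID
        except KeyError:
            user = str(uid)
        results.append((swap_kb, int(pid), user))
    results.sort(reverse=True)
    return results[:limit]


if __name__ == "__main__":
    for swap_kb, pid, user in top_swap_users():
        print(f"{swap_kb:>10} kB  pid={pid:<7} user={user}")
```

A name found this way can then be passed as the <uuid> argument in steps 2 and 3.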

Actual results:
'oo-admin-ctl-gears forcestopgear' is unable to stop the gear.

Expected results:
'oo-admin-ctl-gears forcestopgear' should successfully terminate the gear's remaining processes.

Additional info:

Comment 1 Dan McPherson 2013-11-09 00:13:08 UTC
https://github.com/openshift/origin-server/pull/4141

Comment 2 Dan McPherson 2013-11-09 00:16:30 UTC
I changed both stop and force stop to ignore the stop lock. The lock should only be used to prevent an admin from starting or restarting the gear.

Comment 4 Meng Bo 2013-11-11 05:47:16 UTC
Checked on devenv_4016, issue has been fixed.

restartgear, stopgear, and forcestopgear now ignore the stop_lock; startgear is still refused.


# oo-admin-ctl-gears startgear 52806dc76f25e958b0000004
Gear is locked: 52806dc76f25e958b0000004
# oo-admin-ctl-gears restartgear 52806dc76f25e958b0000004
Restarting gear 52806dc76f25e958b0000004... [ OK ]

# oo-admin-ctl-gears startgear 52806dc76f25e958b0000004
Gear is locked: 52806dc76f25e958b0000004
# oo-admin-ctl-gears stopgear 52806dc76f25e958b0000004
Stopping gear 52806dc76f25e958b0000004... [ OK ]

# oo-admin-ctl-gears startgear 52806dc76f25e958b0000004
Gear is locked: 52806dc76f25e958b0000004
# oo-admin-ctl-gears forcestopgear 52806dc76f25e958b0000004
Stopping gear 52806dc76f25e958b0000004... [ OK ]

Move bug to verified.
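The verified behavior can be summarized as a small decision rule: with the stop lock present, only startgear is refused. The Python sketch below is a hypothetical model of that rule, reproducing the output lines from Comment 4. It is not the actual oo-admin-ctl-gears implementation, which this report does not show. It follows the observed transcript, where restartgear succeeds, rather than the wording of Comment 2, which also mentions restarting. The constant and function names are illustrative, and the "Starting gear" message for an unlocked start is an assumption.

```python
# Hypothetical model of the post-fix stop-lock check; not the real tool's code.
BLOCKED_BY_STOP_LOCK = frozenset({"startgear"})
ALLOWED_WITH_STOP_LOCK = frozenset({"restartgear", "stopgear", "forcestopgear"})

# Status verbs as seen in the transcripts ("Starting" is assumed, never observed).
VERBS = {
    "startgear": "Starting",
    "restartgear": "Restarting",
    "stopgear": "Stopping",
    "forcestopgear": "Stopping",
}


def may_run(operation: str, stop_locked: bool) -> bool:
    """Return True if the operation should proceed given the gear's stop-lock state."""
    if operation not in BLOCKED_BY_STOP_LOCK | ALLOWED_WITH_STOP_LOCK:
        raise ValueError(f"unknown operation: {operation}")
    # Only operations in BLOCKED_BY_STOP_LOCK are refused while the lock exists.
    return not (stop_locked and operation in BLOCKED_BY_STOP_LOCK)


def run(operation: str, uuid: str, stop_locked: bool) -> str:
    """Return the status line this model prints for one gear."""
    if not may_run(operation, stop_locked):
        return f"Gear is locked: {uuid}"
    return f"{VERBS[operation]} gear {uuid}... [ OK ]"


if __name__ == "__main__":
    uuid = "52806dc76f25e958b0000004"
    for op in ("startgear", "restartgear", "stopgear", "forcestopgear"):
        print(f"# oo-admin-ctl-gears {op} {uuid}")
        print(run(op, uuid, stop_locked=True))
```

In this model, the pre-fix behavior from the original description (restartgear, stopgear, and forcestopgear all printing "Gear is locked") corresponds to moving those three operations into BLOCKED_BY_STOP_LOCK.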