Description of problem:
Sometimes there are processes left running after a gear is "stopped". (Specifically, we're seeing this with java processes.) A process is running under the gear's uuid, but the gear is in a "locked" state, as if it tried to terminate. Attempting to terminate the process using 'oo-admin-ctl-gears forcestopgear <uuid>' fails.

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.16.3-1.el6oso.noarch

How reproducible:
Very reproducible, if you can find one of these locked gears (which appear somewhat frequently in production). 'forcestopgear' has failed to stop the gear every time so far.

Steps to Reproduce:
1. Look at 'top' to identify processes consuming the most swap.
2. Try to restart the gear associated with that process. The restart fails and indicates that the gear is locked.

[sedgar ~]$ sudo oo-admin-ctl-gears restartgear 2364857a1c3e443fbb31bc00107e0d5d
Gear is locked: 2364857a1c3e443fbb31bc00107e0d5d

3. Stopping and force-stopping the gear also fail. The process can only be killed with a SIGKILL.

[sedgar ~]$ sudo oo-admin-ctl-gears stopgear 2364857a1c3e443fbb31bc00107e0d5d
Gear is locked: 2364857a1c3e443fbb31bc00107e0d5d
[sedgar ~]$ sudo oo-admin-ctl-gears forcestopgear 2364857a1c3e443fbb31bc00107e0d5d
Gear is locked: 2364857a1c3e443fbb31bc00107e0d5d

Actual results:
'oo-admin-ctl-gears forcestopgear' is unable to stop the gear.

Expected results:
'oo-admin-ctl-gears forcestopgear' should be successful in terminating the remaining gear processes.

Additional info:
https://github.com/openshift/origin-server/pull/4141
I changed both stop and force stop to ignore the stop lock. The stop lock should only be used to prevent an admin from starting/restarting a gear.
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/452f66f4b19d9bd85d8c157c38233e09684c118d
Bug 1028205
Checked on devenv_4016, issue has been fixed. restartgear/stopgear/forcestopgear will ignore the stop_lock.

# oo-admin-ctl-gears startgear 52806dc76f25e958b0000004
Gear is locked: 52806dc76f25e958b0000004
# oo-admin-ctl-gears restartgear 52806dc76f25e958b0000004
Restarting gear 52806dc76f25e958b0000004... [ OK ]
# oo-admin-ctl-gears startgear 52806dc76f25e958b0000004
Gear is locked: 52806dc76f25e958b0000004
# oo-admin-ctl-gears stopgear 52806dc76f25e958b0000004
Stopping gear 52806dc76f25e958b0000004... [ OK ]
# oo-admin-ctl-gears startgear 52806dc76f25e958b0000004
Gear is locked: 52806dc76f25e958b0000004
# oo-admin-ctl-gears forcestopgear 52806dc76f25e958b0000004
Stopping gear 52806dc76f25e958b0000004... [ OK ]

Move bug to verified.
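
For readers skimming the thread, the behavior seen in the verification run can be summarized as a small policy table. The following is only an illustrative sketch in Python, not the origin-server code (which is not shown in this report). The names `Action`, `BLOCKED_WHEN_LOCKED` and `is_allowed` are hypothetical, and the policy is inferred solely from the transcript above: with the stop lock present, startgear is refused while restartgear, stopgear and forcestopgear proceed.

```python
from enum import Enum


class Action(Enum):
    """Subcommands of oo-admin-ctl-gears that appear in this report."""
    START = "startgear"
    RESTART = "restartgear"
    STOP = "stopgear"
    FORCE_STOP = "forcestopgear"


# Assumption: inferred from the devenv_4016 transcript, where only
# startgear answered "Gear is locked" while the stop lock was present.
BLOCKED_WHEN_LOCKED = frozenset({Action.START})


def is_allowed(action: Action, stop_locked: bool) -> bool:
    """Return True if the action should proceed for a gear in this lock state."""
    if not stop_locked:
        # Without a stop lock, no action is refused on lock grounds.
        return True
    return action not in BLOCKED_WHEN_LOCKED


if __name__ == "__main__":
    # Print the decision for each subcommand against a locked gear.
    for action in Action:
        verdict = "proceeds" if is_allowed(action, stop_locked=True) else "Gear is locked"
        print(f"{action.value}: {verdict}")
```

Before the fix, the report shows restartgear, stopgear and forcestopgear all being refused on a locked gear, which in this sketch would correspond to those three actions also being in the blocked set.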