Bug 1190856 - RFE: add a "stop-and-lock" operation to oo-admin-ctl-gears
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 2.x
Assignee: Jhon Honce
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1202512
 
Reported: 2015-02-09 19:30 UTC by Andy Grimm
Modified: 2016-11-08 03:48 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1202512
Environment:
Last Closed: 2015-04-21 18:01:02 UTC
Target Upstream Version:
Embargoed:



Description Andy Grimm 2015-02-09 19:30:48 UTC
Description of problem:

In some cases, it is desirable to have a "stopgear" operation on nodes which creates a .stop_lock file.  I would even argue that outside of watchman (and maybe upgrades) this is the more common desired behavior.

The main problem with the current behavior is this:

Suppose you have a node running at full capacity.  You use "oo-admin-ctl-gears stopgear" to stop 10 gears for various reasons: they are misbehaving due to bad code, they have hit quota, or maybe they are just secondary gears with idle frontends.  After you do that, 10 more active gears get placed on the node.  The next time the node is rebooted, the gears you administratively stopped start again.  Aside from having to rediscover these problem gears, you are also now 10 gears over capacity.  In the worst case, the node is so far over capacity that the attempt to start all of the gears which are not stop-locked causes a crash, and the node reboots repeatedly until someone manually intervenes.


Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.33.4-1.el6oso.noarch

Comment 1 Jhon Honce 2015-02-11 17:00:04 UTC
.stop_lock is the flag for when a gear is stopped by the user/developer vs. an administrative stop. It is currently the only way a Node knows the difference.

Comment 2 Andy Grimm 2015-02-11 18:33:07 UTC
So far the only thing I'm aware of stop_lock being used for is to tell "startall" whether or not to start the gear when the node boots, and that is precisely the thing I am trying to affect here.  If stop_lock has other uses, then we need to add another flag and make startall key off the other flag (or a combination of the two flags).
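
For illustration only, the boot-time behavior described above can be sketched as follows. This is not the origin-server code (which is not shown in this report); the function name `gears_to_start` is hypothetical, and the `runtime/.stop_lock` location under each gear's home directory is an assumption inferred from the shell prompt in comment 6.

```python
from pathlib import Path

# Assumed location of the lock file relative to a gear's home directory.
STOP_LOCK = Path("runtime") / ".stop_lock"


def gears_to_start(gear_homes):
    """Return the gear homes a start-all pass would start.

    A gear is skipped when its .stop_lock file exists, which is the
    only signal this sketch looks at.
    """
    return [home for home in gear_homes
            if not (Path(home) / STOP_LOCK).exists()]
```

Under this model, a gear stopped without a .stop_lock is indistinguishable from a running gear at boot time, which is why it comes back after a reboot.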

Comment 3 Andy Grimm 2015-02-13 15:28:57 UTC
In reference to our IRC conversation, I'm fine with adding content to the file to differentiate between administrative stops (potentially with a reason given for the stop) and user-initiated stops.
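
A minimal sketch of that idea, under stated assumptions: the convention that an empty .stop_lock means a user-initiated stop and a non-empty one carries an administrator's reason is hypothetical, as are the helper names `write_stop_lock` and `stop_kind` and the `runtime/.stop_lock` path. The actual format chosen in the fix may differ.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the lock file relative to a gear's home directory.
STOP_LOCK = Path("runtime") / ".stop_lock"


def write_stop_lock(gear_home, reason: str = "") -> Path:
    """Create the lock file; a non-empty reason marks an administrative stop."""
    lock = Path(gear_home) / STOP_LOCK
    lock.parent.mkdir(parents=True, exist_ok=True)
    lock.write_text(reason + "\n" if reason else "")
    return lock


def stop_kind(gear_home) -> Optional[str]:
    """Return None (no lock), "user" (empty lock) or "admin" (lock with content)."""
    lock = Path(gear_home) / STOP_LOCK
    if not lock.exists():
        return None
    return "admin" if lock.read_text().strip() else "user"
```

Because the file's presence is unchanged, anything that only checks for the existence of .stop_lock keeps working, while tools that care can read the reason.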

Comment 4 Jhon Honce 2015-02-23 22:24:40 UTC
Fixed in https://github.com/openshift/origin-server/pull/6083

Comment 5 openshift-github-bot 2015-02-23 23:07:45 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/e42f7414fa7426397683111ec5880fd520f251ab
Bug 1190856 - Allow Operator to stop gear with .stop_lock

Comment 6 Meng Bo 2015-02-25 07:23:57 UTC
Checked on devenv_5449: the .stop_lock file is generated with the given message as its content.

[root@ip-10-136-82-250 runtime]# cat .stop_lock 
TEST_MESSAGE

The gear can be stopped successfully with the new option, and can then be started again by the user.

Moving bug to VERIFIED.

