Description of problem:
In some cases, it is desirable to have a "stopgear" operation on nodes which creates a .stop_lock file. I would even argue that outside of watchman (and maybe upgrades) this is the more common desired behavior.

The main problem with the current behavior is this: suppose you have a node running at full capacity. You use "oo-admin-ctl-gears stopgear" to stop 10 gears for various reasons: they are misbehaving due to bad code, they hit quota, or they are just secondary gears with idle frontends. After you do that, 10 more active gears get placed on the node. The next time the node is rebooted, the gears you administratively stopped start again, and aside from having to rediscover these problem gears, you are now 10 gears over capacity. In the worst case, the node is so far over capacity that the attempt to start every gear that is not stop-locked causes a crash, and the node reboots repeatedly until someone manually intervenes.

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.33.4-1.el6oso.noarch
.stop_lock is the flag for when a gear is stopped by the user/developer vs. an administrative stop. It is currently the only way a Node knows the difference.
So far the only thing I'm aware of stop_lock being used for is to tell "startall" whether or not to start the gear when the node boots, and that is precisely the thing I am trying to affect here. If stop_lock has other uses, then we need to add another flag and make startall key off the other flag (or a combination of the two flags).
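To make the boot-time behavior under discussion concrete, here is a minimal sketch of a "startall"-style filter that skips any gear carrying the marker file. It is not the actual origin-server code: the function name `gears_to_start`, the `gear_base` parameter, and the marker's location under `app-root/runtime` are assumptions for illustration.

```python
from pathlib import Path
from typing import List

# Assumed location of the marker relative to a gear's home directory.
STOP_LOCK = Path("app-root") / "runtime" / ".stop_lock"


def gears_to_start(gear_base: Path) -> List[str]:
    """Return the names of gear directories under gear_base that have no .stop_lock.

    Hypothetical helper: a gear with the marker present is left stopped at boot.
    """
    return sorted(
        gear_dir.name
        for gear_dir in gear_base.iterdir()
        if gear_dir.is_dir() and not (gear_dir / STOP_LOCK).exists()
    )
```

With a filter like this, an administrative stop that also creates the marker would survive a reboot, which is the behavior requested above.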
In reference to our IRC conversation, I'm fine with adding content to the file to differentiate between administrative stops (potentially with a reason given for the stop) and user-initiated stops.
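The idea of storing a reason in the file could look roughly like the sketch below. The helper names `write_stop_lock` and `stop_reason` are hypothetical, and the convention that an empty file means a user-initiated stop is an assumption, not something confirmed in this thread.

```python
from pathlib import Path
from typing import Optional


def write_stop_lock(lock_path: Path, reason: str = "") -> None:
    """Create the marker file; an administrative stop may record a reason in it."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    lock_path.write_text(reason + "\n" if reason else "")


def stop_reason(lock_path: Path) -> Optional[str]:
    """Return None if there is no marker, otherwise its content.

    Assumed convention: an empty string means a user-initiated stop,
    any other text is the reason given for an administrative stop.
    """
    if not lock_path.exists():
        return None
    return lock_path.read_text().strip()
```

Because the marker's presence alone still blocks the start at boot, existing consumers that only test for the file would keep working while tooling that cares about the reason can read it.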
Fixed in https://github.com/openshift/origin-server/pull/6083
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/e42f7414fa7426397683111ec5880fd520f251ab
Bug 1190856 - Allow Operator to stop gear with .stop_lock
Checked on devenv_5449: the .stop_lock file is generated with the given message as its content.

[root@ip-10-136-82-250 runtime]# cat .stop_lock
TEST_MESSAGE

The gear can be stopped successfully with the option, and can be started by the user. Moving bug to verified.