+++ This bug was initially created as a clone of Bug #1190856 +++
Description of problem:
In some cases, it is desirable to have a "stopgear" operation on nodes which creates a .stop_lock file. I would even argue that outside of watchman (and maybe upgrades) this is the more common desired behavior.
The main problem with the current behavior is this:
Suppose you have a node running at full capacity. You use "oo-admin-ctl-gears stopgear" to stop 10 gears for various reasons -- they are misbehaving due to bad code, they hit quota, or maybe they are just secondary gears with idle frontends. After you do that, 10 more active gears get placed on the node. The next time the node is rebooted, the gears you administratively stopped start again, and aside from the fact that you have to rediscover these problem gears, you are also now 10 gears over capacity. In the worst case, you get into a state where the node is so far over capacity that the attempt to start all of the gears which are not stop-locked causes a crash, and the node reboots repeatedly until someone manually intervenes.
Version-Release number of selected component (if applicable):
--- Additional comment from Jhon Honce on 2015-02-11 12:00:04 EST ---
.stop_lock is the flag for when a gear is stopped by the user/developer vs. an administrative stop. It is currently the only way a Node knows the difference.
--- Additional comment from Andy Grimm on 2015-02-11 13:33:07 EST ---
So far the only thing I'm aware of stop_lock being used for is to tell "startall" whether or not to start the gear when the node boots, and that is precisely the thing I am trying to affect here. If stop_lock has other uses, then we need to add another flag and make startall key off the other flag (or a combination of the two flags).
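To make the startall behavior under discussion concrete, here is a minimal sketch in Python. It is not the origin-server implementation (which is Ruby); the function names and the temporary directory layout are hypothetical, and only the relative path app-root/runtime/.stop_lock is taken from this report. It assumes that startall simply skips any gear whose .stop_lock file exists.

```python
import os
import tempfile

# Relative location of the lock file inside a gear directory (as seen in this report).
STOP_LOCK = os.path.join("app-root", "runtime", ".stop_lock")


def gears_to_start(base_dir):
    """Return the gear directory names under base_dir that have no .stop_lock file."""
    startable = []
    for name in sorted(os.listdir(base_dir)):
        gear_dir = os.path.join(base_dir, name)
        if not os.path.isdir(gear_dir):
            continue
        if os.path.exists(os.path.join(gear_dir, STOP_LOCK)):
            continue  # stop-locked: stays stopped across a reboot
        startable.append(name)
    return startable


def make_gear(base_dir, name, stop_locked=False):
    """Hypothetical helper: create a fake gear directory, optionally stop-locked."""
    runtime = os.path.join(base_dir, name, "app-root", "runtime")
    os.makedirs(runtime)
    if stop_locked:
        open(os.path.join(runtime, ".stop_lock"), "w").close()


def demo():
    with tempfile.TemporaryDirectory() as base:
        make_gear(base, "gear-a")
        make_gear(base, "gear-b", stop_locked=True)
        make_gear(base, "gear-c")
        return gears_to_start(base)


if __name__ == "__main__":
    print(demo())
```

Under this assumption, a gear stopped with plain "stopgear" has no lock file and is returned by gears_to_start after a reboot, which is the over-capacity scenario described in the original report.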
--- Additional comment from Andy Grimm on 2015-02-13 10:28:57 EST ---
In reference to our IRC conversation, I'm fine with adding content to the file to differentiate between administrative stops (potentially with a reason given for the stop) and user-initiated stops.
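One way the file content could separate the two kinds of stop is sketched below in Python. This is an illustration only: the convention that an empty .stop_lock means a user stop and a non-empty one means an administrative stop with a reason is an assumption made for this sketch, and write_stop_lock and describe_stop are hypothetical names, not part of origin-server.

```python
import os
import tempfile


def write_stop_lock(runtime_dir, message=""):
    """Create .stop_lock in runtime_dir; an optional message records the reason."""
    path = os.path.join(runtime_dir, ".stop_lock")
    with open(path, "w") as f:
        f.write(message)
    return path


def describe_stop(runtime_dir):
    """Classify a gear by its lock file (assumed convention, see lead-in)."""
    path = os.path.join(runtime_dir, ".stop_lock")
    if not os.path.exists(path):
        return ("not stopped", None)
    with open(path) as f:
        content = f.read().strip()
    if content:
        return ("administrative stop", content)
    return ("user stop", None)


def demo():
    with tempfile.TemporaryDirectory() as runtime:
        before = describe_stop(runtime)
        write_stop_lock(runtime)                # lock with no content
        user = describe_stop(runtime)
        write_stop_lock(runtime, "over quota")  # lock carrying a reason
        admin = describe_stop(runtime)
        return before, user, admin


if __name__ == "__main__":
    print(demo())
```

Either kind of lock would still keep the gear from being started at boot; the content only tells an operator why the gear was stopped.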
--- Additional comment from Jhon Honce on 2015-02-23 17:24:40 EST ---
Fixed in https://github.com/openshift/origin-server/pull/6083
--- Additional comment from openshift-github-bot on 2015-02-23 18:07:45 EST ---
Commit pushed to master at https://github.com/openshift/origin-server
Bug 1190856 - Allow Operator to stop gear with .stop_lock
--- Additional comment from Meng Bo on 2015-02-25 02:23:57 EST ---
Checked on devenv_5449: the .stop_lock file is generated with the given message as its content.
[root@ip-10-136-82-250 runtime]# cat .stop_lock
The gear can be stopped successfully with this option, and can then be started by the user.
Moving bug to verified.
Verified and passed on puddle-2-2-2015-03-16.
1) A new option "stoplockgear" was added.
2) oo-admin-ctl-gears stoplockgear creates .stop_lock, and the given message is written to .stop_lock:
oo-admin-ctl-gears stoplockgear 5507c11f4add7185960000b2 --message " Create Lock File"
[root@node2 runtime]# cat /var/lib/openshift/5507c11f4add7185960000b2/app-root/runtime/.stop_lock
Create Lock File[root@node2 runtime]#
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.