There are cases when setting a resource as unmanaged is not enough, because the resource's recurring monitor operations are not paused [1]. This can cause issues when values other than those indicating "started" or "stopped" are signalled (simply because the environment is undergoing some kind of reconstruction).

Proposed solution:
pcs resource unmanage RESOURCE --with-monitor
pcs resource manage RESOURCE --with-monitor

--with-monitor for unmanage: for all respective recurring monitors, set enabled=false
--with-monitor for manage: for all respective recurring monitors, set enabled=true (or rather, drop the property and use the identical default)

Alternatively, monitor=1 instead of --with-monitor.

[1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/#_monitoring_resources_when_administration_is_disabled
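Until such options exist, the effect can be approximated by hand. A minimal sketch, assuming a resource named RESOURCE with an existing 10-second monitor operation (the interval and timeout values are only illustrative, and the exact merge behavior of "pcs resource update ... op" may differ between pcs versions):

# stop managing the resource
pcs resource unmanage RESOURCE
# disable its recurring monitor by setting enabled=false on the operation
pcs resource update RESOURCE op monitor interval=10 timeout=20 enabled=false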
Real-world use case: http://oss.clusterlabs.org/pipermail/users/2016-February/002217.html

Note that even more convoluted shortcuts for the maintenance/monitor combination may be desired (as one can infer from that post).
Another real-life use case: http://clusterlabs.org/pipermail/users/2016-July/003490.html
I would also like to see this option.
Pacemaker continues to monitor unmanaged resources so it can provide accurate status output during maintenance. The behavior is described upstream at: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-monitoring-unmanaged

The example in Comment 1 is not actually relevant here. That example is a problem regardless of monitors -- resources should not be moved to another node while unmanaged. In that case, the correct resolution was to configure the resource appropriately for live migration.

It is a good idea to offer users the option of disabling monitors, in case they don't want to see the failures cluttering the status output. But many users will want to know whether the service is functioning or not, regardless of maintenance mode, so I wouldn't make it the default. For example, a user might not want to leave maintenance mode until a monitor comes back successful, or they might want to know whether maintenance on one resource causes problems for another (also unmanaged) resource.

To disable a monitor, set enabled=FALSE in the operation definition.
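For illustration, a monitor operation disabled this way looks like the following in the CIB (the same form appears in the verification output later in this report; the id, interval and timeout values are only examples):

<op id="R-monitor-interval-10" interval="10" name="monitor" timeout="20" enabled="false"/>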
We do not want to disable / enable monitor operations by default. However, running a resource with all monitors disabled may cause issues. Pcs should therefore display a warning when managing a resource with all monitors disabled if the user did not request enabling the monitors.
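A sketch of the command sequence the warning is meant to cover, using the same commands as in the verification below (the exact warning text is not decided here):

# disable management and all monitors of the resource
pcs resource unmanage R --monitor
# re-enable management only; the monitors stay disabled, so pcs should warn
pcs resource manage R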
Created attachment 1264062 [details] proposed fix + tests
After Fix:

[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.157-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs resource
 R      (ocf::heartbeat:Dummy): Started vm-rhel72-3

[vm-rhel72-1 ~] $ pcs resource unmanage R --monitor
[vm-rhel72-1 ~] $ pcs cluster cib|grep '<primitive.*id="R"' -A5
      <primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
        <meta_attributes id="R-meta_attributes">
          <nvpair id="R-meta_attributes-is-managed" name="is-managed" value="false"/>
        </meta_attributes>
        <operations>
          <op id="R-monitor-interval-10" interval="10" name="monitor" timeout="20" enabled="false"/>

[vm-rhel72-1 ~] $ pcs resource manage R --monitor
[vm-rhel72-1 ~] $ pcs cluster cib|grep '<primitive.*id="R"' -A2
      <primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
        <operations>
          <op id="R-monitor-interval-10" interval="10" name="monitor" timeout="20"/>

> Code was completely overwritten so testing all combinations is required
> It must work for clone, master, group as well
Created attachment 1273637 [details]
additional fix + tests

My bad, I did not notice that the monitor operation not being disabled is what was reported.

The table specifying which resources to set as unmanaged had to be updated to accommodate correct behavior with respect to disabled monitor operations:

resource hierarchy - specified resource - what to return

a primitive - the primitive - the primitive

a cloned primitive - the primitive - the primitive

a cloned primitive - the clone - the primitive
  The resource will run on all nodes after unclone. However, that doesn't seem to be bad behavior. Moreover, if monitor operations were disabled, they wouldn't be enabled on unclone, but the resource would become managed, which is definitely bad.

a primitive in a group - the primitive - the primitive
  Otherwise all primitives in the group would become unmanaged.

a primitive in a group - the group - all primitives in the group
  If only the group was set to unmanaged, setting any primitive in the group to managed would set all the primitives in the group to managed. If the group as well as all its primitives were set to unmanaged, any primitive added to the group would become unmanaged. This new primitive would become managed if any original group primitive became managed. Therefore changing one primitive would influence another one, which we do not want to happen.

a primitive in a cloned group - the primitive - the primitive

a primitive in a cloned group - the group - all primitives in the group
  See group notes above.

a primitive in a cloned group - the clone - all primitives in the group
  See clone notes above.

Test:

[root@rh73-node1:~]# pcs resource create CloneDummy ocf:heartbeat:Dummy clone
[root@rh73-node1:~]# pcs resource unmanage CloneDummy-clone --monitor
[root@rh73-node1:~]# pcs resource show CloneDummy-clone
 Clone: CloneDummy-clone
  Resource: CloneDummy (class=ocf provider=heartbeat type=Dummy)
   Meta Attrs: is-managed=false
   Operations: monitor enabled=false interval=10 timeout=20 (CloneDummy-monitor-interval-10)
               start interval=0s timeout=20 (CloneDummy-start-interval-0s)
               stop interval=0s timeout=20 (CloneDummy-stop-interval-0s)
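To illustrate the group rows of the table above, a hedged sketch (the resource and group names D1, D2 and G are hypothetical):

# two primitives in a group
pcs resource create D1 ocf:heartbeat:Dummy --group G
pcs resource create D2 ocf:heartbeat:Dummy --group G
# per the table, unmanaging the group puts is-managed=false and disabled
# monitors on the primitives D1 and D2, not on the group element itself
pcs resource unmanage G --monitor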
After fix:

[root@rh73-node1:~]# rpm -q pcs
pcs-0.9.158-2.el7.x86_64

[root@rh73-node1:~]# pcs resource show dummy-clone
 Clone: dummy-clone
  Resource: dummy (class=ocf provider=pacemaker type=Dummy)
   Operations: monitor interval=10 timeout=20 (dummy-monitor-interval-10)
               start interval=0s timeout=20 (dummy-start-interval-0s)
               stop interval=0s timeout=20 (dummy-stop-interval-0s)

[root@rh73-node1:~]# pcs resource unmanage dummy-clone --monitor
[root@rh73-node1:~]# pcs resource show dummy-clone
 Clone: dummy-clone
  Resource: dummy (class=ocf provider=pacemaker type=Dummy)
   Meta Attrs: is-managed=false
   Operations: monitor enabled=false interval=10 timeout=20 (dummy-monitor-interval-10)
               start interval=0s timeout=20 (dummy-start-interval-0s)
               stop interval=0s timeout=20 (dummy-stop-interval-0s)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1958