Bug 1303969
Summary: resource (un)manage: add optional switch to dis-/en-able monitor operations as well

Product: Red Hat Enterprise Linux 7
Component: pcs
Version: 7.2
Status: CLOSED ERRATA
Severity: unspecified
Priority: high
Reporter: Jan Pokorný [poki] <jpokorny>
Assignee: Tomas Jelinek <tojeline>
QA Contact: cluster-qe <cluster-qe>
Docs Contact: Steven J. Levine <slevine>
CC: cfeist, cluster-maint, idevat, kgaillot, mkelly, omular, rsteiger, sbradley, slevine, tlavigne, tojeline
Target Milestone: rc
Keywords: FutureFeature
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcs-0.9.158-2.el7
Doc Type: Release Note
Type: Bug
Last Closed: 2017-08-01 18:22:57 UTC

Doc Text:
New option to the "pcs resource unmanage" command to disable monitor operations

Even when a resource is in unmanaged mode, monitor operations are still run by the cluster. This may cause the cluster to report errors the user is not interested in, as those errors may be expected for the particular use case for which the resource is unmanaged. The "pcs resource unmanage" command now supports the "--monitor" option, which disables monitor operations when putting a resource into unmanaged mode. Likewise, the "pcs resource manage" command supports the "--monitor" option, which enables monitor operations when putting a resource back into managed mode.
Description (Jan Pokorný [poki], 2016-02-02 14:50:58 UTC)

Real-world use case: http://oss.clusterlabs.org/pipermail/users/2016-February/002217.html

Note that even more convoluted request shortcuts regarding the maintenance/monitor combination may be desired (as one can decipher from that post).

Another real-life use case: http://clusterlabs.org/pipermail/users/2016-July/003490.html

I would also like to see this option.

Pacemaker continues to monitor unmanaged resources so it can provide accurate status output during maintenance. The behavior is described upstream at: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-monitoring-unmanaged

The example in comment 1 is not actually relevant here. That example is a problem regardless of monitors: resources should not be moved to another node while unmanaged. In that case, the correct resolution was to configure the resource appropriately for live migration.

It is a good idea to offer users the option of disabling monitors, in case they don't want to see the failures cluttering the status output. But many users will want to know whether the service is functioning or not, regardless of maintenance mode, so I wouldn't make it the default. For example, a user might not want to leave maintenance mode until a monitor comes back successful, or they might want to know if maintenance on one resource causes problems for another (also unmanaged) resource. To disable a monitor, set enabled=FALSE in the operation definition.

We do not want to disable/enable monitor operations by default. However, running a resource with all monitors disabled may cause issues. Pcs should therefore display a warning when managing a resource with all monitors disabled if the user did not request enabling the monitors.

Created attachment 1264062 [details]
proposed fix + tests
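The mechanism described above (setting enabled=FALSE on the operation definition) can be sketched against a CIB fragment like the one shown later in this report. This is a minimal illustration using Python's standard XML library, not pcs's actual implementation; the helper name is hypothetical.

```python
# Sketch (not pcs's actual code): disable a resource's monitor operations
# by setting enabled="false" on each monitor <op> element, the mechanism
# described in the comment above.
import xml.etree.ElementTree as ET

# CIB fragment mirroring the primitive shown later in this report
CIB = """
<primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
  <operations>
    <op id="R-monitor-interval-10" interval="10" name="monitor" timeout="20"/>
    <op id="R-start-interval-0s" interval="0s" name="start" timeout="20"/>
  </operations>
</primitive>
"""

def set_monitors_enabled(primitive, enabled):
    """Set enabled="true"/"false" on every monitor op of a primitive."""
    for op in primitive.findall("./operations/op[@name='monitor']"):
        op.set("enabled", "true" if enabled else "false")

root = ET.fromstring(CIB)
set_monitors_enabled(root, False)
print(ET.tostring(root, encoding="unicode"))
```

Only monitor operations are touched; start and stop operations keep running so the resource can still be managed normally afterwards.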
After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.157-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource
R (ocf::heartbeat:Dummy): Started vm-rhel72-3
[vm-rhel72-1 ~] $ pcs resource unmanage R --monitor
[vm-rhel72-1 ~] $ pcs cluster cib|grep '<primitive.*id="R"' -A5
<primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
<meta_attributes id="R-meta_attributes">
<nvpair id="R-meta_attributes-is-managed" name="is-managed" value="false"/>
</meta_attributes>
<operations>
<op id="R-monitor-interval-10" interval="10" name="monitor" timeout="20" enabled="false"/>
[vm-rhel72-1 ~] $ pcs resource manage R --monitor
[vm-rhel72-1 ~] $ pcs cluster cib|grep '<primitive.*id="R"' -A2
<primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
<operations>
<op id="R-monitor-interval-10" interval="10" name="monitor" timeout="20"/>
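The warning requested earlier (when managing a resource whose monitors are all still disabled) amounts to a simple check over the primitive's operations. A minimal sketch of that check, assuming a hypothetical helper name and not pcs's internal code:

```python
# Sketch (assumed logic, not pcs's actual code) of the warning described
# above: when managing a resource, warn if every monitor operation is
# still disabled and the user did not pass --monitor.
import xml.etree.ElementTree as ET

PRIMITIVE = """
<primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
  <operations>
    <op id="R-monitor-interval-10" interval="10" name="monitor"
        timeout="20" enabled="false"/>
  </operations>
</primitive>
"""

def all_monitors_disabled(primitive):
    """True if the primitive has monitor ops and every one is disabled."""
    monitors = primitive.findall("./operations/op[@name='monitor']")
    return bool(monitors) and all(
        op.get("enabled", "true").lower() == "false" for op in monitors
    )

prim = ET.fromstring(PRIMITIVE)
if all_monitors_disabled(prim):
    print("Warning: resource 'R' has no enabled monitor operations")
```

Note the default of "true" when the enabled attribute is absent, matching the CIB convention that operations run unless explicitly disabled.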
> Code was completely overwritten so testing all combinations is required
> It must work for clone, master, group as well
Created attachment 1273637 [details]
additional fix + tests
My bad, I did not notice that the reported problem was the monitor operation not being disabled.
The table specifying which resources to set as unmanaged had to be updated to accommodate correct behavior with respect to disabled monitor operations:
resource hierarchy - specified resource - what to return

a primitive - the primitive - the primitive

a cloned primitive - the primitive - the primitive

a cloned primitive - the clone - the primitive
  The resource will run on all nodes after unclone. However, that does not
  seem to be bad behavior. Moreover, if monitor operations were disabled,
  they would not be enabled on unclone, but the resource would become
  managed, which is definitely bad.

a primitive in a group - the primitive - the primitive
  Otherwise all primitives in the group would become unmanaged.

a primitive in a group - the group - all primitives in the group
  If only the group were set to unmanaged, setting any primitive in the
  group to managed would set all the primitives in the group to managed.
  If the group as well as all its primitives were set to unmanaged, any
  primitive added to the group would become unmanaged. This new primitive
  would become managed if any original group primitive became managed.
  Therefore, changing one primitive would influence another, which we do
  not want to happen.

a primitive in a cloned group - the primitive - the primitive

a primitive in a cloned group - the group - all primitives in the group
  See the group notes above.

a primitive in a cloned group - the clone - all primitives in the group
  See the clone notes above.
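The table above is essentially a dispatch on the resource hierarchy and what the user named. A minimal sketch of that dispatch (hypothetical data model, not pcs's internal representation):

```python
# Sketch of the dispatch table above (hypothetical function and data
# model, not pcs's internal code). Given the hierarchy kind and the
# element the user specified, return which element(s) should actually
# have is-managed / monitor-enabled changed.

def elements_to_update(kind, specified):
    """
    kind: "primitive", "cloned primitive", "group primitive",
          or "cloned group primitive"
    specified: "primitive", "clone", or "group" (what the user named)
    """
    if specified == "primitive":
        # Always act on the primitive itself, never its wrappers.
        return ["the primitive"]
    if specified == "clone" and kind == "cloned primitive":
        # Unmanage the primitive so it stays unmanaged after unclone.
        return ["the primitive"]
    if specified in ("group", "clone"):
        # Acting on a (cloned) group means acting on all its members, so
        # that later changes to one member do not affect the others.
        return ["all primitives in the group"]
    raise ValueError((kind, specified))
```

Encoding the table as explicit branches keeps each row of the rationale above visible in the code rather than buried in XML traversal.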
Test:
[root@rh73-node1:~]# pcs resource create CloneDummy ocf:heartbeat:Dummy clone
[root@rh73-node1:~]# pcs resource unmanage CloneDummy-clone --monitor
[root@rh73-node1:~]# pcs resource show CloneDummy-clone
Clone: CloneDummy-clone
Resource: CloneDummy (class=ocf provider=heartbeat type=Dummy)
Meta Attrs: is-managed=false
Operations: monitor enabled=false interval=10 timeout=20 (CloneDummy-monitor-interval-10)
start interval=0s timeout=20 (CloneDummy-start-interval-0s)
stop interval=0s timeout=20 (CloneDummy-stop-interval-0s)
After fix:

[root@rh73-node1:~]# rpm -q pcs
pcs-0.9.158-2.el7.x86_64
[root@rh73-node1:~]# pcs resource show dummy-clone
Clone: dummy-clone
 Resource: dummy (class=ocf provider=pacemaker type=Dummy)
  Operations: monitor interval=10 timeout=20 (dummy-monitor-interval-10)
              start interval=0s timeout=20 (dummy-start-interval-0s)
              stop interval=0s timeout=20 (dummy-stop-interval-0s)
[root@rh73-node1:~]# pcs resource unmanage dummy-clone --monitor
[root@rh73-node1:~]# pcs resource show dummy-clone
Clone: dummy-clone
 Resource: dummy (class=ocf provider=pacemaker type=Dummy)
  Meta Attrs: is-managed=false
  Operations: monitor enabled=false interval=10 timeout=20 (dummy-monitor-interval-10)
              start interval=0s timeout=20 (dummy-start-interval-0s)
              stop interval=0s timeout=20 (dummy-stop-interval-0s)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958