1. Proposed title of this feature request
Do not disable a resource if dependent resources are still online

3. What is the nature and description of the request?
When disabling a resource, we want to know whether resources that rely on it are still running. For example, we don't want to pull an LVM resource out from under a filesystem resource, or a filesystem resource out from under a database resource.

4. Why does the customer need this? (List the business requirements here)
Prevent data loss/corruption caused by stopping resources inappropriately.

5. How would the customer like to achieve this? (List the functional requirements here)
An error, or at least a warning, should be thrown when a "pcs resource disable <resource id>" command targets a resource which is a dependency of other running resources.

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
Before: "pcs resource disable <resource id>" succeeds.
After: "pcs resource disable <resource id>" throws an error or warning.

7. Is there already an existing RFE upstream or in Red Hat Bugzilla?
No.

8. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?
7.6 if possible.

9. Is the sales team involved in this request and do they have any additional input?
No.

10. List any affected packages or components.
pcs

11. Would the customer be able to assist in testing this functionality if implemented?
Yes.
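To make the scenario concrete, the dependency chain described above could be modeled with ordering and colocation constraints along these lines. This is a minimal sketch: the resource names my-lvm, my-fs and my-db are hypothetical, and the actual agents and options will differ per site.

# Hypothetical storage stack: LVM activation under a filesystem under a database.
# Start each lower layer first and keep each layer on the node of the layer it depends on.
pcs constraint order start my-lvm then start my-fs
pcs constraint order start my-fs then start my-db
pcs constraint colocation add my-fs with my-lvm INFINITY
pcs constraint colocation add my-db with my-fs INFINITY

# With the current behavior this RFE is about, the following still succeeds
# and pulls the whole stack down:
pcs resource disable my-lvm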
Created attachment 1623047 [details]
proposed fix + tests

The original behavior of the "pcs resource disable" command has been preserved to maintain backward compatibility. However, new options have been added to the command providing the requested functionality:

* "pcs resource disable --simulate": show effects of disabling specified resource(s) while not changing the cluster configuration
* "pcs resource disable --safe": disable specified resource(s) only if no other resources would be stopped or demoted
* "pcs resource disable --safe --strict": disable specified resource(s) only if no other resources would be affected in any way, i.e. migrated from a node to another node

Test:

1. Set some resource dependencies, e.g.:

[root@rh80-node1:~]# pcs constraint
Location Constraints:
Ordering Constraints:
  start d1 then start d2 (kind:Mandatory)
Colocation Constraints:
Ticket Constraints:

2. Check effects of disabling a resource:

[root@rh80-node1:~]# pcs resource disable --simulate d1
1 of 4 resources DISABLED and 0 BLOCKED from being started due to failures

Current cluster status:
Online: [ rh80-node1 rh80-node2 ]

 xvm    (stonith:fence_xvm):    Started rh80-node1
 d2     (ocf::pacemaker:Dummy): Started rh80-node2
 d1     (ocf::pacemaker:Dummy): Started rh80-node1 (disabled)
 d3     (ocf::pacemaker:Dummy): Started rh80-node2

Transition Summary:
 * Stop    d2   ( rh80-node2 )   due to required d1 start
 * Stop    d1   ( rh80-node1 )   due to node availability

Executing cluster transition:
 * Resource action: d2 stop on rh80-node2
 * Resource action: d1 stop on rh80-node1

Revised cluster status:
Online: [ rh80-node1 rh80-node2 ]

 xvm    (stonith:fence_xvm):    Started rh80-node1
 d2     (ocf::pacemaker:Dummy): Stopped
 d1     (ocf::pacemaker:Dummy): Stopped (disabled)
 d3     (ocf::pacemaker:Dummy): Started rh80-node2

3. Try disabling a resource while not stopping other resources:

[root@rh80-node1:~]# pcs resource disable --safe d1
Error: Disabling specified resources would have an effect on other resources
1 of 4 resources DISABLED and 0 BLOCKED from being started due to failures

Current cluster status:
Online: [ rh80-node1 rh80-node2 ]

 xvm    (stonith:fence_xvm):    Started rh80-node1
 d2     (ocf::pacemaker:Dummy): Started rh80-node2
 d1     (ocf::pacemaker:Dummy): Started rh80-node1 (disabled)
 d3     (ocf::pacemaker:Dummy): Started rh80-node2

Transition Summary:
 * Stop    d2   ( rh80-node2 )   due to required d1 start
 * Stop    d1   ( rh80-node1 )   due to node availability

Executing cluster transition:
 * Resource action: d2 stop on rh80-node2
 * Resource action: d1 stop on rh80-node1

Revised cluster status:
Online: [ rh80-node1 rh80-node2 ]

 xvm    (stonith:fence_xvm):    Started rh80-node1
 d2     (ocf::pacemaker:Dummy): Stopped
 d1     (ocf::pacemaker:Dummy): Stopped (disabled)
 d3     (ocf::pacemaker:Dummy): Started rh80-node2

4. Disable the resource anyway:

[root@rh80-node1:~]# pcs resource disable d1
[root@rh80-node1:~]# pcs status resources
 d2     (ocf::pacemaker:Dummy): Stopped
 d1     (ocf::pacemaker:Dummy): Stopped (disabled)
 d3     (ocf::pacemaker:Dummy): Started rh80-node2
After fix:

[root@r81-node-01 ~]# rpm -q pcs
pcs-0.10.3-1.el8.x86_64

[root@r81-node-01 ~]# pcs resource
 Clone Set: locking-clone [locking]
     Started: [ r81-node-01 r81-node-02 ]
 lvm    (ocf::pacemaker:Dummy): Started r81-node-01
 fs     (ocf::pacemaker:Dummy): Started r81-node-01
 web    (ocf::pacemaker:Dummy): Started r81-node-01

[root@r81-node-01 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
  start locking-clone then start lvm (kind:Mandatory)
  start lvm then start fs (kind:Mandatory)
  start fs then start web (kind:Mandatory)
Colocation Constraints:
  lvm with locking-clone (score:INFINITY)
  fs with lvm (score:INFINITY)
  web with fs (score:INFINITY)
Ticket Constraints:

A) --simulate

[root@r81-node-01 ~]# pcs resource disable --simulate dlm
4 of 9 resources DISABLED and 0 BLOCKED from being started due to failures

Current cluster status:
Online: [ r81-node-01 r81-node-02 ]

 fence-r81-node-01      (stonith:fence_xvm):    Started r81-node-01
 fence-r81-node-02      (stonith:fence_xvm):    Started r81-node-02
 Clone Set: locking-clone [locking]
     Started: [ r81-node-01 r81-node-02 ]
 lvm    (ocf::pacemaker:Dummy): Started r81-node-01
 fs     (ocf::pacemaker:Dummy): Started r81-node-01
 web    (ocf::pacemaker:Dummy): Started r81-node-01

Transition Summary:
 * Stop    dlm:0        ( r81-node-02 )   due to node availability
 * Stop    lvmlockd:0   ( r81-node-02 )   due to node availability
 * Stop    dlm:1        ( r81-node-01 )   due to node availability
 * Stop    lvmlockd:1   ( r81-node-01 )   due to node availability
 * Stop    lvm          ( r81-node-01 )   due to node availability
 * Stop    fs           ( r81-node-01 )   due to node availability
 * Stop    web          ( r81-node-01 )   due to node availability

Executing cluster transition:
 * Resource action: web stop on r81-node-01
 * Resource action: fs stop on r81-node-01
 * Resource action: lvm stop on r81-node-01
 * Pseudo action: locking-clone_stop_0
 * Pseudo action: locking:0_stop_0
 * Resource action: lvmlockd stop on r81-node-02
 * Pseudo action: locking:1_stop_0
 * Resource action: lvmlockd stop on r81-node-01
 * Resource action: dlm stop on r81-node-02
 * Resource action: dlm stop on r81-node-01
 * Pseudo action: locking:0_stopped_0
 * Pseudo action: locking:1_stopped_0
 * Pseudo action: locking-clone_stopped_0

Revised cluster status:
Online: [ r81-node-01 r81-node-02 ]

 fence-r81-node-01      (stonith:fence_xvm):    Started r81-node-01
 fence-r81-node-02      (stonith:fence_xvm):    Started r81-node-02
 Clone Set: locking-clone [locking]
     Stopped: [ r81-node-01 r81-node-02 ]
 lvm    (ocf::pacemaker:Dummy): Stopped
 fs     (ocf::pacemaker:Dummy): Stopped
 web    (ocf::pacemaker:Dummy): Stopped

[root@r81-node-01 ~]# echo $?
0

B) --safe

[root@r81-node-01 ~]# pcs resource disable lvm --safe
Error: Disabling specified resources would have an effect on other resources
1 of 9 resources DISABLED and 0 BLOCKED from being started due to failures

Current cluster status:
Online: [ r81-node-01 r81-node-02 ]

 fence-r81-node-01      (stonith:fence_xvm):    Started r81-node-01
 fence-r81-node-02      (stonith:fence_xvm):    Started r81-node-02
 Clone Set: locking-clone [locking]
     Started: [ r81-node-01 r81-node-02 ]
 lvm    (ocf::pacemaker:Dummy): Started r81-node-01 (disabled)
 fs     (ocf::pacemaker:Dummy): Started r81-node-01
 web    (ocf::pacemaker:Dummy): Started r81-node-01

Transition Summary:
 * Stop    lvm   ( r81-node-01 )   due to node availability
 * Stop    fs    ( r81-node-01 )   due to node availability
 * Stop    web   ( r81-node-01 )   due to node availability

Executing cluster transition:
 * Resource action: web stop on r81-node-01
 * Resource action: fs stop on r81-node-01
 * Resource action: lvm stop on r81-node-01

Revised cluster status:
Online: [ r81-node-01 r81-node-02 ]

 fence-r81-node-01      (stonith:fence_xvm):    Started r81-node-01
 fence-r81-node-02      (stonith:fence_xvm):    Started r81-node-02
 Clone Set: locking-clone [locking]
     Started: [ r81-node-01 r81-node-02 ]
 lvm    (ocf::pacemaker:Dummy): Stopped (disabled)
 fs     (ocf::pacemaker:Dummy): Stopped
 web    (ocf::pacemaker:Dummy): Stopped

[root@r81-node-01 ~]# echo $?
1

C) --strict

[root@r81-node-01 ~]# pcs resource disable fs --strict
Error: Disabling specified resources would have an effect on other resources
1 of 9 resources DISABLED and 0 BLOCKED from being started due to failures

Current cluster status:
Online: [ r81-node-01 r81-node-02 ]

 fence-r81-node-01      (stonith:fence_xvm):    Started r81-node-01
 fence-r81-node-02      (stonith:fence_xvm):    Started r81-node-02
 Clone Set: locking-clone [locking]
     Started: [ r81-node-01 r81-node-02 ]
 lvm    (ocf::pacemaker:Dummy): Started r81-node-01
 fs     (ocf::pacemaker:Dummy): Started r81-node-01 (disabled)
 web    (ocf::pacemaker:Dummy): Started r81-node-01

Transition Summary:
 * Stop    fs    ( r81-node-01 )   due to node availability
 * Stop    web   ( r81-node-01 )   due to node availability

Executing cluster transition:
 * Resource action: web stop on r81-node-01
 * Resource action: fs stop on r81-node-01

Revised cluster status:
Online: [ r81-node-01 r81-node-02 ]

 fence-r81-node-01      (stonith:fence_xvm):    Started r81-node-01
 fence-r81-node-02      (stonith:fence_xvm):    Started r81-node-02
 Clone Set: locking-clone [locking]
     Started: [ r81-node-01 r81-node-02 ]
 lvm    (ocf::pacemaker:Dummy): Started r81-node-01
 fs     (ocf::pacemaker:Dummy): Stopped (disabled)
 web    (ocf::pacemaker:Dummy): Stopped

[root@r81-node-01 ~]# echo $?
1

D) disable without long options

[root@r81-node-01 ~]# pcs resource disable dlm
[root@r81-node-01 ~]# pcs status resources
 Clone Set: locking-clone [locking]
     Stopped: [ r81-node-01 r81-node-02 ]
 lvm    (ocf::pacemaker:Dummy): Stopped
 fs     (ocf::pacemaker:Dummy): Stopped
 web    (ocf::pacemaker:Dummy): Stopped
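Since the --safe variants make no change at all and exit non-zero when other resources would be affected (exit status 1 in cases B and C above, 0 for the clean --simulate run), the check is easy to script. A minimal sketch, reusing the lvm resource from this test; the surrounding logic is illustrative only:

#!/bin/sh
# Try the safe disable first; if it would impact other resources, pcs makes
# no change and returns non-zero, so we stop here instead of breaking the stack.
if pcs resource disable --safe lvm; then
    echo "lvm disabled; no other resources were affected"
else
    echo "refusing to disable lvm: dependent resources are still running" >&2
    exit 1
fi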
Additional commit: https://github.com/ClusterLabs/pcs/commit/b2be7b5482232910a490544659b34d833783346d

Changes:
* added command 'pcs resource safe-disable' which is an alias of 'pcs resource disable --safe'
* default behavior of 'pcs resource disable --safe' has been changed to strict mode, therefore '--strict' has been replaced by the '--no-strict' option

Test:
[root@rhel82-devel2 pcs]# pcs resource safe-disable dummy1 --no-strict
[root@rhel82-devel2 pcs]# echo $?
0
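Put differently, after this commit the first two invocations below behave the same, and --no-strict only relaxes the check so that other resources may move between nodes (the resource name dummy1 is taken from the test above):

pcs resource safe-disable dummy1              # alias of the next line
pcs resource disable --safe dummy1            # strict by default: no other resource may be stopped, demoted or moved
pcs resource safe-disable dummy1 --no-strict  # other resources may migrate, but none may be stopped or demoted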
Test:

[root@r81-node-01 ~]# rpm -q pcs
pcs-0.10.3-2.el8.x86_64

[root@r81-node-01 ~]# pcs resource
  * dummy-01    (ocf::pacemaker:Dummy): Started r81-node-01
  * dummy-02    (ocf::pacemaker:Dummy): Started r81-node-02

[root@r81-node-01 ~]# pcs constraint order
Ordering Constraints:
  start dummy-01 then start dummy-02 (kind:Mandatory)

[root@r81-node-01 ~]# pcs resource safe-disable dummy-01 --no-strict
Error: Disabling specified resources would have an effect on other resources
1 of 4 resource instances DISABLED and 0 BLOCKED from further action due to failure

Current cluster status:
Online: [ r81-node-01 r81-node-02 ]

 fence-r81-node-01      (stonith:fence_xvm):    Started r81-node-01
 fence-r81-node-02      (stonith:fence_xvm):    Started r81-node-02
 dummy-01       (ocf::pacemaker:Dummy): Started r81-node-01 (disabled)
 dummy-02       (ocf::pacemaker:Dummy): Started r81-node-02

Transition Summary:
 * Stop    dummy-01   ( r81-node-01 )   due to node availability
 * Stop    dummy-02   ( r81-node-02 )   due to required dummy-01 start

Executing cluster transition:
 * Resource action: dummy-02 stop on r81-node-02
 * Resource action: dummy-01 stop on r81-node-01

Revised cluster status:
Online: [ r81-node-01 r81-node-02 ]

 fence-r81-node-01      (stonith:fence_xvm):    Started r81-node-01
 fence-r81-node-02      (stonith:fence_xvm):    Started r81-node-02
 dummy-01       (ocf::pacemaker:Dummy): Stopped (disabled)
 dummy-02       (ocf::pacemaker:Dummy): Stopped

[root@r81-node-01 ~]# echo $?
1
(In reply to Ondrej Mular from comment #11)
> Additional commit:
> https://github.com/ClusterLabs/pcs/commit/b2be7b5482232910a490544659b34d833783346d
>
> Changes:
> * added command 'pcs resource safe-disable' which is an alias of 'pcs resource disable --safe'
> * default behavior of 'pcs resource disable --safe' has been changed to strict mode, therefore '--strict' has been replaced by '--no-strict' option
>
> Test:
> [root@rhel82-devel2 pcs]# pcs resource safe-disable dummy1 --no-strict
> [root@rhel82-devel2 pcs]# echo $?
> 0

This feedback may be late, but thinking about the general problem, maybe pcs could have a "safe mode" that would change the defaults for a wide range of commands. When someone chooses safety or speed, they generally want it for everything or nothing.

E.g. "pcs resource disable" would take complementary options such as --safe/--no-safe or --safe=true/false (and similarly for strict). "pcs mode safe" (or whatever) would set a flag in pcs_settings.conf (cluster-wide) and make "pcs resource disable" default to --safe --strict, and the user would have to specify --no-safe/--safe=false etc. to get the usual behavior.

The benefit is that users don't have to remember separate commands, and sites can set general policies that are enforced for all users. It also avoids having to add "safe X" equivalents for a bunch of other commands in the future. (And it avoids the cringy "pcs safedisable --force" == "pcs disable".)

Alternatively, pcs could take a cue from rm/cp/mv and accept an -i/--interactive option. If specified, for any potentially "dangerous" command, pcs would show what would happen and ask the user for confirmation. As is commonly done for rm/cp/mv, the user could alias 'pcs' to 'pcs -i' in their bashrc.
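For illustration only, the rm/cp/mv-style opt-in suggested in the last paragraph would look roughly like this in a user's ~/.bashrc; note that the -i/--interactive option is merely proposed in this comment and does not exist in pcs:

# Proposal only: pcs currently has no -i/--interactive option.
alias pcs='pcs -i'   # would make every "dangerous" pcs command show its effect and ask for confirmation first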
>> Perhaps pcs should ignore crm_simulate returning 1 and just go through the
>> transitions? In that case, pcs would proceed and stop the resource as only
>> the resource being stopped is mentioned in the transitions (meaning no other
>> resources would be affected).
>
> I wouldn't; an invalid transition means there's something wrong with the graph.
> I think it's better to require the user to force the command in that case. It
> should be very rare since it indicates a bug.

Also, it means that the live cluster won't be able to execute the transition if you commit the change, and will likely be blocked from all further action. So even forcing it would be a bad idea.
Verified,

[root@controller-0 ~]# rpm -q pcs
pcs-0.10.3-2.el8.x86_64

[root@controller-0 ~]# pcs config | grep -B 1 'ip-192.168.24.45 then'
Ordering Constraints:
  start ip-192.168.24.45 then start haproxy-bundle (kind:Optional) (id:order-ip-192.168.24.45-haproxy-bundle-Optional)

[root@controller-0 ~]# pcs resource disable --safe haproxy-bundle
Error: Disabling specified resources would have an effect on other resources
3 of 50 resources DISABLED and 0 BLOCKED from being started due to failures
[...]

Revised cluster status:
Online: [ controller-0 controller-1 controller-2 ]
[...]
 ip-192.168.24.45       (ocf::heartbeat:IPaddr2):       Stopped
 ip-10.0.0.101          (ocf::heartbeat:IPaddr2):       Stopped
 ip-172.17.1.70         (ocf::heartbeat:IPaddr2):       Stopped
 ip-172.17.1.31         (ocf::heartbeat:IPaddr2):       Stopped
 ip-172.17.3.135        (ocf::heartbeat:IPaddr2):       Stopped
 ip-172.17.4.52         (ocf::heartbeat:IPaddr2):       Stopped
 Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-haproxy:pcmklatest]
   haproxy-bundle-podman-0      (ocf::heartbeat:podman):        Stopped (disabled)
   haproxy-bundle-podman-1      (ocf::heartbeat:podman):        Stopped (disabled)
   haproxy-bundle-podman-2      (ocf::heartbeat:podman):        Stopped (disabled)
[...]
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1568