Bug 1631519
| Summary: | [RFE] The cluster should not be allowed to disable a resource if dependent resources are still online | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Ryan <rblough> | ||||
| Component: | pcs | Assignee: | Tomas Jelinek <tojeline> | ||||
| Status: | CLOSED ERRATA | QA Contact: | pkomarov | ||||
| Severity: | unspecified | Docs Contact: | Steven J. Levine <slevine> | ||||
| Priority: | high | ||||||
| Version: | 8.0 | CC: | cfeist, cluster-maint, idevat, kgaillot, lmanasko, mlisik, nhostako, omular, pkomarov, slevine, tojeline | ||||
| Target Milestone: | rc | Keywords: | FutureFeature | ||||
| Target Release: | 8.2 | Flags: | pm-rhel: mirror+ | ||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | pcs-0.10.3-2.el8 | Doc Type: | Enhancement | ||||
| Doc Text: |
.New command options to disable a resource only if this would not affect other resources
It is sometimes necessary to disable resources only if this would not have an effect on other resources. Ensuring that this would be the case can be impossible to do by hand when complex resource relations are set up. To address this need, the `pcs resource disable` command now supports the following options:
* `pcs resource disable --simulate`: show effects of disabling specified resource(s) while not changing the cluster configuration
* `pcs resource disable --safe`: disable specified resource(s) only if no other resources would be affected in any way, such as being migrated from one node to another
* `pcs resource disable --safe --no-strict`: disable specified resource(s) only if no other resources would be stopped or demoted
In addition, the `pcs resource safe-disable` command has been introduced as an alias for `pcs resource disable --safe`.
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 1759305 1770973 | Environment: | |||||
| Last Closed: | 2020-04-28 15:27:56 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1759305, 1770973, 1787598 | ||||||
Description
Ryan
2018-09-20 19:08:24 UTC
Created attachment 1623047
proposed fix + tests
The original behavior of the "pcs resource disable" command has been preserved to maintain backward compatibility. New options that provide the requested functionality have been added to the command:
* "pcs resource disable --simulate": show the effects of disabling the specified resource(s) without changing the cluster configuration
* "pcs resource disable --safe": disable the specified resource(s) only if no other resources would be stopped or demoted
* "pcs resource disable --safe --strict": disable the specified resource(s) only if no other resources would be affected in any way, e.g. migrated from one node to another
Test:
1. Set some resource dependencies, e.g.:
[root@rh80-node1:~]# pcs constraint
Location Constraints:
Ordering Constraints:
start d1 then start d2 (kind:Mandatory)
Colocation Constraints:
Ticket Constraints:
2. Check effects of disabling a resource:
[root@rh80-node1:~]# pcs resource disable --simulate d1
1 of 4 resources DISABLED and 0 BLOCKED from being started due to failures
Current cluster status:
Online: [ rh80-node1 rh80-node2 ]
xvm (stonith:fence_xvm): Started rh80-node1
d2 (ocf::pacemaker:Dummy): Started rh80-node2
d1 (ocf::pacemaker:Dummy): Started rh80-node1 (disabled)
d3 (ocf::pacemaker:Dummy): Started rh80-node2
Transition Summary:
* Stop d2 ( rh80-node2 ) due to required d1 start
* Stop d1 ( rh80-node1 ) due to node availability
Executing cluster transition:
* Resource action: d2 stop on rh80-node2
* Resource action: d1 stop on rh80-node1
Revised cluster status:
Online: [ rh80-node1 rh80-node2 ]
xvm (stonith:fence_xvm): Started rh80-node1
d2 (ocf::pacemaker:Dummy): Stopped
d1 (ocf::pacemaker:Dummy): Stopped (disabled)
d3 (ocf::pacemaker:Dummy): Started rh80-node2
3. Try disabling a resource without stopping other resources:
[root@rh80-node1:~]# pcs resource disable --safe d1
Error: Disabling specified resources would have an effect on other resources
1 of 4 resources DISABLED and 0 BLOCKED from being started due to failures
Current cluster status:
Online: [ rh80-node1 rh80-node2 ]
xvm (stonith:fence_xvm): Started rh80-node1
d2 (ocf::pacemaker:Dummy): Started rh80-node2
d1 (ocf::pacemaker:Dummy): Started rh80-node1 (disabled)
d3 (ocf::pacemaker:Dummy): Started rh80-node2
Transition Summary:
* Stop d2 ( rh80-node2 ) due to required d1 start
* Stop d1 ( rh80-node1 ) due to node availability
Executing cluster transition:
* Resource action: d2 stop on rh80-node2
* Resource action: d1 stop on rh80-node1
Revised cluster status:
Online: [ rh80-node1 rh80-node2 ]
xvm (stonith:fence_xvm): Started rh80-node1
d2 (ocf::pacemaker:Dummy): Stopped
d1 (ocf::pacemaker:Dummy): Stopped (disabled)
d3 (ocf::pacemaker:Dummy): Started rh80-node2
4. Disable the resource anyway:
[root@rh80-node1:~]# pcs resource disable d1
[root@rh80-node1:~]# pcs status resources
d2 (ocf::pacemaker:Dummy): Stopped
d1 (ocf::pacemaker:Dummy): Stopped (disabled)
d3 (ocf::pacemaker:Dummy): Started rh80-node2
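The --simulate output above is long, and usually only the planned actions matter. A minimal sketch of a filter for them follows. The function name transition_summary is hypothetical and not part of pcs, and the sketch assumes the output layout shown in the transcripts above ("Transition Summary:" followed by one action per line, then "Executing cluster transition:"), which may differ between pacemaker versions.

```shell
# Hypothetical helper, not part of pcs: print only the planned actions
# from a "pcs resource disable --simulate" run.
# Assumes the layout seen in the transcripts: a "Transition Summary:" line,
# one action per line, then "Executing cluster transition:".
transition_summary() {
    pcs resource disable --simulate "$@" |
        awk '/Transition Summary:/ {show = 1; next}
             /Executing cluster transition:/ {show = 0}
             show'
}
```

For the d1/d2 setup above, "transition_summary d1" should print just the two "Stop" lines.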
After fix:
[root@r81-node-01 ~]# rpm -q pcs
pcs-0.10.3-1.el8.x86_64
[root@r81-node-01 ~]# pcs resource
Clone Set: locking-clone [locking]
Started: [ r81-node-01 r81-node-02 ]
lvm (ocf::pacemaker:Dummy): Started r81-node-01
fs (ocf::pacemaker:Dummy): Started r81-node-01
web (ocf::pacemaker:Dummy): Started r81-node-01
[root@r81-node-01 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
start locking-clone then start lvm (kind:Mandatory)
start lvm then start fs (kind:Mandatory)
start fs then start web (kind:Mandatory)
Colocation Constraints:
lvm with locking-clone (score:INFINITY)
fs with lvm (score:INFINITY)
web with fs (score:INFINITY)
Ticket Constraints:
A) --simulate
[root@r81-node-01 ~]# pcs resource disable --simulate dlm
4 of 9 resources DISABLED and 0 BLOCKED from being started due to failures
Current cluster status:
Online: [ r81-node-01 r81-node-02 ]
fence-r81-node-01 (stonith:fence_xvm): Started r81-node-01
fence-r81-node-02 (stonith:fence_xvm): Started r81-node-02
Clone Set: locking-clone [locking]
Started: [ r81-node-01 r81-node-02 ]
lvm (ocf::pacemaker:Dummy): Started r81-node-01
fs (ocf::pacemaker:Dummy): Started r81-node-01
web (ocf::pacemaker:Dummy): Started r81-node-01
Transition Summary:
* Stop dlm:0 ( r81-node-02 ) due to node availability
* Stop lvmlockd:0 ( r81-node-02 ) due to node availability
* Stop dlm:1 ( r81-node-01 ) due to node availability
* Stop lvmlockd:1 ( r81-node-01 ) due to node availability
* Stop lvm ( r81-node-01 ) due to node availability
* Stop fs ( r81-node-01 ) due to node availability
* Stop web ( r81-node-01 ) due to node availability
Executing cluster transition:
* Resource action: web stop on r81-node-01
* Resource action: fs stop on r81-node-01
* Resource action: lvm stop on r81-node-01
* Pseudo action: locking-clone_stop_0
* Pseudo action: locking:0_stop_0
* Resource action: lvmlockd stop on r81-node-02
* Pseudo action: locking:1_stop_0
* Resource action: lvmlockd stop on r81-node-01
* Resource action: dlm stop on r81-node-02
* Resource action: dlm stop on r81-node-01
* Pseudo action: locking:0_stopped_0
* Pseudo action: locking:1_stopped_0
* Pseudo action: locking-clone_stopped_0
Revised cluster status:
Online: [ r81-node-01 r81-node-02 ]
fence-r81-node-01 (stonith:fence_xvm): Started r81-node-01
fence-r81-node-02 (stonith:fence_xvm): Started r81-node-02
Clone Set: locking-clone [locking]
Stopped: [ r81-node-01 r81-node-02 ]
lvm (ocf::pacemaker:Dummy): Stopped
fs (ocf::pacemaker:Dummy): Stopped
web (ocf::pacemaker:Dummy): Stopped
[root@r81-node-01 ~]# echo $?
0
B) --safe
[root@r81-node-01 ~]# pcs resource disable lvm --safe
Error: Disabling specified resources would have an effect on other resources
1 of 9 resources DISABLED and 0 BLOCKED from being started due to failures
Current cluster status:
Online: [ r81-node-01 r81-node-02 ]
fence-r81-node-01 (stonith:fence_xvm): Started r81-node-01
fence-r81-node-02 (stonith:fence_xvm): Started r81-node-02
Clone Set: locking-clone [locking]
Started: [ r81-node-01 r81-node-02 ]
lvm (ocf::pacemaker:Dummy): Started r81-node-01 (disabled)
fs (ocf::pacemaker:Dummy): Started r81-node-01
web (ocf::pacemaker:Dummy): Started r81-node-01
Transition Summary:
* Stop lvm ( r81-node-01 ) due to node availability
* Stop fs ( r81-node-01 ) due to node availability
* Stop web ( r81-node-01 ) due to node availability
Executing cluster transition:
* Resource action: web stop on r81-node-01
* Resource action: fs stop on r81-node-01
* Resource action: lvm stop on r81-node-01
Revised cluster status:
Online: [ r81-node-01 r81-node-02 ]
fence-r81-node-01 (stonith:fence_xvm): Started r81-node-01
fence-r81-node-02 (stonith:fence_xvm): Started r81-node-02
Clone Set: locking-clone [locking]
Started: [ r81-node-01 r81-node-02 ]
lvm (ocf::pacemaker:Dummy): Stopped (disabled)
fs (ocf::pacemaker:Dummy): Stopped
web (ocf::pacemaker:Dummy): Stopped
[root@r81-node-01 ~]# echo $?
1
C) --strict
[root@r81-node-01 ~]# pcs resource disable fs --strict
Error: Disabling specified resources would have an effect on other resources
1 of 9 resources DISABLED and 0 BLOCKED from being started due to failures
Current cluster status:
Online: [ r81-node-01 r81-node-02 ]
fence-r81-node-01 (stonith:fence_xvm): Started r81-node-01
fence-r81-node-02 (stonith:fence_xvm): Started r81-node-02
Clone Set: locking-clone [locking]
Started: [ r81-node-01 r81-node-02 ]
lvm (ocf::pacemaker:Dummy): Started r81-node-01
fs (ocf::pacemaker:Dummy): Started r81-node-01 (disabled)
web (ocf::pacemaker:Dummy): Started r81-node-01
Transition Summary:
* Stop fs ( r81-node-01 ) due to node availability
* Stop web ( r81-node-01 ) due to node availability
Executing cluster transition:
* Resource action: web stop on r81-node-01
* Resource action: fs stop on r81-node-01
Revised cluster status:
Online: [ r81-node-01 r81-node-02 ]
fence-r81-node-01 (stonith:fence_xvm): Started r81-node-01
fence-r81-node-02 (stonith:fence_xvm): Started r81-node-02
Clone Set: locking-clone [locking]
Started: [ r81-node-01 r81-node-02 ]
lvm (ocf::pacemaker:Dummy): Started r81-node-01
fs (ocf::pacemaker:Dummy): Stopped (disabled)
web (ocf::pacemaker:Dummy): Stopped
[root@r81-node-01 ~]# echo $?
1
D) disable without long options
[root@r81-node-01 ~]# pcs resource disable dlm
[root@r81-node-01 ~]# pcs status resources
Clone Set: locking-clone [locking]
Stopped: [ r81-node-01 r81-node-02 ]
lvm (ocf::pacemaker:Dummy): Stopped
fs (ocf::pacemaker:Dummy): Stopped
web (ocf::pacemaker:Dummy): Stopped
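The exit statuses in the runs above (0 for --simulate, 1 when --safe or --strict refuses) make the safe variants usable from scripts. A minimal sketch follows. The function name disable_if_safe is hypothetical, and the sketch assumes that any non-zero status means nothing was disabled; it cannot tell a refusal from another pcs error such as an unknown resource ID.

```shell
# Hypothetical helper, not part of pcs: disable resources only when
# "pcs resource disable --safe" accepts the change.
# Assumes exit status 0 means the resources were disabled and any
# non-zero status means they were left untouched.
disable_if_safe() {
    if pcs resource disable --safe "$@" >/dev/null 2>&1; then
        echo "disabled: $*"
    else
        echo "not disabled: pcs refused or failed for: $*" >&2
        return 1
    fi
}
```

A maintenance script could then call "disable_if_safe lvm || exit 1" and stop before touching anything else.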
Additional commit:
https://github.com/ClusterLabs/pcs/commit/b2be7b5482232910a490544659b34d833783346d
Changes:
* added command 'pcs resource safe-disable' which is an alias of 'pcs resource disable --safe'
* default behavior of 'pcs resource disable --safe' has been changed to strict mode, therefore '--strict' has been replaced by '--no-strict' option
Test:
[root@rhel82-devel2 pcs]# pcs resource safe-disable dummy1 --no-strict
[root@rhel82-devel2 pcs]# echo $?
0

Test:
[root@r81-node-01 ~]# rpm -q pcs
pcs-0.10.3-2.el8.x86_64
[root@r81-node-01 ~]# pcs resource
* dummy-01 (ocf::pacemaker:Dummy): Started r81-node-01
* dummy-02 (ocf::pacemaker:Dummy): Started r81-node-02
[root@r81-node-01 ~]# pcs constraint order
Ordering Constraints:
start dummy-01 then start dummy-02 (kind:Mandatory)
[root@r81-node-01 ~]# pcs resource safe-disable dummy-01 --no-strict
Error: Disabling specified resources would have an effect on other resources
1 of 4 resource instances DISABLED and 0 BLOCKED from further action due to failure
Current cluster status:
Online: [ r81-node-01 r81-node-02 ]
fence-r81-node-01 (stonith:fence_xvm): Started r81-node-01
fence-r81-node-02 (stonith:fence_xvm): Started r81-node-02
dummy-01 (ocf::pacemaker:Dummy): Started r81-node-01 (disabled)
dummy-02 (ocf::pacemaker:Dummy): Started r81-node-02
Transition Summary:
* Stop dummy-01 ( r81-node-01 ) due to node availability
* Stop dummy-02 ( r81-node-02 ) due to required dummy-01 start
Executing cluster transition:
* Resource action: dummy-02 stop on r81-node-02
* Resource action: dummy-01 stop on r81-node-01
Revised cluster status:
Online: [ r81-node-01 r81-node-02 ]
fence-r81-node-01 (stonith:fence_xvm): Started r81-node-01
fence-r81-node-02 (stonith:fence_xvm): Started r81-node-02
dummy-01 (ocf::pacemaker:Dummy): Stopped (disabled)
dummy-02 (ocf::pacemaker:Dummy): Stopped
[root@r81-node-01 ~]# echo $?
1

(In reply to Ondrej Mular from comment #11)
> Additional commit:
> https://github.com/ClusterLabs/pcs/commit/b2be7b5482232910a490544659b34d833783346d
>
> Changes:
> * added command 'pcs resource safe-disable' which is an alias of 'pcs resource disable --safe'
> * default behavior of 'pcs resource disable --safe' has been changed to strict mode, therefore '--strict' has been replaced by '--no-strict' option
>
> Test:
> [root@rhel82-devel2 pcs]# pcs resource safe-disable dummy1 --no-strict
> [root@rhel82-devel2 pcs]# echo $?
> 0

This feedback may be late, but thinking about the general problem, maybe pcs could have a "safe mode" that would change the defaults for a wide range of commands. When someone chooses safety or speed they generally want it for everything or nothing.

E.g. "pcs resource disable" would take complementary options e.g. --safe/--no-safe or --safe=true/false (and similarly for strict). "pcs mode safe" (or whatever) would set a flag in pcs_settings.conf (cluster-wide) and make "pcs resource disable" default to --safe --strict, and the user would have to specify --no-safe/--safe=false etc. to get the usual behavior.

The benefit is that users don't have to remember separate commands, and sites can set general policies that are enforced with all users. Also it avoids having to add "safeX" equivalents for a bunch of other commands in the future. (And it avoids the cringy "pcs safedisable --force" == "pcs disable".)

Alternatively, pcs could take a cue from rm/cp/mv, and take an -i/--interactive option. If specified, for any potentially "dangerous" command, pcs would show what would happen and ask the user for confirmation. As is commonly done for rm/cp/mv, the user could alias 'pcs' to 'pcs -i' in their bashrc.

>> Perhaps pcs should ignore crm_simulate returning 1 and just go through the
>> transitions? In that case, pcs would proceed and stop the resource as only
>> the resource being stopped is mentioned in the transitions (meaning no other
>> resources would be affected).
>
> I wouldn't; an invalid transition means there's something wrong with the graph.
> I think it's better to require the user to force the command in that case. It
> should be very rare since it indicates a bug.

Also, it means that the live cluster won't be able to execute the transition if you commit the change, and will likely be blocked from all further action. So even forcing it would be a bad idea.

Verified,
[root@controller-0 ~]# rpm -q pcs
pcs-0.10.3-2.el8.x86_64
[root@controller-0 ~]# pcs config|grep -B 1 'ip-192.168.24.45 then'
Ordering Constraints:
start ip-192.168.24.45 then start haproxy-bundle (kind:Optional) (id:order-ip-192.168.24.45-haproxy-bundle-Optional)
[root@controller-0 ~]# pcs resource disable --safe haproxy-bundle
Error: Disabling specified resources would have an effect on other resources
3 of 50 resources DISABLED and 0 BLOCKED from being started due to failures
[...]
Revised cluster status:
Online: [ controller-0 controller-1 controller-2 ]
[...]
ip-192.168.24.45 (ocf::heartbeat:IPaddr2): Stopped
ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Stopped
ip-172.17.1.70 (ocf::heartbeat:IPaddr2): Stopped
ip-172.17.1.31 (ocf::heartbeat:IPaddr2): Stopped
ip-172.17.3.135 (ocf::heartbeat:IPaddr2): Stopped
ip-172.17.4.52 (ocf::heartbeat:IPaddr2): Stopped
Container bundle set: haproxy-bundle [undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-haproxy:pcmklatest]
haproxy-bundle-podman-0 (ocf::heartbeat:podman): Stopped (disabled)
haproxy-bundle-podman-1 (ocf::heartbeat:podman): Stopped (disabled)
haproxy-bundle-podman-2 (ocf::heartbeat:podman): Stopped (disabled)
[...]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2020:1568
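
The -i/--interactive idea raised in the comments is not implemented by this fix, but it can be approximated on the user side with the options that were added. A minimal sketch follows. The function name confirm_disable is hypothetical, pcs has no such subcommand, and the prompt wording is an assumption.

```shell
# Hypothetical helper, not part of pcs: show what disabling would do,
# then ask before changing the cluster configuration.
confirm_disable() {
    # Show the simulated transition first; stop if the simulation fails.
    pcs resource disable --simulate "$@" || return 1
    printf 'Disable %s? [y/N] ' "$*"
    read -r answer
    case "$answer" in
        y|Y) pcs resource disable "$@" ;;
        *)   echo "aborted"; return 1 ;;
    esac
}
```

Any answer other than "y" or "Y" leaves the cluster untouched, mirroring the default-to-no behavior of "rm -i".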