Bug 1220512

Summary: pcs resource cleanup improvements
Product: Red Hat Enterprise Linux 7
Reporter: David Vossel <dvossel>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 7.2
CC: abeekhof, cchen, cluster-maint, fdinitto, idevat, mlisik, rsteiger, tojeline
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pcs-0.9.151-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: The user runs the 'pcs resource cleanup' command in a cluster with a high number of resources and/or nodes.
Consequence: The cluster may become less responsive for a while.
Fix: Display a warning describing the negative impact of the command where appropriate, and add options to the command to restrict it to a specific resource and/or node.
Result: The user is informed about the negative impact and has options to reduce it while still being able to perform the desired operation.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 20:54:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: proposed fix (flags: none)

Description David Vossel 2015-05-11 17:47:17 UTC
Description of problem:

'pcs resource cleanup' translates directly to 'crm_resource -C'. Behind the scenes, this command results in Pacemaker completely wiping all of its resource state history. To rebuild resource state, Pacemaker must execute a monitor operation for every resource in the cluster on every node in the cluster. This means 'crm_resource -C' will always result in (resources * nodes) operations being executed on the cluster.

For small clusters this isn't a big deal: 3 nodes with 10 resources means only 30 monitor operations for Pacemaker to rebuild state. On a large cluster, however, 16 nodes with 100 resources would result in 1600 monitor operations before Pacemaker can rebuild state.
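To get a sense of the scale on a given cluster, here is a minimal shell sketch (not part of pcs) that estimates the operation count. It assumes 'pcs status' prints a line of the form "<N> nodes and <R> resources configured", as in the transcripts later in this report:

# Rough estimate of the monitor operations a full cleanup would trigger.
# Assumption: `pcs status` output contains "<N> nodes and <R> resources configured".
read -r nodes resources <<< "$(pcs status | awk '/configured/ {print $1, $4}')"
echo "A full 'pcs resource cleanup' would trigger roughly $(( nodes * resources )) monitor operations"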

Looking through customers' logs, I'm seeing a trend emerge: whenever there's a failure, users just run 'pcs resource cleanup' to make it go away. This is starting to cause problems, because on large clusters 'pcs resource cleanup' leaves the Pacemaker cluster appearing unresponsive for several minutes while all the monitor operations are in flight.

To keep people from unintentionally hosing their clusters, I think 'pcs resource cleanup' needs some more options and safeguards in place.

1. We need a way to specify the node the cleanup should occur on. 'crm_resource -C -N <node>' allows us to re-detect resource status on a single node only. The node can be combined with a resource id as well: 'crm_resource -C -N <node> -r <resource id>' re-detects only a single resource on a single node rather than every resource on that node (see the sketch after this list).

2. We should consider requiring a --force option for 'pcs resource cleanup' when we detect that the command will generate enough monitor operations to negatively impact the responsiveness of the cluster.

For example, if someone issues 'pcs resource cleanup' on a cluster with 16 nodes and 100 resources, we should be able to detect that this will result in 1600 operations and warn the user, requiring them to use --force to proceed with the command.

Detecting 100 or more resulting operations seems like a decent threshold for requiring --force.
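A hedged sketch of how such a safeguard could work, spelled out in shell rather than as the actual pcs implementation; it reuses the node/resource counting assumption from the sketch above, and the node and resource names in the comments are examples only:

# Sketch only: refuse a cluster-wide cleanup when (nodes * resources) exceeds the
# threshold, unless the user scopes the cleanup or overrides with --force.
read -r nodes resources <<< "$(pcs status | awk '/configured/ {print $1, $4}')"
threshold=100
if [ $(( nodes * resources )) -gt "$threshold" ] && [ "$1" != "--force" ]; then
    # Suggest a scoped cleanup instead, e.g.:
    #   crm_resource -C -N rh72-node1              # one node
    #   crm_resource -C -N rh72-node1 -r dummy     # one resource on one node
    echo "Refusing full cleanup: it would trigger $(( nodes * resources )) operations" >&2
    exit 1
fi
crm_resource -C   # below the threshold (or forced): full cleanup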

Comment 4 Tomas Jelinek 2016-02-26 16:06:06 UTC
Created attachment 1130861 [details]
proposed fix

Test:
Add nodes and/or resources to the cluster so that the number of resources times the number of nodes exceeds 100 (one way to do this is shown in the sketch below and in Comment 7).
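For instance, a quick way to create enough resources, assuming the Dummy resource agent is available (the same approach Comment 7 uses):

# Create 52 Dummy resources named aa..az and ba..bz; on a 2-node cluster this
# implies 104 monitor operations for a full cleanup, which is over the threshold.
for i in {a..b}; do for j in {a..z}; do pcs resource create ${i}${j} Dummy; done; done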
[root@rh72-node1:~]# pcs status | grep configured
2 nodes and 53 resources configured
[root@rh72-node1:~]# pcs resource cleanup
Error: Cleaning up all resources on all nodes will execute more than 100 operations in the cluster, which may negatively impact the responsiveness of the cluster. Consider specifying resource and/or node, use --force to override
[root@rh72-node1:~]# echo $?
1
[root@rh72-node1:~]# pcs resource cleanup dummy
Waiting for 2 replies from the CRMd.. OK
Cleaning up dummy on rh72-node1, removing fail-count-dummy
Cleaning up dummy on rh72-node2, removing fail-count-dummy
[root@rh72-node1:~]# echo $?
0
[root@rh72-node1:~]# pcs resource cleanup --node rh72-node1
Waiting for 1 replies from the CRMd. OK
[root@rh72-node1:~]# echo $?
0
[root@rh72-node1:~]# pcs resource cleanup --node rh72-node1 dummy
Waiting for 1 replies from the CRMd. OK
Cleaning up dummy on rh72-node1, removing fail-count-dummy
[root@rh72-node1:~]# echo $?
0
[root@rh72-node1:~]# pcs resource cleanup --force
Waiting for 1 replies from the CRMd. OK
[root@rh72-node1:~]# echo $?
0

[root@rh72-node1:~]# pcs status | grep configured
2 nodes and 3 resources configured
[root@rh72-node1:~]# pcs resource cleanup
Waiting for 1 replies from the CRMd. OK

Comment 5 Mike McCune 2016-03-28 23:40:50 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

Comment 6 Tomas Jelinek 2016-04-05 07:26:13 UTC
*** Bug 1323901 has been marked as a duplicate of this bug. ***

Comment 7 Ivan Devat 2016-05-31 12:25:10 UTC
Setup:
[vm-rhel72-1 ~] $ for i in {a..b}; do for j in {a..z}; do pcs resource create ${i}${j} Dummy; done ;done
[vm-rhel72-1 ~] $ pcs status | grep configured
2 nodes and 52 resources configured

Before fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.143-15.el7.x86_64

[vm-rhel72-1 ~] $ pcs resource cleanup
Waiting for 1 replies from the CRMd. OK

[vm-rhel72-1 ~] $ pcs resource cleanup --node vm-rhel72-1 aa
Waiting for 2 replies from the CRMd.. OK
Cleaning up aa on vm-rhel72-1, removing fail-count-aa
Cleaning up aa on vm-rhel72-3, removing fail-count-aa


After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.151-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs resource cleanup
Error: Cleaning up all resources on all nodes will execute more than 100 operations in the cluster, which may negatively impact the responsiveness of the cluster. Consider specifying resource and/or node, use --force to override
[vm-rhel72-1 ~] $ echo $?
1

[vm-rhel72-1 ~] $ pcs resource cleanup --node vm-rhel72-1 aa
Waiting for 1 replies from the CRMd. OK
Cleaning up aa on vm-rhel72-1, removing fail-count-aa

[vm-rhel72-1 ~] $ for i in {a..b}; do for j in {a..z}; do pcs resource delete ${i}${j} Dummy; done ;done
[vm-rhel72-1 ~] $ for i in {a..z}; do pcs resource create ${i} Dummy; done
[vm-rhel72-1 ~] $ pcs status | grep configured
2 nodes and 52 resources configured

[vm-rhel72-1 ~] $ pcs resource cleanup
Waiting for 1 replies from the CRMd. OK

Comment 10 Tomas Jelinek 2016-09-23 07:25:21 UTC
*** Bug 1366514 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2016-11-03 20:54:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html