Bug 1759269

Summary: 'pcs resource description' could lead users to misunderstand 'cleanup' and 'refresh' [RHEL 7]
Product: Red Hat Enterprise Linux 7
Reporter: Ken Gaillot <kgaillot>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Priority: medium
Version: 7.6
CC: cfeist, cluster-maint, cluster-qe, idevat, jseunghw, mlisik, mpospisi, nhostako, nwahl, omular, ondrej-redhat-developer, pkomarov, sbradley, tojeline
Target Milestone: rc
Target Release: 7.9
Hardware: All
OS: Linux
Fixed In Version: pcs-0.9.169-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: User runs 'pcs resource cleanup | refresh' to clean history / failures of a resource.
Consequence: If the resource has a parent resource such as a bundle, clone or group, the parent resource's history is cleaned as well.
Fix: Clarify the functionality of the commands in documentation. Provide an option to limit the operation to the specified resource only.
Result: User is able to clean history / failures of a specified resource without affecting its parent resources.
Clone Of: 1758969
Cloned To: 1805082 (view as bug list)
Last Closed: 2020-09-29 20:10:26 UTC
Bug Depends On: 1758969
Bug Blocks: 1805082, 1846412
Attachments:
proposed fix + tests (flags: none)
proposed fix 2 (flags: none)

Comment 3 Ondrej Faměra 2019-10-23 06:11:17 UTC
Hi,

Is there a possibility to acknowledge this bugzilla from the `pcs` perspective,
similar to how `pacemaker` (Ken in particular) responded in BZ 1758969?

For this BZ we are mostly interested in whether this will be just a documentation
fix in `pcs`, or rather a change in functionality to match the current documentation text in `pcs`.
Thank you for the feedback.

--
Ondrej

Comment 4 Tomas Jelinek 2019-10-29 16:21:20 UTC
Current situation in pcs:

The 'pcs resource cleanup' command does not allow --force or any other flag which would be propagated to crm_resource --cleanup as --force, so we can add a flag to the pcs command for this purpose. I don't think going with --force is the best option, since that flag is used for a different purpose throughout pcs.

The 'pcs resource refresh' command uses the --force flag to run the command even in cases where it would cause a significant load on the cluster - if the cluster has many nodes and many resources, --force is required to run the command. So we definitely cannot use pcs's --force for crm_resource's --force.

However, 'pcs resource refresh' also has a --full flag which is propagated to crm_resource as --force, and it is documented in pcs:
Use --full to refresh a resource on all nodes, otherwise only nodes where the resource's state is known will be considered.
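
To illustrate the current mapping (a sketch; the resource name 'dummy' is hypothetical):

    # pcs's --full is propagated to pacemaker's --force:
    pcs resource refresh dummy --full
    # which ends up running roughly:
    crm_resource --refresh --resource dummy --force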

Ken: Has the meaning of --force in crm_resource --refresh changed since it was implemented? See bz1508350 for details.

Comment 5 Ken Gaillot 2019-10-29 17:40:47 UTC
(In reply to Tomas Jelinek from comment #4)
> Current situation in pcs:
> 
> The 'pcs resource cleanup' command does not allow --force or any other flag
> which would be propagated to crm_resource --cleanup as --force, so we can
> add a flag to the pcs command for this purpose. I don't think going with
> --force is the best option, since that flag is used for a different purpose
> throughout pcs.
> 
> The 'pcs resource refresh' command uses the --force flag to run the command
> even in cases where it would cause a significant load on the cluster - if
> the cluster has many nodes and many resources, --force is required to run
> the command. So we definitely cannot use pcs's --force for crm_resource's
> --force.

That's reasonable.

> However, 'pcs resource refresh' also has a --full flag which is propagated
> to crm_resource as --force, and it is documented in pcs:
> Use --full to refresh a resource on all nodes, otherwise only nodes where
> the resource's state is known will be considered.

I was surprised to hear that. Looking into it, that idea apparently was abandoned and never implemented. I don't see any value in probing on nodes with resource-discovery=never, so I'm not sure why Beekhof originally proposed it.

> Ken: Has the meaning of --force in crm_resource --refresh changed since it
> was implemented? See bz1508350 for details.

--force for --cleanup/--refresh has always meant what it does in the new help text, both before and after the behavior of cleanup changed. This is the final help text we went with: "If the named resource is part of a group, or one numbered instance of a clone or bundled resource, the clean-up applies to the whole collective resource unless --force is given."
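
As an illustration of the semantic described above (a sketch; 'dummy' is a hypothetical primitive running inside a clone):

    # cleans up the whole clone that 'dummy' is part of:
    crm_resource --cleanup --resource dummy
    # limits the clean-up to 'dummy' itself:
    crm_resource --cleanup --resource dummy --force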

Comment 6 Tomas Jelinek 2020-03-03 15:16:01 UTC
Confirming that --force has no effect in crm_resource --refresh in pacemaker-1.1.21-3.el7 and pacemaker-2.0.3-5.el8.

Comment 8 Tomas Jelinek 2020-03-04 15:31:15 UTC
Created attachment 1667511 [details]
proposed fix + tests

* updated pcs help and manpage to match pacemaker
* 'pcs (resource | stonith) (cleanup | refresh)' now has the --strict flag which translates to pacemaker's --force flag (see the sketch below)
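
A minimal usage sketch of the new flag (the resource name 'dummy' is hypothetical):

    # clean up only the named resource, even if it is part of a group, clone or bundle;
    # pcs translates --strict to crm_resource's --force:
    pcs resource cleanup dummy --strict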

Comment 9 Tomas Jelinek 2020-03-05 13:53:08 UTC
Created attachment 1667766 [details]
proposed fix 2

Make sure 'pcs resource | stonith refresh --full' works the same way as before to keep backwards compatibility for whoever was using it. The --full flag is not supposed to be documented.

Comment 10 Ivan Devat 2020-04-09 14:34:53 UTC
After Fix

> --strict is supported

[kid76 ~] $ pcs resource cleanup -h|head -n3|tail -n1
    cleanup [<resource id>] [--node <node>] [--strict]

[kid76 ~] $ pcs resource refresh -h|head -n3|tail -n1
    refresh [<resource id>] [--node <node>] [--strict]

[kid76 ~] $ pcs stonith cleanup -h|head -n3|tail -n1
    cleanup [<stonith id>] [--node <node>] [--strict]

[kid76 ~] $ pcs stonith refresh -h|head -n3|tail -n1
    refresh [<stonith id>] [--node <node>] [--strict]

[kid76 ~] $ pcs resource cleanup --strict
Cleaned up all resources on all nodes

[kid76 ~] $ pcs resource refresh --strict
Waiting for 1 reply from the CRMd. OK

[kid76 ~] $ pcs stonith cleanup --strict
Cleaned up all resources on all nodes

[kid76 ~] $ pcs stonith refresh --strict
Waiting for 1 reply from the CRMd. OK


> --full works

[kid76 ~] $ pcs resource refresh --full
Waiting for 1 reply from the CRMd. OK

[kid76 ~] $ pcs stonith refresh --full
Waiting for 1 reply from the CRMd. OK

Comment 17 Seunghwan Jung 2020-05-13 08:40:20 UTC
Hi,

This is a RHEL7 bug, but just to check on RHEL8, I found the following.

The fix is *not* reflected in the 'pcs resource' command descriptions on RHEL 8.2:

    cleanup [<resource id>] [node=<node>] [operation=<operation>
            [interval=<interval>]]
        Make the cluster forget failed operations from history of the resource
        and re-detect its current state. This can be useful to purge knowledge
        of past failures that have since been resolved. If a resource id is not
        specified then all resources / stonith devices will be cleaned up. If a
        node is not specified then resources / stonith devices on all nodes will
        be cleaned up.

    refresh [<resource id>] [node=<node>] [--full]
        Make the cluster forget the complete operation history (including
        failures) of the resource and re-detect its current state. If you are
        interested in forgetting failed operations only, use the 'pcs resource
        cleanup' command. If a resource id is not specified then all resources
        / stonith devices will be refreshed. If a node is not specified then
        resources / stonith devices on all nodes will be refreshed. Use --full
        to refresh a resource on all nodes, otherwise only nodes where the
        resource's state is known will be considered.



It *is* reflected in 'crm_resource --help' on RHEL 8.2:

..

 -C, --cleanup			If resource has any past failures, clear its history and fail count.
				Optionally filtered by --resource, --node, --operation, and --interval (otherwise all).
				--operation and --interval apply to fail counts, but entire history is always cleared,
				to allow current state to be rechecked. If the named resource is part of a group, or
				one numbered instance of a clone or bundled resource, the clean-up applies to the
				whole collective resource unless --force is given.                                    <================
 -R, --refresh			Delete resource's history (including failures) so its current state is rechecked.
				Optionally filtered by --resource and --node (otherwise all). If the named resource is
				part of a group, or one numbered instance of a clone or bundled resource, the clean-up
				applies to the whole collective resource unless --force is given.


 - pcs-0.10.4-6.el8.x86_64
 - pacemaker-2.0.3-5.el8.x86_64


Do we have to clone this bug to RHEL8 for pcs?

Comment 18 Tomas Jelinek 2020-05-13 09:20:45 UTC
This bz has already been cloned for RHEL8: bz1805082

Comment 21 errata-xmlrpc 2020-09-29 20:10:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3964