Bug 1793574

Summary: 'pcs stonith history' returns "Fence history is not supported"
Product: Red Hat Enterprise Linux 8 Reporter: Michal Mazourek <mmazoure>
Component: pcs Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 8.2 CC: cfeist, cluster-maint, idevat, kgaillot, mlisik, mpospisi, omular, tojeline
Target Milestone: rc   
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pcs-0.10.4-5.el8 Doc Type: Bug Fix
Doc Text:
Cause: A user runs one of the 'pcs stonith history' commands. Consequence: Pcs exits with an error saying that fence history is not supported by pacemaker, even when it is supported. Fix: The mechanism for detecting fence history support has been overhauled so that it works with various pacemaker versions. Result: The 'pcs stonith history' commands work as expected.
Last Closed: 2020-04-28 15:27:56 UTC Type: Bug
Attachments:
Description Flags
proposed fix + tests none

Description Michal Mazourek 2020-01-21 15:30:23 UTC
Description of problem:
The 'pcs stonith history' command ends with an error message. This is a regression, as the command worked in RHEL 8.1.


Version-Release number of selected component (if applicable):
pcs-0.10.4-3.el8


How reproducible:
Always


Steps to Reproduce:
1. [root@virt-036 ~]# pcs stonith history
Error: Fence history is not supported, please upgrade pacemaker


Actual results:
Command returns an error 


Expected results:
Command returns a fencing history

Comment 2 Tomas Jelinek 2020-01-21 16:48:08 UTC
Pcs checks the /usr/share/pacemaker/crm_mon.rng file to see if it defines the fence_history element. The pacemaker team recommended this as the best way to detect whether pacemaker supports fence history. The detection is needed to prevent pcs crashes and errors when running a pacemaker without fence history support. It allows pcs to print a nice error message instead, see comment 0.

In recent pacemaker packages, the crm_mon.rng file has changed and no longer contains the fence_history element. So pcs now has to check both the original file and the new file /usr/share/pacemaker/api/crm_mon-2.0.rng to detect fence history support.
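
A minimal sketch of the two-file check described above, not the actual pcs code. It assumes the schema files are plain RelaxNG XML and that support is signalled by an <element name="fence_history"> definition; the function names and the default path list are illustrative only.

```python
import xml.etree.ElementTree as ET

# Standard RelaxNG structure namespace.
RNG_NS = "{http://relaxng.org/ns/structure/1.0}"

# The two paths mentioned in this comment; treating them as defaults is an assumption.
DEFAULT_CANDIDATES = (
    "/usr/share/pacemaker/crm_mon.rng",
    "/usr/share/pacemaker/api/crm_mon-2.0.rng",
)


def rng_defines_element(rng_path, element_name):
    """Return True if the RelaxNG file defines <element name=element_name>."""
    try:
        root = ET.parse(rng_path).getroot()
    except (OSError, ET.ParseError):
        # A missing or unreadable schema is treated as "not defined here".
        return False
    return any(
        el.get("name") == element_name for el in root.iter(RNG_NS + "element")
    )


def fence_history_supported(candidates=DEFAULT_CANDIDATES):
    """Return True if any candidate schema defines the fence_history element."""
    return any(rng_defines_element(path, "fence_history") for path in candidates)
```

Checking a list of files keeps the old behaviour for older pacemaker packages while also covering the newer layout.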

Comment 3 Ken Gaillot 2020-01-21 17:10:29 UTC
(In reply to Tomas Jelinek from comment #2)
> Pcs checks /usr/share/pacemaker/crm_mon.rng file to see if it defines the
> fence_history element. This was recommended by pacemaker team as the best
> way to detect if fence history is supported by pacemaker. The detection is
> needed to prevent pcs crashes and errors in case of running a pacemaker
> without the fence history support. It allows pcs to print a nice error
> message instead, see comment 0.
> 
> In the recent pacemaker packages, the crm_mon.rng file has been changed and
> it no longer contains the fence_history element. So now, pcs has to check
> both the original file and the new file
> /usr/share/pacemaker/api/crm_mon-2.0.rng to detect fence history support.

I wouldn't hardcode crm_mon-2.0; the idea is that the API version might change over time, but crm_mon.rng will always include the latest version. So if there's some tool that will expand an RNG file with all its includes, that would be better. Unfortunately a brief search only turns up some experimental stuff like https://pypi.org/project/rnginline/. It might be easier to just check crm_mon.rng and, if fence_history is not found there, look for any externalRef tags in it and recurse.
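
A rough sketch of that recursive approach, under the assumption that every externalRef href is a file path relative to the schema that contains it (no URLs). The function name is hypothetical; only the Python standard library is used.

```python
import os
import xml.etree.ElementTree as ET

# Standard RelaxNG structure namespace.
RNG_NS = "{http://relaxng.org/ns/structure/1.0}"


def schema_defines_element(rng_path, element_name, _seen=None):
    """Search rng_path and everything it pulls in via externalRef."""
    seen = set() if _seen is None else _seen
    path = os.path.abspath(rng_path)
    if path in seen:
        # Already visited: guards against reference cycles.
        return False
    seen.add(path)
    try:
        root = ET.parse(path).getroot()
    except (OSError, ET.ParseError):
        return False
    # First look for the element in this file.
    for el in root.iter(RNG_NS + "element"):
        if el.get("name") == element_name:
            return True
    # Not found here: follow each externalRef relative to this file's directory.
    base_dir = os.path.dirname(path)
    for ref in root.iter(RNG_NS + "externalRef"):
        href = ref.get("href")
        if href and schema_defines_element(
            os.path.join(base_dir, href), element_name, seen
        ):
            return True
    return False
```

Called on crm_mon.rng, this would follow whatever versioned file the top-level schema currently references, so no API version needs to be hardcoded.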

Comment 5 Tomas Jelinek 2020-01-23 10:21:01 UTC
Lxml is capable of loading RelaxNG files and processing externalRef, but it does not provide any means for accessing the resulting element tree.

We could write our own loader of RelaxNG files, which would process externalRef. I am not really convinced it is worth the effort here. If pacemaker provided the schema in a single file, as proposed in https://bugs.clusterlabs.org/show_bug.cgi?id=5421, pcs could benefit from that.

Another possibility is to check the help output of the pacemaker CLI tools and look for the options pcs wishes to use. This has the benefit of giving confidence about what pacemaker can actually do. That matters especially in this case, as support for fence history in crm_mon was added in later pacemaker versions, when stonith_admin support was already in place. The disadvantage is the overhead of running an extra process.
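
A hedged sketch of the help-based check, not the attached fix. The function name is hypothetical, and both the help flag and the option being searched for (for example a crm_mon fence history option) are assumptions that would need to be confirmed against the installed tools.

```python
import re
import subprocess


def help_mentions_option(command, option, help_flag="--help"):
    """Run `command help_flag` and report whether `option` appears as a whole word."""
    try:
        result = subprocess.run(
            [*command, help_flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print their help to stderr
            universal_newlines=True,
            timeout=10,
            check=False,  # a non-zero exit code still leaves usable help text
        )
    except (OSError, subprocess.TimeoutExpired):
        # Tool missing or hung: report the option as unsupported.
        return False
    # Match the option only when it is not part of a longer option name.
    pattern = r"(?<![\w-])" + re.escape(option) + r"(?![\w-])"
    return re.search(pattern, result.stdout) is not None
```

A caller might use something like help_mentions_option(["crm_mon"], "--fence-history"), with the result cached so the extra process runs only once.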

Another option is for pacemaker to provide a list of capabilities, similar to 'pcs --version --full' (bz1230919). 'pacemakerd -F' is not really helpful in this case:
# pacemakerd -F
Pacemaker 2.0.3-3.el8 (Build: 4b1f869f0f)
 Supporting v3.2.0:  generated-manpages agent-manpages ncurses libqb-logging libqb-ipc systemd nagios  corosync-native atomic-attrd acls

Comment 6 Ken Gaillot 2020-01-24 21:23:41 UTC
(In reply to Tomas Jelinek from comment #5)
> Lxml is capable of loading RelaxNG files and process externalRef, but it
> does not provide any means for accessing the resulting element tree.
> 
> We could write our own loader of RelaxNG files, which would process
> externalRef. I am not really convinced it is worth the effort here. If
> pacemaker provided the schema in a single file, as proposed in
> https://bugs.clusterlabs.org/show_bug.cgi?id=5421, pcs could benefit from
> that.

I agree that's a good idea.

> Another possibility is to check help of pacemaker cli tools and look for the
> options pcs wishes to use. This has the benefit of being confident about
> pacemaker capabilities. Especially in this case, as support for fence
> history in crm_mon was added in later pacemaker versions when stonith_admin
> support was already in place. The disadvantage is the overhead of running an
> extra process.

Another good idea.

> Another option is for pacemaker to provide a list of capabilities, similar
> to 'pcs --version --full' bz1230919. 'pacemakerd -F' is not really helpful
> in this case:
> # pacemakerd -F
> Pacemaker 2.0.3-3.el8 (Build: 4b1f869f0f)
>  Supporting v3.2.0:  generated-manpages agent-manpages ncurses libqb-logging
> libqb-ipc systemd nagios  corosync-native atomic-attrd acls

That's three good ideas. :)

I see the feature list from "pacemakerd -F" as being suited for determining support in the current build for features that are compile-time options, rather than features that were added in particular releases.

However, the "v3.2.0" in there is the pacemaker "feature set," which would be a mostly reliable indicator of support for any given capability. It's bumped specifically for capabilities that can only take effect once all nodes have been upgraded to a version that supports them. It goes up for most releases. There's no list of what features were added in each feature set, but we could easily figure out what feature set corresponds to any particular capability you're interested in. There will be some corner cases where support can't be mapped one-to-one to feature sets (mostly for things that don't require all nodes to be upgraded), so checking the help is probably better when you're specifically looking for support for an argument.
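
For illustration only, a sketch of reading the feature set from 'pacemakerd -F' output in the format quoted above and comparing it with a minimum. The function names are hypothetical, and the minimum for any given capability is not known from this thread, so the caller has to supply it.

```python
import re


def parse_feature_set(features_output):
    """Extract the feature set, e.g. (3, 2, 0), from 'pacemakerd -F' output."""
    match = re.search(r"Supporting v(\d+(?:\.\d+)*)", features_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def feature_set_at_least(features_output, minimum):
    """Return True if the reported feature set is at least `minimum`.

    `minimum` is a tuple such as (3, 1, 0); which value corresponds to a
    particular capability is an assumption the caller must provide.
    """
    found = parse_feature_set(features_output)
    return found is not None and found >= tuple(minimum)
```

Comparing integer tuples rather than strings avoids ordering "3.10.0" before "3.2.0".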

Comment 8 Tomas Jelinek 2020-01-27 16:21:11 UTC
Created attachment 1655710 [details]
proposed fix + tests

Comment 9 Miroslav Lisik 2020-02-17 13:57:19 UTC
Test:
[root@r8-node-01 ~]# rpm -q pcs pacemaker
pcs-0.10.4-5.el8.x86_64
pacemaker-2.0.3-4.el8.x86_64

[root@r8-node-01 ~]# pcs stonith history
0 events found

Comment 14 errata-xmlrpc 2020-04-28 15:27:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1568