Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1649265

Summary:	fence-agents: validate-all output is unfriendly to both machines and users
Product:	Red Hat Enterprise Linux 7	Reporter:	Tomas Jelinek <tojeline>
Component:	fence-agents	Assignee:	Oyvind Albrigtsen <oalbrigt>
Status:	CLOSED WONTFIX	QA Contact:	cluster-qe <cluster-qe>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	7.6	CC:	cluster-maint, cluster-qe, kgaillot, mgrac, oalbrigt
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1636036	Environment:
Last Closed:	2021-03-15 07:31:28 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1644633, 1649266

Description Tomas Jelinek 2018-11-13 09:43:07 UTC

+++ This bug was initially created as a clone of Bug #1636036 +++

Fence agents provide a validate-all action, which is supposed to be used by pcs when creating and updating fence agents' configuration in the CIB. This action returns descriptions of validation errors as free-form plain text that is not suitable for parsing by machines. That alone would not be an issue, since pcs could just print the messages without analyzing them. However, the output is also very unfriendly to users.


Examples of output:
# stonith_admin --validate --agent fence_ipmilan --option method=cycle
Validation of fence_ipmilan failed
2018-10-04 11:34:43,746 ERROR: Failed: You have to enter fence address

2018-10-04 11:34:43,746 ERROR: validate-all failed

2018-10-04 11:34:43,746 ERROR: Please use '-h' for usage


# stonith_admin --validate --agent fence_apc --option ip=i --option login=l --option ssh=0 --option debug=d
Validation of fence_apc failed
WARNING:root:Parse error: Ignoring option 'secure' because it does not have value

ERROR:root:Failed: You have to enter password, password script or identity file

2018-10-04 11:35:04,468 ERROR: Failed: You have to enter password, password script or identity file

ERROR:root:validate-all failed

2018-10-04 11:35:04,468 ERROR: validate-all failed

ERROR:root:Please use '-h' for usage

2018-10-04 11:35:04,468 ERROR: Please use '-h' for usage


# stonith_admin --validate --agent fence_scsi
Validation of fence_scsi failed
2018-10-04 11:35:16,292 ERROR: Failed: unable to parse output of corosync-cmapctl or node does not exist

2018-10-04 11:35:16,292 ERROR: Please use '-h' for usage



What is wrong with it:
* The messages refer to parameters by their descriptions instead of their names. How are users supposed to figure out that "fence address" means the "ip" option? Sure, they can run 'pcs stonith describe fence_ipmilan | grep address', but that is not user friendly.
* Some messages are duplicated.
* Message prefixes are not consistent.
* Can we get rid of date-times and logins?
* Can we get rid of empty lines? The fact it is easy to do that in pcs should not be an excuse.
* Can we get rid of "Please use '-h' for usage" and "validate-all failed" lines? This is very confusing to users as "-h" means something completely different in pcs. Pcs cannot get rid of these lines reliably as the whole output is free-form plain text. How do we know for sure there is no important info in those lines as well?
* Validate-all should work even without a cluster (fence_scsi fails with "unable to parse output of corosync-cmapctl").
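
To make the points above concrete, here is a minimal Python sketch of the cleanup pcs would have to do on today's output. The function name and the prefix patterns are assumptions taken only from the three examples in this report; other agents may print other prefixes. Dropping the two "noise" lines is a guess that they carry no extra information, which is exactly the kind of guess the report says pcs cannot make reliably.

```python
import re

# Prefixes seen in the examples in this report; not an exhaustive list.
TIMESTAMP_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
LOGGER_PREFIX = re.compile(r"^(ERROR|WARNING):root:")
LEVEL_PREFIX = re.compile(r"^(ERROR|WARNING): ")

# Lines the report calls confusing. Treating them as noise is an assumption.
NOISE = {"validate-all failed", "Please use '-h' for usage"}


def clean_validate_output(raw):
    """Return unique messages with blank lines, prefixes and noise removed."""
    messages = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # drop empty lines
        line = TIMESTAMP_PREFIX.sub("", line)
        line = LOGGER_PREFIX.sub("", line)
        line = LEVEL_PREFIX.sub("", line)
        if line.startswith("Failed: "):
            line = line[len("Failed: "):]
        if line in NOISE or line in messages:
            continue  # drop noise and duplicates
        messages.append(line)
    return messages
```

Every rule here depends on text that an agent is free to change, so the result is only as good as the pattern list.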


I can see two options to fix this:
1) Fix the output so it can be directly printed by pcs by making it friendly to users of both pcs and agents themselves.
2) Provide the output in a structured form, probably XML. This may need changes in pacemaker (an option to switch between XML and plain text). If we decide to go this way, I will be happy to participate in discussing the format of the output.
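
As an illustration of option 2, the sketch below parses one possible XML shape and turns it into a user-facing message that names the parameter. The element and attribute names (validation-result, message, severity, parameter) are purely hypothetical; no such format is defined in this report, in fence-agents, or in pacemaker.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured output for the fence_ipmilan example above.
SAMPLE = """<validation-result agent="fence_ipmilan">
  <message severity="error" parameter="ip">Required parameter is missing</message>
</validation-result>"""


def parse_validation(xml_text):
    """Convert the hypothetical XML into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "severity": msg.get("severity", "error"),
            "parameter": msg.get("parameter"),  # None for general messages
            "text": (msg.text or "").strip(),
        }
        for msg in root.iter("message")
    ]


def format_message(message):
    """Render one message, naming the parameter when the agent gave one."""
    label = message["severity"].capitalize()
    if message["parameter"] is None:
        return f"{label}: {message['text']}"
    return f"{label}: parameter '{message['parameter']}': {message['text']}"


if __name__ == "__main__":
    for item in parse_validation(SAMPLE):
        print(format_message(item))
```

With the parameter name carried as data, the caller decides how to word the message and no longer needs to guess which lines matter.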


Then there is an issue with updating. Say the user wants to change an optional parameter of an agent. In that case, pcs has to pass the new parameter together with all parameters already defined in the CIB; otherwise the agent would report missing required parameters. If the parameters in the CIB are not valid (which is possible with the pcs --force option), the user gets messages about parameters not mentioned in the update command at all. This does not seem right.
Again, I see two options to fix this:
1) Add support for marking some parameters as already existing and others as newly being set. Then the agent would be able to report issues for only the new parameters. Pacemaker support is again needed here.
2) Provide the output in a structured form, probably XML. This way pcs would be able to filter the messages.
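
The filtering mentioned in option 2 could look roughly like this, assuming each message has already been parsed into a dictionary with a "parameter" key (a hypothetical shape, not an existing pcs or agent interface). Messages that are not tied to any parameter are kept, which is a design choice of this sketch.

```python
def filter_messages(messages, updated_params):
    """Keep messages about parameters named in the update command.

    Messages with no parameter (parameter is None) are treated as general
    and are always kept.
    """
    updated = set(updated_params)
    return [
        msg for msg in messages
        if msg.get("parameter") is None or msg["parameter"] in updated
    ]


if __name__ == "__main__":
    all_messages = [
        {"parameter": "ip", "text": "Required parameter is missing"},
        {"parameter": "delay", "text": "Value must be a number"},
        {"parameter": None, "text": "Agent reported a general problem"},
    ]
    # The user only changed "delay", so the "ip" message is hidden.
    for msg in filter_messages(all_messages, ["delay"]):
        print(msg["text"])
```

The example messages are made up for illustration and are not real agent output.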


Other nice-to-have features (may be implemented in another bz):
* Report when deprecated parameters are used, advise using the new ones, and mention their names.
* Do not allow setting both the deprecated and the replacement version of one parameter. What happens when both ip and ipaddr are set at the same time with different values anyway?
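
A rough sketch of both checks, assuming the agent knows which old name maps to which new name. Only the ipaddr/ip pair comes from this report; the mapping table and the function name are hypothetical. The sketch reports a conflict as a warning rather than an error, so existing configurations keep working.

```python
# Old name -> new name. Only this pair is mentioned in the report.
DEPRECATED = {"ipaddr": "ip"}


def check_deprecated(options):
    """Return warnings for deprecated options and for conflicting values."""
    warnings = []
    for old, new in DEPRECATED.items():
        if old not in options:
            continue
        warnings.append(f"Parameter '{old}' is deprecated, use '{new}' instead")
        if new in options and options[new] != options[old]:
            warnings.append(
                f"Both '{old}' and '{new}' are set, with different values"
            )
    return warnings


if __name__ == "__main__":
    for line in check_deprecated({"ipaddr": "192.0.2.1", "ip": "192.0.2.2"}):
        print(line)
```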

--- Additional comment from Marek Grac on 2018-10-05 08:01:16 EDT ---

The original idea was to create additional output that will be machine-readable (e.g. validate-all-xml). Fence agents are quite ready for that. 

ad fence_scsi) This is a clear error. We should move the validate-all check (line 494) so that it runs before the call to generate_key() (line 488).

ad deprecated / options)
Warnings are the best option (and easy to do). I'm against strict errors, as they may break existing configurations. I cannot see an easy way to handle duplicated args consistently: it is easy on STDIN, but we do not get this kind of information from the getopt library. So I will just warn on deprecated options. The question is whether the warning should be emitted on every run.

--- Additional comment from Ken Gaillot on 2018-11-08 16:39:51 EST ---

I'd recommend shipping an RNG schema for the output, so we have something formal that can be included in a future OCF standard, and anything that calls the agent can validate the output if desired.

The schema should allow for errors that would have the caller and not the agent generate the XML, e.g. "The requested agent was not found" or "The agent did not return valid XML". (Those are just examples of the sort of thing I mean, they might not be relevant to the eventual implementation.)

I.e. pcs -> stonith_admin -> agent, so stonith_admin needs to be able to report such errors back to pcs in the same XML format that the agent would use.

--- Additional comment from Ken Gaillot on 2018-11-08 16:47:32 EST ---

It might be worthwhile to design this in a way that could be generalized to other actions in the future.

Maybe the desired output format could be passed to the agent (stdin or command-line option for fence agents, environment variable for resource agents), defaulting to human-readable text. That would also leave open the possibility of other formats in the future.

--- Additional comment from Ken Gaillot on 2018-11-08 16:49:53 EST ---

(In reply to Ken Gaillot from comment #3)
> It might be worthwhile to design this in a way that could be generalized to
> other actions in the future.
> 
> Maybe the desired output format could be passed to the agent (stdin or
> command-line option for fence agents, environment variable for resource
> agents), defaulting to human-readable text. That would also leave open the
> possibility of other formats in the future.

On the other hand, keeping it as a separate action allows easy discoverability of support via agent meta-data. Alternatively the action meta-data could list its supported output formats (defaulting to human-readable text only).
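
Both discovery variants can be sketched against the agent meta-data XML. The sample below follows the usual resource-agent/actions/action layout; the agent name fence_example, the validate-all-xml action and the output-formats attribute are hypothetical and only illustrate the two ideas in this comment.

```python
import xml.etree.ElementTree as ET

# Made-up meta-data for a non-existent agent.
SAMPLE_METADATA = """<resource-agent name="fence_example" shortdesc="Example agent">
  <parameters/>
  <actions>
    <action name="on"/>
    <action name="off"/>
    <action name="validate-all" output-formats="text xml"/>
    <action name="validate-all-xml"/>
  </actions>
</resource-agent>"""


def supports_action(metadata_xml, action_name):
    """Variant 1: a separate action advertises structured output."""
    root = ET.fromstring(metadata_xml)
    return any(
        action.get("name") == action_name
        for action in root.findall("./actions/action")
    )


def output_formats(metadata_xml, action_name):
    """Variant 2: the action lists its formats; the default is text only."""
    root = ET.fromstring(metadata_xml)
    for action in root.findall("./actions/action"):
        if action.get("name") == action_name:
            return action.get("output-formats", "text").split()
    return []
```

A caller such as pcs could check either function first and fall back to plain text when the agent advertises nothing.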

--- Additional comment from Ken Gaillot on 2018-11-09 14:00:10 EST ---

(In reply to Ken Gaillot from comment #2)
> The schema should allow for errors that would have the caller and not the
> agent generate the XML, e.g. "The requested agent was not found" or "The
> agent did not return valid XML". (Those are just examples of the sort of
> thing I mean, they might not be relevant to the eventual implementation.)

Thinking about this some more, perhaps it's better for the caller to implement their own super-schema wrapping the agent's. So ignore that suggestion :)

I'm now thinking that if the agent returns something like (tag names are just placeholders)

    <agent-output>
       ...
    </agent-output>

then the caller could do something like

    <caller-output>
       <agent-output>
          ...
       </agent-output>
    </caller-output>

on success, and something like 

    <caller-output>
       <caller-error .../>
    </caller-output>

for failures outside the agent.
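
A small sketch of that wrapping on the caller side, using the placeholder tag names from this comment. The "reason" attribute on caller-error is an assumption added for illustration; the comment leaves its contents open.

```python
import xml.etree.ElementTree as ET


def wrap_agent_output(agent_stdout):
    """Wrap valid agent XML, or report a caller-side error in its place."""
    wrapper = ET.Element("caller-output")
    try:
        agent_root = ET.fromstring(agent_stdout)
        if agent_root.tag != "agent-output":
            raise ValueError("unexpected root element")
        wrapper.append(agent_root)
    except (ET.ParseError, ValueError):
        # Failure outside the agent: the caller generates the error itself.
        ET.SubElement(
            wrapper,
            "caller-error",
            {"reason": "The agent did not return valid XML"},
        )
    return ET.tostring(wrapper, encoding="unicode")


if __name__ == "__main__":
    print(wrap_agent_output("<agent-output><ok/></agent-output>"))
    print(wrap_agent_output("Validation of fence_scsi failed"))
```

The agent's schema stays untouched this way, and only the caller needs to know about the outer element.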

--- Additional comment from Marek Grac on 2018-11-12 07:19:04 EST ---

@Ken:

I agree with the OCF proposal and your last idea with <caller-output>. I'm just not sure if both of these parts should be standardized.

Comment 2 RHEL Program Management 2021-03-15 07:31:28 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 3 Ken Gaillot 2021-03-15 20:30:20 UTC

FYI, the soon-to-be-adopted OCF 1.1 standard for resource agents introduces a new environment variable, OCF_OUTPUT_FORMAT, for this purpose. Pacemaker support is expected in the upstream 2.1.0 release (RHEL 8.5/9.0beta). So for resource agents, pcs will be able to call "crm_resource --output-as=xml --validate", and crm_resource will set OCF_OUTPUT_FORMAT=xml before calling the resource agent. Agents have the option of supporting or ignoring it.
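
On the agent side, honoring the variable could look roughly like this. The sketch assumes the value is either "text" or "xml" and treats an unset variable as text; the validation and error element names are hypothetical and not taken from the OCF standard.

```python
import os
from xml.sax.saxutils import escape


def render_validation(errors, environ=None):
    """Render validation errors in the format requested by the caller."""
    if environ is None:
        environ = os.environ
    output_format = environ.get("OCF_OUTPUT_FORMAT", "text")
    if output_format == "xml":
        items = "".join(f"<error>{escape(err)}</error>" for err in errors)
        return f"<validation>{items}</validation>"
    # Unset or unrecognized value: fall back to human-readable text.
    return "\n".join(errors)


if __name__ == "__main__":
    print(render_validation(["Required parameter 'ip' is missing"]))
```

An agent that ignores the variable keeps printing plain text, which matches the "supporting or ignoring it" wording above.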

As discussed above, for fence agents we could add either a new action (e.g. validate-all-xml) or a new stdin variable (e.g. output_format), and use meta-data to indicate whether it is supported.
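
For the stdin variant, a fence agent could pick the format out of the key=value lines it already reads. The option name output_format is the example name from this comment, not an existing fence-agents option, and the parser below is a simplified stand-in for the real fencing library.

```python
import io


def parse_stdin_options(stream):
    """Parse key=value lines, skipping blanks, comments and malformed lines."""
    options = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def requested_format(options):
    """Return 'xml' only when explicitly requested; default to 'text'."""
    value = options.get("output_format", "text")
    return value if value in ("text", "xml") else "text"


if __name__ == "__main__":
    stdin = io.StringIO("action=validate-all\noutput_format=xml\n")
    print(requested_format(parse_stdin_options(stdin)))
```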