Bug 1781357

Summary: [RFE] fence_aws: Logging enhancements
Product: Red Hat Enterprise Linux 8
Reporter: Reid Wahl <nwahl>
Component: fence-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: Brandon Perkins <bperkins>
Severity: medium
Priority: medium
Version: 8.1
CC: bperkins, cluster-maint, fguilher, oalbrigt
Target Milestone: rc
Keywords: FutureFeature
Target Release: 8.2
Flags: pm-rhel: mirror+
Hardware: All
OS: Linux
Fixed In Version: fence-agents-4.2.1-40.el8
Doc Type: If docs needed, set a value
Last Closed: 2020-04-28 15:30:27 UTC
Type: Feature Request
Bug Blocks: 1832289 (RHEL 7 clone of this bug)

Description Reid Wahl 2019-12-09 20:37:59 UTC
Description of problem:

From customer:
~~~
I would like to be able to track down on which phase/stage of a fencing request it failed. For example, if it was during set_power_status(), get_power_status() or any other place in the code.

For example, SUSE's agent lets you easily figure out when the StopInstance was issued, and what the status and time taken were for the instance to move from "Stopping" to "Stopped", or from "Pending" to "Running", by printing the status and progress of the checks.

Honestly, anything we can think of that would help our customers and us (as support engineers) troubleshoot and get more clarity on what the fence agent is doing would be of great help, so I'm open to suggestions.
~~~
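The progress reporting the customer describes (logging each state observed while an instance moves from "stopping" to "stopped", plus the time taken) can be sketched as a polling loop. This is a minimal, hypothetical sketch: `wait_for_state`, the `get_state` callable, and the timeout/interval parameters are illustrative assumptions, not part of fence_aws or the SUSE ec2 plugin.

```python
import logging
import time
from typing import Callable

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("fence_sketch.poll")


def wait_for_state(
    get_state: Callable[[], str],
    target: str,
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it returns `target` or `timeout` seconds pass.

    Every state change is logged with the elapsed time, so a failed
    fencing attempt shows where the instance got stuck (for example,
    still "stopping" when the timeout expired).
    """
    start = clock()
    last = None
    while True:
        state = get_state()
        elapsed = clock() - start
        if state != last:
            # Log only transitions to keep the output short.
            logger.info("instance state: %s (%.1fs elapsed)", state, elapsed)
            last = state
        if state == target:
            logger.info("reached %s after %.1fs", target, elapsed)
            return True
        if elapsed >= timeout:
            logger.error(
                "timed out after %.1fs waiting for %s; last state was %s",
                elapsed, target, state,
            )
            return False
        sleep(interval)
```

In a real agent, `get_state` would wrap whatever call reads the instance state from EC2; the clock and sleep functions are injected only so the loop can be exercised without waiting.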

I think this is doable, and perhaps it's worth doing for all the fence agents rather than only having a special case for fence_aws. There may be some agent-specific data that can be logged, but the basic suggestion (printing the output of get_power_status, and possibly logging when each call to [gs]et_power_status begins and/or ends) could apply to many agents.
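A generic way to log when each [gs]et_power_status call begins and ends, and what get_power_status returned, is to wrap the callbacks. The sketch below assumes hypothetical stand-in callbacks taking `(conn, options)`; the `log_phase` decorator and logger name are illustrative, not the actual fence-agents implementation.

```python
import functools
import logging
import time
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])
# Hypothetical logger name, for illustration only.
logger = logging.getLogger("fence_sketch.phases")


def log_phase(func: F) -> F:
    """Log when a fencing phase starts, what it returns, and how long it took."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        name = func.__name__
        logger.info("%s: starting", name)
        start = time.monotonic()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Records the traceback, so the failing phase is obvious in the log.
            logger.exception("%s: failed after %.2fs", name, time.monotonic() - start)
            raise
        logger.info("%s: returned %r after %.2fs", name, result, time.monotonic() - start)
        return result
    return cast(F, wrapper)


@log_phase
def get_power_status(conn: Any, options: dict) -> str:
    # Hypothetical stand-in for an agent's status callback.
    return "on"


@log_phase
def set_power_status(conn: Any, options: dict) -> None:
    # Hypothetical stand-in: only "off" is accepted in this sketch.
    if options.get("--action") != "off":
        raise ValueError("unsupported action in this sketch")
```

Because the wrapper knows nothing about AWS, the same decorator could be applied to any agent's callbacks, which fits the idea of doing this for all fence agents.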

The existing verbose option prints a huge amount of output that's not feasible to direct to syslog for any extended period. A "middle ground" logging option would be nice. I'm not sure whether we would want to make that the default (for easier failure troubleshooting) or keep the current "quiet" logging pattern as the default.
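One way to picture a "middle ground" between quiet and verbose is a third verbosity level mapped onto standard logging levels. The level names and `configure_logging` function below are hypothetical assumptions for illustration; they are not fence-agents options.

```python
import io
import logging
import sys

# Hypothetical verbosity names; not actual fence-agents options.
LEVELS = {
    "quiet": logging.WARNING,   # current default: warnings and errors only
    "info": logging.INFO,       # middle ground: phases and power status
    "verbose": logging.DEBUG,   # everything, e.g. raw request/response dumps
}


def configure_logging(verbosity: str = "quiet", stream=sys.stderr) -> logging.Logger:
    """Return a logger whose threshold matches the chosen verbosity."""
    if verbosity not in LEVELS:
        raise ValueError(f"unknown verbosity {verbosity!r}; expected one of {sorted(LEVELS)}")
    logger = logging.getLogger("fence_sketch")
    logger.setLevel(LEVELS[verbosity])
    logger.handlers.clear()  # avoid duplicate output if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = configure_logging("info", stream=buf)
    log.info("get_power_status: returned 'off'")
    log.debug("raw API response: ...")  # suppressed at the "info" level
    print(buf.getvalue(), end="")
```

At the "info" level, only the phase and status lines are emitted, which keeps the volume small enough to send to syslog continuously; the standard library's `logging.handlers.SysLogHandler` could replace the stream handler for that.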

This is the SUSE agent that the customer referred to:
  - https://github.com/ClusterLabs/cluster-glue/blob/master/lib/plugins/stonith/external/ec2

-----

Version-Release number of selected component (if applicable):

fence-agents-aws-4.2.1-30.el8_1.1.noarch

-----

Steps to Test:

1. Set the newly added debug option and observe the logs during stonith actions.

-----

Additional info:

Customer is willing to assist in testing and provide any relevant info/suggestions from the AWS side.

Comment 1 Oyvind Albrigtsen 2020-01-31 15:12:10 UTC
https://github.com/ClusterLabs/fence-agents/pull/318

Comment 6 errata-xmlrpc 2020-04-28 15:30:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1580

Comment 7 Reid Wahl 2020-05-05 19:04:55 UTC
Customer wants to know if it's possible to backport this to RHEL 7. I set the expectation that it's unlikely, since RHEL 7 is in maintenance phase and no longer receiving enhancements. Relaying the question here.

Comment 8 Oyvind Albrigtsen 2020-05-06 13:36:55 UTC
I've cloned this bz for RHEL7.