Bug 655764

Summary: [RFE] Add "diag" option to fence_ipmilan to support ipmi chassis power diag option
Product: Red Hat Enterprise Linux 6
Reporter: Gary Smith <gasmith>
Component: fence-agents
Assignee: Marek Grac <mgrac>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Docs Contact: Jana Heves <jsvarova>
Priority: high
Version: 6.1
CC: bpeck, cluster-maint, djansa, jsvarova, mgrac, snagar
Target Milestone: rc
Keywords: FutureFeature, TechPreview
Target Release: 6.1
Hardware: All
OS: Linux
Fixed In Version: fence-agents-3.0.12-23.el6
Doc Type: Technology Preview
Doc Text:
Diagnostic pulse can now be issued
A diagnostic pulse can now be issued on the IPMI interface using the fence_ipmilan agent. This new Technology Preview is used to force a kernel dump of a host if the host is configured to do so. Note that this feature is not a substitute for the `off` operation in a production cluster.
Clones: 678061
Last Closed: 2011-05-19 14:21:53 UTC
Bug Blocks: 676286, 678061, 679847, 702988    
Attachments:
Description                  Flags
Proposed patch               none
RHEL6 merged/tested patch    none

Description Gary Smith 2010-11-22 11:51:15 UTC
Description of problem:

To enhance the fence_ipmilan agent, it would be useful to add a new operation to the '-o' option for diagnostic purposes.
  
Operations available in the current release:

-o <op> Operation to perform.
Valid operations: on, off, reboot, status, list or monitor

A new 'diag' operation would allow fence_ipmilan to send the equivalent of "ipmitool chassis power diag" to the remote host.
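As a rough illustration of what the request amounts to, the sketch below builds the `ipmitool chassis power <subcommand>` command line for a fence operation, including the proposed `diag`. It is written in Python purely for illustration and is not how fence_ipmilan implements this; the names `CHASSIS_POWER_SUBCOMMAND`, `build_ipmitool_argv` and `run_operation` are hypothetical. Only the ipmitool options `-I`, `-H`, `-U`, `-P` and the `chassis power` subcommands are real.

```python
import subprocess
from typing import List

# Hypothetical mapping from fence operations to "ipmitool chassis power"
# subcommands. "diag" is the operation proposed in this bug; reboot, list
# and monitor are left out to keep the sketch short.
CHASSIS_POWER_SUBCOMMAND = {
    "on": "on",
    "off": "off",
    "status": "status",
    "diag": "diag",  # pulse a diagnostic interrupt (NMI) on the target
}


def build_ipmitool_argv(operation: str, host: str, user: str, password: str,
                        interface: str = "lanplus") -> List[str]:
    """Return the ipmitool argument vector for one fence operation."""
    try:
        subcommand = CHASSIS_POWER_SUBCOMMAND[operation]
    except KeyError:
        raise ValueError(f"unsupported operation: {operation!r}") from None
    return [
        "ipmitool",
        "-I", interface,  # IPMI transport: lan or lanplus
        "-H", host,       # BMC address of the node to act on
        "-U", user,
        "-P", password,
        "chassis", "power", subcommand,
    ]


def run_operation(operation: str, host: str, user: str, password: str) -> int:
    """Run the ipmitool command and return its exit status."""
    argv = build_ipmitool_argv(operation, host, user, password)
    return subprocess.run(argv, check=False).returncode
```

For example, `build_ipmitool_argv("diag", "10.0.0.5", "admin", "secret")` ends in `chassis power diag`. Passing the password with `-P` exposes it in the process list; ipmitool's `-E` option, which reads the password from the `IPMI_PASSWORD` environment variable, is one way around that.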

This request forces the node's kernel into dump mode. If the node is already dumping, the diag signal is ignored.

Additional info:

This feature would be very helpful in our large cluster environment.

Comment 2 Marek Grac 2011-01-13 13:21:19 UTC
Created attachment 473316 [details]
Proposed patch

Adds "diag" as a new operation. On my machine I got:

Uhhuh. NMI received for unknown reason 31.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue

but the machine is still up and running. I believe the signal was sent correctly, but my machine is not configured to handle it.
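The kernel messages above are what an x86 Linux kernel prints when it receives an NMI it cannot attribute to a known source and is not set to panic on it. As a hedged sketch, assuming the relevant knobs are the `kernel.unknown_nmi_panic` and `kernel.panic_on_unrecovered_nmi` sysctls (which one takes effect depends on the kernel version) and that a kdump crash kernel is loaded separately, a node could be checked like this. The function name `nmi_panic_enabled` is hypothetical, and the sketch does not verify that kdump itself is configured.

```python
from pathlib import Path
from typing import Optional

# Assumption: if either sysctl is non-zero, an unknown NMI (such as the one
# sent by "chassis power diag") panics the kernel, which lets a loaded kdump
# kernel capture a crash dump instead of "trying to continue".
NMI_PANIC_SYSCTLS = ("unknown_nmi_panic", "panic_on_unrecovered_nmi")


def nmi_panic_enabled(sysctl_dir: Path = Path("/proc/sys/kernel")) -> Optional[bool]:
    """Return True if any NMI panic sysctl is set, False if all present ones
    are 0, or None if none of them exist (e.g. non-x86 kernel)."""
    found = False
    for name in NMI_PANIC_SYSCTLS:
        try:
            value = (sysctl_dir / name).read_text().strip()
        except FileNotFoundError:
            continue  # this kernel does not expose the knob
        found = True
        if value != "0":
            return True
    return False if found else None
```

Taking the directory as a parameter keeps the check testable against a scratch directory instead of the live `/proc`.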

@Gary: Does this patch do what you expect?

Comment 7 Gary Smith 2011-03-07 09:21:34 UTC
(In reply to comment #2)

> Uhhuh. NMI received for unknown reason 31.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
> 
> but machine is still up and running. I believe that signal was send correctly
> but my machine is not configured to support it. 
> 
> @Gary: Does this patch do what you expect?

I'm still waiting for them to explain exactly how they have configured their hardware and OS to make this work as they expect. However, they have confirmed that they tested this functionality from the command line with fence_ipmilan and that it works as expected.

Comment 24 Lon Hohberger 2011-04-05 18:45:30 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
It is now possible to issue a diagnostic pulse using the IPMI interface using the fence_ipmilan agent.  This is not a substitute for the 'off' operation in a production cluster, but may be used to force a kernel dump of a host if that host is configured to perform dumps.  This feature is considered a Technology Preview.

Comment 26 Lon Hohberger 2011-04-06 13:48:18 UTC
Created attachment 490284 [details]
RHEL6 merged/tested patch

Comment 27 Dean Jansa 2011-04-19 15:55:44 UTC
Verified in fence-agents-3.0.12-23.el6.x86_64

Comment 30 Ryan Lerch 2011-05-10 03:43:57 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-It is now possible to issue a diagnostic pulse using the IPMI interface using the fence_ipmilan agent.  This is not a substitute for the 'off' operation in a production cluster, but may be used to force a kernel dump of a host if that host is configured to perform dumps.  This feature is considered a Technology Preview.
+A diagnostic pulse can now be issued on the IPMI interface using the fence_ipmilan agent. This new Technology Preview is used to force a kernel dump of a host if the host is configured to do so. Note that this feature is not a substitute for the 'off' operation in a production cluster.

Comment 31 errata-xmlrpc 2011-05-19 14:21:53 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0745.html