1577530 – stonith:fence_ipmilan

Summary: stonith:fence_ipmilan

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	pacemaker
Sub Component:
Version:	7.4
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Ken Gaillot
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-05-12 19:13 UTC by Taoufik07
Modified:	2018-05-14 14:43 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-14 14:43:15 UTC
Target Upstream Version:
Embargoed:

Description Taoufik07 2018-05-12 19:13:31 UTC

Description of problem:

 fence_cms1     (stonith:fence_ipmilan):        Started cms2
 fence_cms2     (stonith:fence_ipmilan):        Started cms1

Failed Actions:
* fence_cms2_start_0 on cms2 'unknown error' (1): call=98, status=Timed Out, exitreason='none',
    last-rc-change='Sat May 12 21:29:15 2018', queued=0ms, exec=20005ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

I get this warning, but my resource started correctly.

Version-Release number of selected component (if applicable):


pacemaker-1.1.16-12.el7.x86_64
pacemaker-cli-1.1.16-12.el7.x86_64
pacemaker-cluster-libs-1.1.16-12.el7.x86_64
pacemaker-libs-1.1.16-12.el7.x86_64


corosynclib-2.4.0-9.el7.x86_64
corosync-2.4.0-9.el7.x86_64
corosync-qnetd-2.4.0-9.el7.x86_64
corosynclib-devel-2.4.0-9.el7.x86_64
corosync-qdevice-2.4.0-9.el7.x86_64


Red Hat Enterprise Linux Server release 7.4 (Maipo)


Steps to Reproduce:
1. I created a fence_ipmilan resource for the first node: result Success.
2. I created a fence_ipmilan resource for the second node: result Success.

But I get a warning:

Failed Actions:
* fence_cms2_start_0 on cms2 'unknown error' (1): call=98, status=Timed Out, exitreason='none',
    last-rc-change='Sat May 12 21:29:15 2018', queued=0ms, exec=20005ms

I ran: pcs stonith update fence_cms2 power_timeout=60

Actual results:
Failed Actions:
* fence_cms2_start_0 on cms2 'unknown error' (1): call=98, status=Timed Out, exitreason='none',
    last-rc-change='Sat May 12 21:29:15 2018', queued=0ms, exec=20005ms

Expected results:


Additional info:

I have an HP Gen10 server with iLO 5.
I enabled ipmilan (IPMI over LAN) in the BIOS.

Comment 2 Ken Gaillot 2018-05-14 14:28:00 UTC

The "Failed Actions" section of the status display shows all past failures. This particular message indicates that the cluster timed out trying to contact this fence device. As a high availability platform, the cluster will automatically recover from errors when possible, so it was able to successfully start the device on another try.
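
For reference, each failed-action entry packs several fields into one record: the operation key (resource, action, and interval in ms), the node, the return code, the operation status, and the execution time. Below is a minimal Python sketch that pulls those fields out, assuming the exact two-line layout quoted in this report (other Pacemaker versions may format it differently); the FailedAction class and parse_failed_action function are hypothetical names for illustration. Read this way, exec=20005ms is just over 20 seconds, which is consistent with the start operation hitting a 20-second timeout (Pacemaker's default-action-timeout is 20s); that is an inference, not something confirmed in this thread.

```python
import re
from dataclasses import dataclass

# Sample entry copied from the "Failed Actions" section in this report.
SAMPLE = (
    "* fence_cms2_start_0 on cms2 'unknown error' (1): call=98, "
    "status=Timed Out, exitreason='none',\n"
    "    last-rc-change='Sat May 12 21:29:15 2018', queued=0ms, exec=20005ms"
)

# Assumption: the layout matches the Pacemaker 1.1.16 output quoted above.
_PATTERN = re.compile(
    r"\*\s+(?P<key>\S+)\s+on\s+(?P<node>\S+)\s+"
    r"'(?P<rc_text>[^']*)'\s+\((?P<rc>\d+)\):"
    r".*?status=(?P<status>[^,]+),"
    r".*?exec=(?P<exec_ms>\d+)ms",
    re.DOTALL,  # let ".*?" cross the line break before last-rc-change
)


@dataclass
class FailedAction:
    resource: str
    action: str
    interval_ms: int
    node: str
    rc: int
    rc_text: str
    status: str
    exec_ms: int


def parse_failed_action(text: str) -> FailedAction:
    m = _PATTERN.search(text)
    if m is None:
        raise ValueError("not a recognised failed-action entry")
    # Operation key is <resource>_<action>_<interval_ms>; splitting from the
    # right keeps underscores in the resource name, but an action name that
    # itself contains "_" (e.g. migrate_to) would be split incorrectly.
    resource, action, interval = m.group("key").rsplit("_", 2)
    return FailedAction(
        resource=resource,
        action=action,
        interval_ms=int(interval),
        node=m.group("node"),
        rc=int(m.group("rc")),
        rc_text=m.group("rc_text"),
        status=m.group("status").strip(),
        exec_ms=int(m.group("exec_ms")),
    )


if __name__ == "__main__":
    entry = parse_failed_action(SAMPLE)
    # exec_ms / 1000 gives the run time in seconds
    print(entry.resource, entry.action, entry.status, entry.exec_ms / 1000)
```

Run against the sample, it prints: fence_cms2 start Timed Out 20.005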

You can clear the message with "pcs resource cleanup fence_cms2".
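
If the cleanup has to be repeated, for example after adjusting a timeout as the reporter later did, it can be scripted. Below is a minimal Python sketch, assuming the pcs CLI is on PATH and the script runs with rights to manage the cluster. pcs_argv and cleanup_and_check are hypothetical helper names; "pcs resource cleanup" is the command quoted above, and "pcs status" is the standard pcs status display.

```python
import shutil
import subprocess
from typing import List


def pcs_argv(*args: str) -> List[str]:
    """Build an argv list for the pcs CLI; no shell, so no quoting issues."""
    return ["pcs", *args]


def cleanup_and_check(resource: str, dry_run: bool = True) -> List[List[str]]:
    """Clear a resource's recorded failures, then display cluster status.

    With dry_run=True (the default) the commands are returned but not run,
    so the sketch can be tried on a machine that is not a cluster node.
    """
    commands = [
        pcs_argv("resource", "cleanup", resource),
        pcs_argv("status"),
    ]
    if not dry_run:
        if shutil.which("pcs") is None:
            raise RuntimeError("pcs not found on PATH")
        for argv in commands:
            # check=True raises CalledProcessError on a non-zero exit status
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    for argv in cleanup_and_check("fence_cms2"):
        print(" ".join(argv))
```

As written, it only prints the two commands; call cleanup_and_check("fence_cms2", dry_run=False) on a cluster node to execute them. Passing an argv list rather than a shell string avoids quoting problems with resource names.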

The root cause of the timeout itself is unlikely to be in the pacemaker component, so resolving it will require a support case, which can look at the wider environment and how the components work together. Bugzilla, by contrast, focuses on bugs in a single software package.

You can initiate a case with Red Hat's Global Support Services group through one of the methods listed at the following link:

  https://access.redhat.com/start/how-to-engage-red-hat-support

From there, we'll collect some additional information from you and take a closer look at the specifics of this incident to help you resolve the underlying problem.

Comment 3 Taoufik07 2018-05-14 14:36:36 UTC

I connected to the Red Hat system and increased the timeout, but it still did not work after cleaning up the resource. Then I connected to my iLO and changed its timeout from 30s to 120s. After cleaning up the resource, it is working.

Many thanks.

Comment 4 Ken Gaillot 2018-05-14 14:43:15 UTC

Great, that's good to hear :-)
