Bug 1377928 - Inclusion of "s" to denote seconds in power_timeout attribute causes fence_ipmilan STONITH devices to fail.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: fence-agents
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Marek Grac
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1377970
Reported: 2016-09-21 04:02 UTC by Simon Thomson
Modified: 2017-08-01 16:10 UTC
CC List: 4 users

Fixed In Version: fence-agents-4.0.11-52.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1377970
Environment:
Last Closed: 2017-08-01 16:10:32 UTC
Target Upstream Version:
Embargoed:


Links
Red Hat Bugzilla 1384484 (priority: low, status: CLOSED): Some resource agents' metadata do not conform to the xml schema. Last updated 2021-10-07 14:31:45 UTC.
Red Hat Product Errata RHBA-2017:1874 (priority: normal, status: SHIPPED_LIVE): fence-agents bug fix and enhancement update. Last updated 2017-08-01 17:53:05 UTC.

Internal Links: 1384484

Description Simon Thomson 2016-09-21 04:02:45 UTC
Description of problem:

Configuring a fence_ipmilan-based STONITH device with a power_timeout value that includes an "s":

power_timeout=60s

causes STONITH to fail:

pcs stonith fence node1
Error: unable to fence 'node1'
Command failed: No route to host

The following is present in the cluster DC logs:

Sep 20 18:04:58 node2 user.notice python:detected unhandled Python exception in '/usr/sbin/fence_ipmilan'
Sep 20 18:05:17 node2 user.notice python:detected unhandled Python exception in '/usr/sbin/fence_ipmilan'
Sep 20 18:05:35 node2 daemon.err stonith-ng[4525]:   error: Operation 'reboot' [32131] (call 2 from stonith_admin.31703) for host 'node2' with device 'fence_node2_ipmi' returned: -201 (Generic Pacemaker error)
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [ error: db5 error(11) from dbenv->open: Resource temporarily unavailable ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [ error: cannot open Packages index using db5 - Resource temporarily unavailable (11) ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [ error: cannot open Packages database in /var/lib/rpm ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [ Traceback (most recent call last): ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [   File "/usr/sbin/fence_ipmilan", line 186, in <module> ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [     main() ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [   File "/usr/sbin/fence_ipmilan", line 182, in main ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [     result = fence_action(None, options, set_power_status, get_power_status, None, reboot_cycle) ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [   File "/usr/share/fence/fencing.py", line 964, in fence_action ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [     status = get_multi_power_fn(tn, options, get_power_fn) ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [   File "/usr/share/fence/fencing.py", line 871, in get_multi_power_fn ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [     plug_status = get_power_fn(tn, options) ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [   File "/usr/sbin/fence_ipmilan", line 17, in get_power_status ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [     output = run_command(options, create_command(options, "status")) ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [   File "/usr/share/fence/fencing.py", line 1183, in run_command ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [     timeout = float(timeout) ]
Sep 20 18:05:35 node2 daemon.warning stonith-ng[4525]: warning: fence_node1_ipmi:32131 [ ValueError: invalid literal for float(): 60s ]
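
The last frames of the traceback pinpoint the failure: run_command() in fencing.py passes the raw attribute string straight to float(), which accepts "60" but not "60s". A minimal Python sketch of that conversion (the wrapper name to_float_timeout is hypothetical and not part of fence-agents):

```python
# Reproduce the conversion from fencing.py's run_command():
#     timeout = float(timeout)
# applied to the raw power_timeout string from the STONITH configuration.
def to_float_timeout(value):
    return float(value)

print(to_float_timeout("60"))  # 60.0

try:
    to_float_timeout("60s")
except ValueError as err:
    # Python 2 (as in the log): invalid literal for float(): 60s
    # Python 3: could not convert string to float: '60s'
    print("ValueError:", err)
```

Because nothing catches the ValueError, the agent dies with a traceback and Pacemaker only reports a generic error, which matches the misleading "No route to host" message seen from pcs.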

Changing the STONITH configuration to remove the "s":

[root@node2 ~]# pcs stonith update fence_node1_ipmi power_timeout=60

allows STONITH operations to complete:

[root@node2 ~]# pcs stonith fence node1
Node: node1 fenced

I would think that the inclusion of "s" to denote seconds should not cause STONITH to fail.

Version-Release number of selected component (if applicable):

[root@e7359svin1637 ~]# rpm -qa | grep fence-agents-ipmilan
fence-agents-ipmilan-4.0.11-27.el7.x86_64

How reproducible:

100%

A fence_ipmilan based STONITH device configured with a power_timeout attribute that includes an "s" to denote seconds will fail 100% of the time.

Steps to Reproduce:
1. Configure a fence_ipmilan-based STONITH device including a power_timeout attribute like "power_timeout=60s".

2. Attempt to fence a cluster node using the STONITH device configured above:
# pcs stonith fence node1


Actual results:

Fencing fails with a "No route to host" error:
# pcs stonith fence node1
Error: unable to fence 'node1'
Command failed: No route to host

Expected results:

The node should be fenced:
# pcs stonith fence node1
Node: node1 fenced

Additional info:

None

Comment 2 Marek Grac 2016-09-21 07:09:00 UTC
Hi,

I agree that no Python exception should be visible to users, and we will fix that.

But suffixes like '[smh]' are not used in the cluster suite, so we will not support them. An appropriate error message should be displayed instead.
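
A minimal sketch of "an appropriate error message instead of an exception": validate the option once, up front, and exit with a one-line message. The helper name parse_seconds_option and the message wording are hypothetical; this is not the code that was merged into fence-agents.

```python
import sys

def parse_seconds_option(name, value):
    """Convert a seconds-valued option to float, or exit with a readable error.

    Hypothetical helper: it validates the option up front so the user sees
    a one-line message instead of a Python traceback.
    """
    try:
        seconds = float(value)
    except ValueError:
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit("Failed: %s must be a number of seconds, got %r" % (name, value))
    if seconds < 0:
        sys.exit("Failed: %s must not be negative, got %r" % (name, value))
    return seconds

print(parse_seconds_option("power_timeout", "60"))  # 60.0
```

With this check in place, power_timeout=60s would stop the agent with "Failed: power_timeout must be a number of seconds, got '60s'" rather than an unhandled ValueError.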

Comment 3 Simon Thomson 2016-09-21 07:40:34 UTC
Hi Marek,

An informative error message or just accepting and discarding suffixes like '[smh]' if they are configured would be useful. Thanks.

There seems to be some inconsistency around the requirement/acceptance of these suffixes for time-based cluster suite parameters. In this case using a suffix causes a failure. In others, the Red Hat High Availability documentation directs us to use a suffix.

e.g. From https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-fencedevicecreate-HAAR.html

"The following command creates a stonith device.

# pcs stonith create MyStonith fence_virt pcmk_host_list=f1 op monitor interval=30s"

e.g. From https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-resourceopts-HAAR.html

"In the following example, there is an existing resource named dummy_resource. This command sets the failure-timeout meta option to 20 seconds, so that the resource can attempt to restart on the same node in 20 seconds.
# pcs resource meta dummy_resource failure-timeout=20s"

If all time based cluster suite parameters can only be configured in seconds then a universal approach to accepting/rejecting/discarding suffixes like '[smh]' would make sense.

Comment 4 Marek Grac 2016-09-21 07:50:18 UTC
I understand your concerns and they make sense, even more so given the quotation from the documentation. Values such as 'interval=30s' are Pacemaker/pcs options that are not used by the fence agent at all. So the transformation from [smh] to seconds should be done in pcs. I will add a 'seconds' type so that pcs knows which options should be translated. Afterwards, [smh] should work as expected.
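
A minimal sketch of the translation described here, assuming a configuration tool rewrites a seconds-typed value before it reaches the agent. Only the '[smh]' suffixes named in this thread are handled; to_seconds is a hypothetical name, not the pcs implementation.

```python
import re

# Multipliers for the '[smh]' suffixes discussed in this thread;
# a bare number is taken as seconds.
_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600}
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([smh]?)\s*$")

def to_seconds(value):
    """Translate '60', '60s', '2m' or '1h' into a plain seconds string."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError("not a time value: %r" % value)
    number, unit = match.groups()
    seconds = float(number) * _UNITS[unit]
    # Emit an integer string when possible, so the agent receives e.g. '120'.
    return str(int(seconds)) if seconds.is_integer() else str(seconds)

for raw in ["60", "60s", "2m", "1h"]:
    print(raw, "->", to_seconds(raw))
# 60 -> 60, 60s -> 60, 2m -> 120, 1h -> 3600
```

Doing the translation in the configuration layer keeps the fence agent's interface unchanged: its float() call only ever sees a plain number.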

Comment 5 Marek Grac 2016-09-21 08:07:52 UTC
Types (second/integer) were added upstream.

https://github.com/ClusterLabs/fence-agents/commit/e0fa4827b2ec931a182a3781cc2223c79cba2563
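
A sketch of how a tool could use these types to decide which options to translate: read the agent's metadata and collect the parameters declared as seconds. The XML below is a trimmed, illustrative shape assumed for this example (a parameter element with a content child carrying a type attribute); check the output of `fence_ipmilan -o metadata` for the real layout.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative metadata; the exact element layout is an assumption.
METADATA = """
<resource-agent name="fence_ipmilan">
  <parameters>
    <parameter name="power_timeout">
      <content type="second" default="20"/>
    </parameter>
    <parameter name="ipport">
      <content type="integer"/>
    </parameter>
    <parameter name="ip">
      <content type="string"/>
    </parameter>
  </parameters>
</resource-agent>
"""

def seconds_options(metadata_xml):
    """Return the names of parameters whose content type is 'second'."""
    root = ET.fromstring(metadata_xml.strip())
    names = set()
    for param in root.iter("parameter"):
        content = param.find("content")
        if content is not None and content.get("type") == "second":
            names.add(param.get("name"))
    return names

print(sorted(seconds_options(METADATA)))  # ['power_timeout']
```

Only options in the returned set would have a '[smh]' suffix converted; integer and string options are passed through untouched.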

Comment 6 Simon Thomson 2016-09-22 01:05:25 UTC
Looks good, thanks Marek.

Comment 10 Jan Pokorný [poki] 2017-07-14 15:03:47 UTC
See also http://oss.clusterlabs.org/pipermail/users/2017-July/006055.html

Comment 11 errata-xmlrpc 2017-08-01 16:10:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1874

