Bug 1831893 - Baremetal nodes with HP BMCs fail introspection due to ipmitool timeout
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: RHOS Maint
QA Contact: Alistair Tonner
Depends On: 1831158
Blocks: 1849038
Reported: 2020-05-05 20:29 UTC by Bob Fournier
Modified: 2022-08-08 12:27 UTC
CC List: 14 users

Fixed In Version: openstack-ironic-13.0.4-0.20200529150915.911bc51.el8ost
Doc Type: Bug Fix
Doc Text:
A regression introduced in ipmitool-1.8.18-11 caused IPMI access to take over two minutes on certain BMCs that do not support the "Get Channel Cipher Suites" command. As a result, introspection could fail and deployments could take much longer than before.
With this update, ironic invokes ipmitool with a single attempt ("-R 1 -N 1") and handles the retries itself, so introspection passes and deployments succeed.
[NOTE] This ipmitool issue is resolved in ipmitool-1.8.18-17.
Clones: 1849038
Last Closed: 2020-07-29 07:52:21 UTC


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 728261 | 0 | None | MERGED | New configuration parameter to use ipmitool retries | 2020-10-15 07:52:02 UTC
Red Hat Issue Tracker | OSP-10399 | 0 | None | None | None | 2022-08-08 12:27:24 UTC
Red Hat Product Errata | RHBA-2020:3148 | 0 | None | None | None | 2020-07-29 07:52:42 UTC

Description Bob Fournier 2020-05-05 20:29:05 UTC
Description of problem:

This is an OSP bug to track the ipmitool bug - https://bugzilla.redhat.com/show_bug.cgi?id=1831158

With the ipmitool version shipped in RHEL 8.2, introspection fails on nodes with HP iLO BMCs.

This was seen on an HP ProLiant DL360 Gen9. Introspection fails with:

openstack overcloud node introspect hp-dl360-g9-02 --provide
Waiting for introspection to finish...
Waiting for messages on queue 'tripleo' with no timeout.
Introspection of node completed:ced799b5-6619-44db-90cd-71c3955e3043. Status:FAILED. Errors:Failed to set boot device to PXE: Timed out waiting for a reply to message ID a3c7ab7325004808b4ae6411dce0f2db (HTTP 500)
Retrying 1 nodes that failed introspection. Attempt 1 of 3 
Introspection of node completed:ced799b5-6619-44db-90cd-71c3955e3043. Status:FAILED. Errors:Failed to set boot device to PXE: Timed out waiting for a reply to message ID ff20e76a05d444eabecf80031e9a518d (HTTP 500)
Retrying 1 nodes that failed introspection. Attempt 2 of 3 
Introspection of node completed:ced799b5-6619-44db-90cd-71c3955e3043. Status:FAILED. Errors:Failed to set boot device to PXE: Timed out waiting for a reply to message ID adcc70c4f43d4d06818e31718f1882e2 (HTTP 500)
Retrying 1 nodes that failed introspection. Attempt 3 of 3 
Introspection of node completed:ced799b5-6619-44db-90cd-71c3955e3043. Status:FAILED. Errors:Failed to set boot device to PXE: Timed out waiting for a reply to message ID 5c641cd7bf5847fb8d643ef4ad120243 (HTTP 500)
Retry limit reached with 1 nodes still failing introspection


In the logs we see:
containers/ironic/ironic-conductor.log.1:2020-05-04 23:55:41.385 7 DEBUG ironic.common.utils [req-eb49faaa-94bd-4f0e-badd-064272ba1ebc - - - - -] Command stderr is: "Unable to Get Channel Cipher Suites
containers/ironic/ironic-conductor.log.1:2020-05-04 23:57:52.657 7 DEBUG ironic.common.utils [req-eb49faaa-94bd-4f0e-badd-064272ba1ebc - - - - -] Command stderr is: "Unable to Get Channel Cipher Suites
containers/ironic/ironic-conductor.log.1:2020-05-05 00:00:03.935 7 DEBUG ironic.common.utils [req-eb49faaa-94bd-4f0e-badd-064272ba1ebc - - - - -] Command stderr is: "Unable to Get Channel Cipher Suites
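The three stderr messages above belong to the same request ID. Subtracting their timestamps shows they are spaced roughly 2m11s apart, in line with the two-minute ipmitool runtime reported in this bug. A small Python check, with the timestamps copied verbatim from the log lines (the helper name `attempt_gaps` is just for illustration):

```python
from datetime import datetime

# Timestamps copied from the three ironic-conductor log lines above.
STAMPS = [
    "2020-05-04 23:55:41.385",
    "2020-05-04 23:57:52.657",
    "2020-05-05 00:00:03.935",
]


def attempt_gaps(stamps):
    """Return the seconds elapsed between consecutive log timestamps."""
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


print(attempt_gaps(STAMPS))  # [131.272, 131.278]
```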

Running the ipmitool command manually takes 2 minutes to complete:

()[ironic@hardprov-dl360-g9-01 /]$ time ipmitool -I lanplus -H 10.9.103.29 -U DMINISTRATOR -P XXX -v -R 12 -N 5 chassis status
...
real	2m6.271s
user	0m0.002s
sys	0m0.004s
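In the manual command above, `-R 12` sets ipmitool's lanplus retry count and `-N 5` the number of seconds between retransmissions. A minimal sketch of how those two numbers can be derived, assuming ironic's `[ipmi] command_retry_timeout` of 60 and `min_command_interval` of 5 (defaults assumed here, not stated in this report); `ipmitool_retry_args` is a hypothetical helper name, not an ironic function:

```python
from typing import List


def ipmitool_retry_args(command_retry_timeout: int = 60,
                        min_command_interval: int = 5) -> List[str]:
    """Build -R/-N options for the case where ipmitool does its own retries.

    Assumed defaults: a 60 s overall budget split into 5 s intervals.
    """
    # Always allow at least one try, even if the budget is shorter
    # than a single interval.
    num_tries = max(command_retry_timeout // min_command_interval, 1)
    return ["-R", str(num_tries), "-N", str(min_command_interval)]


print(ipmitool_retry_args())  # ['-R', '12', '-N', '5']
```

With these numbers ipmitool alone may keep retransmitting an unanswered request for roughly 12 × 5 s = 60 s before giving up.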


This issue was also seen with vbmc (VirtualBMC), where it was resolved by a new version of pyghmi in https://bugzilla.redhat.com/show_bug.cgi?id=1813889. Baremetal BMC access does not use pyghmi, so that fix does not apply here.

Version-Release number of selected component (if applicable):

HP ProLiant DL360 Gen9 - iLO versions 2.54 (Jun 15 2017) and 2.60 (latest available, May 23 2018)

ipmitool-1.8.18-14.el8.x86_64


How reproducible:

It happens every time with this BMC. The Dell systems tested so far work fine.

Comment 4 Bob Fournier 2020-06-08 14:37:58 UTC
Package is in compose RHOS-16.1-RHEL-8-20200604.n.1.

Comment 5 Bob Fournier 2020-06-09 17:41:01 UTC
Verified that ipmitool no longer takes 2 minutes to respond because of the Cipher Suites issue. ipmitool commands are now issued with "-R 1 -N 1", and ironic performs the retries.
Running cmd (subprocess): ipmitool -I lanplus -H 172.16.0.28 -L ADMINISTRATOR -p 6230 -U admin -R 1 -N 1 -f /tmp/tmpebyzf379

ipmi.use_ipmitool_retries      = False
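With `ipmi.use_ipmitool_retries = False`, ipmitool is told to make a single attempt (`-R 1 -N 1`) and ironic decides whether to try again. The loop below is a minimal sketch of that split under stated assumptions, not ironic's actual code: `IpmiCommandError`, `RETRYABLE_ERRORS`, `subprocess_runner` and `run_ipmitool_with_retries` are hypothetical names, the 60 s / 5 s budget is an assumed default, and the retryable error strings are illustrative. The runner and clock are injected so the loop can be exercised without a BMC.

```python
import subprocess
import time
from typing import Callable, List, Sequence


class IpmiCommandError(Exception):
    """Hypothetical error carrying ipmitool's stderr after a non-zero exit."""

    def __init__(self, stderr: str) -> None:
        super().__init__(stderr)
        self.stderr = stderr


# Illustrative (assumed) substrings of transient failures worth retrying.
RETRYABLE_ERRORS = ("Timeout", "Node busy", "insufficient resources for session")


def subprocess_runner(args: Sequence[str]) -> str:
    """Run the real ipmitool binary once; raise IpmiCommandError on failure."""
    result = subprocess.run(["ipmitool", *args], capture_output=True, text=True)
    if result.returncode != 0:
        raise IpmiCommandError(result.stderr)
    return result.stdout


def run_ipmitool_with_retries(
    connection_args: Sequence[str],
    command: Sequence[str],
    run: Callable[[Sequence[str]], str] = subprocess_runner,
    command_retry_timeout: float = 60.0,
    min_command_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    monotonic: Callable[[], float] = time.monotonic,
) -> str:
    # ipmitool makes exactly one attempt; its options precede the command.
    args: List[str] = [*connection_args, "-R", "1", "-N", "1", *command]
    deadline = monotonic() + command_retry_timeout
    while True:
        try:
            return run(args)
        except IpmiCommandError as exc:
            transient = any(msg in exc.stderr for msg in RETRYABLE_ERRORS)
            # Give up on non-transient errors, or when waiting another
            # interval would overrun the overall retry budget.
            if not transient or monotonic() + min_command_interval > deadline:
                raise
            sleep(min_command_interval)
```

In this sketch each loop iteration costs one short ipmitool round-trip, so the total time spent on a failing BMC is bounded by the caller's budget rather than by ipmitool's internal retransmission schedule.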

Comment 7 Alex McLeod 2020-06-16 12:30:57 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 10 errata-xmlrpc 2020-07-29 07:52:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148

