
Bug 1256477

Summary: ironic ipmitool intermittently timing out causing API requests to process slowly
Product: Red Hat OpenStack
Reporter: Jack Waterworth <jwaterwo>
Component: instack-undercloud
Assignee: John Trowbridge <jtrowbri>
Status: CLOSED ERRATA
QA Contact: Toure Dunnon <tdunnon>
Severity: high
Docs Contact:
Priority: high
Version: 7.0 (Kilo)
CC: calfonso, dmacpher, jtrowbri, jwaterwo, mburns, ohochman, rhel-osp-director-maint, yeylon
Target Milestone: y1
Keywords: Triaged, Unconfirmed
Target Release: 7.0 (Kilo)
Hardware: All
OS: All
Whiteboard:
Fixed In Version: instack-undercloud-2.1.2-24.el7ost
Doc Type: Bug Fix
Doc Text:
Nodes registered with an unresponsive IPMI IP address caused the sync power state periodic task to hang for 10 minutes by default, which made Ironic unresponsive. This fix lowers the default IPMI retry timeout. Unresponsive nodes now report failures faster and no longer hang the sync power state periodic task.
Last Closed: 2015-10-08 12:17:21 UTC
Type: Bug
Attachments:
ironic-conductor and ironic-discoverd logs (flags: none)

Description Jack Waterworth 2015-08-24 16:51:43 UTC
Description of problem:
Ironic is very slow to respond, and it deletes nodes and then tries to build them again.

Steps to Reproduce:
1. Build 4 compute nodes
2. Try to add 4 more nodes

Actual results:
Ironic is slow and rebuilds nodes.

---------------------
Aug 21 17:51:54 tpacpuidc ironic-conductor: raise exception.InstanceDeployFailure(msg)
Aug 21 17:51:54 tpacpuidc ironic-conductor: InstanceDeployFailure: Failed to notify ramdisk to reboot after bootloader installation. Error: [Errno 111] ECONNREFUSED
---------------------

Expected results:
Nodes should be added without issue

Comment 4 John Trowbridge 2015-08-24 20:07:15 UTC
Created attachment 1066619
ironic-conductor and ironic-discoverd logs

Comment 6 John Trowbridge 2015-08-25 11:01:07 UTC
I knew this seemed familiar:

https://bugs.launchpad.net/ironic/+bug/1383432

I think we should try to lower that timeout ([ipmi]  #retry_timeout=60). We should try to find the highest value that relieves the issue, as setting this too low can cause some BMCs to crash. I would try 30, 15, 10, 5.

@Jack could you have them try that and report the results?
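
For anyone trying the values above, here is a minimal sketch of how the `[ipmi] retry_timeout` option could be set in the text of an Ironic configuration file while leaving every other line untouched. The helper name `set_ipmi_retry_timeout` and the sample configuration are hypothetical and not part of Ironic; the candidate values are the ones suggested in this comment. Which value actually helps depends on the BMCs in question.

```python
import re

# Values suggested in the comment above, highest first; stop at the
# first one that relieves the slowdown.
CANDIDATE_TIMEOUTS = [30, 15, 10, 5]


def set_ipmi_retry_timeout(conf_text, seconds):
    """Return conf_text with retry_timeout set in the [ipmi] section.

    Hypothetical helper: replaces an existing (possibly commented-out)
    retry_timeout line, or adds one if the option or section is missing.
    """
    out = []
    in_ipmi = False
    done = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [ipmi] without having seen the option: add it here.
            if in_ipmi and not done:
                out.append(f"retry_timeout={seconds}")
                done = True
            in_ipmi = stripped == "[ipmi]"
        elif in_ipmi and not done and re.match(r"#?\s*retry_timeout\s*=", stripped):
            # Replace the default, whether it is commented out or not.
            line = f"retry_timeout={seconds}"
            done = True
        out.append(line)
    if not done:
        # [ipmi] was the last section, or did not exist at all.
        if not in_ipmi:
            out.append("[ipmi]")
        out.append(f"retry_timeout={seconds}")
    return "\n".join(out) + "\n"


# Illustrative configuration text, not taken from a real deployment.
sample = "[DEFAULT]\ndebug=True\n\n[ipmi]\n#retry_timeout=60\n\n[pxe]\nfoo=1\n"
updated = set_ipmi_retry_timeout(sample, CANDIDATE_TIMEOUTS[0])
print(updated)
```

The ironic-conductor service would presumably need a restart after each change before the new value takes effect.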

Comment 10 Omri Hochman 2015-09-18 16:06:02 UTC
Verified: instack-undercloud-2.1.2-26.el7ost.noarch

The bug was marked SanityOnly. I checked that no regressions were found when using the proposed fix.

Comment 13 errata-xmlrpc 2015-10-08 12:17:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862