Bug 1851113

Summary: OpenShift 4.4.5 bare-metal IPI times out
Product: OpenShift Container Platform
Reporter: jteagno <jteagno+bugzilla>
Component: Bare Metal Hardware Provisioning
Sub Component: ironic
Assignee: Julia Kreger <jkreger>
QA Contact: Raviv Bar-Tal <rbartal>
Status: CLOSED DEFERRED
Docs Contact:
Severity: medium
Priority: medium
CC: bfournie, bhershbe, emahoney, jkreger, rbartal, stbenjam
Version: 4.4
Keywords: Triaged
Target Milestone: ---
Target Release: 4.6.0
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-29 21:36:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description jteagno 2020-06-25 15:55:16 UTC
Description of problem:

A baremetal IPI installation of OpenShift 4.4.5 cannot be completed.

openshift-baremetal-install times out after 40 minutes, and no progress appears to have been made.

The worker machines do not get powered on via IPMI during these 40 minutes.

Notably, we *are* able to power on the machines manually by running `ipmitool` from within the `ironic-conductor` container on the baremetal bootstrap VM, which runs on the bastion node.

The worker machines are HP ProLiant BL460c Gen9, with iLO 4.


Version-Release number of selected component (if applicable):

OpenShift 4.4.5


How reproducible:

Occurs every time `openshift-baremetal-install` is run for this cluster.


Steps to Reproduce:
1. Run `openshift-baremetal-install create manifests`
2. Run `openshift-baremetal-install create cluster`
3. Wait 40 minutes for the installer to time out

Actual results:

The cluster is not installed, with the worker nodes never having been powered on via IPMI.


Expected results:

Worker nodes are powered on; cluster installation commences.


Additional info:

Comment 1 Julia Kreger 2020-06-25 17:23:59 UTC
A couple questions:

Is the ipmitool command taking a long time to execute when it is run manually from within the ironic-conductor container? The time command may be useful for measuring this.

Is there any logging output available from the ironic-conductor container? Without any sort of additional logging detail, it will be difficult to identify a root cause.

You're indicating workers and cluster installation. Is this workers and masters, or is it just workers after the initial masters are deployed? Or do none of the machines power on?

What are the firmware versions for the iLO controllers of these machines?
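
One way to take the timing measurement asked about above is sketched below. It is only an illustration: the helper names, the `lanplus` interface choice, and the host, user, and password-file values are assumptions, not details taken from this report.

```python
import subprocess
import time


def ipmi_power_status_argv(host, user, password_file):
    """Build an ipmitool power-status command for one BMC.

    All three arguments are placeholders supplied by the caller; the
    lanplus interface is an assumption about how the iLO is reached.
    """
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-f", password_file,
        "power", "status",
    ]


def time_command(argv, timeout=120):
    """Run argv and return (exit_code, elapsed_seconds).

    subprocess.TimeoutExpired is raised if the command exceeds timeout.
    """
    start = time.perf_counter()
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, time.perf_counter() - start
```

Calling `time_command(ipmi_power_status_argv(host, user, password_file))` inside the container gives roughly the same figure as prefixing the manual command with `time`. A consistently slow or timed-out response would point at the BMC rather than at the installer.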

Comment 9 Stephen Benjamin 2020-06-25 19:59:00 UTC
40 minutes is the bootstrap complete timeout - that is long before workers would start coming up. This is the part where the control plane comes up. Not seeing worker nodes powered on would be expected. What was the status of your control plane hosts?

Can you supply the output from the installer?
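
A minimal sketch of one way to capture that output is shown below. It assumes the installer is rerun with debug logging. The helper names and the log path are hypothetical, and `--dir` and `--log-level` are the installer's usual options rather than anything confirmed in this report.

```python
import subprocess
import sys


def installer_argv(asset_dir, binary="openshift-baremetal-install"):
    """Build a `create cluster` command line with debug-level logging."""
    return [binary, "create", "cluster", "--dir", str(asset_dir), "--log-level", "debug"]


def run_and_capture(argv, log_path):
    """Run argv, echo its combined stdout/stderr, and save a copy to log_path."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()
```

The installer normally also writes its own log to `.openshift_install.log` in the asset directory, so attaching that file may be sufficient without rerunning anything.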

Comment 23 Julia Kreger 2020-07-29 21:36:54 UTC
Greetings Joe,

That sounds reasonable. I'll go ahead and close this item out. Thanks!

-Julia