Bug 1851113 - OpenShift 4.4.5 bare-metal IPI times out
Summary: OpenShift 4.4.5 bare-metal IPI times out
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.4
Hardware: x86_64
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.6.0
Assignee: Julia Kreger
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-06-25 15:55 UTC by jteagno
Modified: 2020-07-29 21:36 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 21:36:54 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description jteagno 2020-06-25 15:55:16 UTC
Description of problem:

A baremetal IPI installation of OpenShift 4.4.5 cannot be completed.

openshift-baremetal-install times out after 40 minutes, and no progress appears to have been made.

The worker machines do not get powered on via IPMI during these 40 minutes.

Notably, we *are* able to manually power-on machines by running `ipmitool` from within the `ironic-conductor` container running on the baremetal bootstrap VM inside the bastion node.

The worker machines are HP ProLiant BL460c Gen9, with iLO 4 .


Version-Release number of selected component (if applicable):

OpenShift 4.4.5


How reproducible:

Appears every time an openshift-baremetal-install is attempted for this cluster.


Steps to Reproduce:
1. Run `openshift-baremetal-install create manifests`
2. Run `openshift-baremetal-install create cluster`
3. Wait 40 minutes for the installer to time out

Actual results:

The cluster is not installed, with the worker nodes never having been powered on via IPMI.


Expected results:

Worker nodes are powered on; cluster installation commences.


Additional info:

Comment 1 Julia Kreger 2020-06-25 17:23:59 UTC
A couple questions:

Is the ipmitool command from with-in the ironic-conductor container taking a long time to execute when the commands are manually run? The time command may be useful for measuring this.

Is there any logging output available from the ironic-conductor container? Without any sort of additional logging detail, it will be difficult to identify a root cause.

Your indicating workers and cluster installation. Is this workers and masters or is it just works after the initial masters are deployed? Or do none of the machines power on?

What are the firmware versions for the iLO controllers of these machines?

Comment 9 Stephen Benjamin 2020-06-25 19:59:00 UTC
40 minutes is the bootstrap complete timeout - that is long before workers would start coming up. This is the part where the control plane comes up. Not seeing worker nodes powered on would be expected. What were the status of your control plane hosts?

Can you supply the output from the installer?

Comment 23 Julia Kreger 2020-07-29 21:36:54 UTC
Greetings Joe,

That sounds reasonable. I'll go ahead and close this item out. Thanks!

-Julia


Note You need to log in before you can comment on or make changes to this bug.