Bug 1932799 - During a hive driven baremetal installation the process does not go beyond 80% in the bootstrap VM
Summary: During a hive driven baremetal installation the process does not go beyond 80% in the bootstrap VM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6.z
Hardware: x86_64
OS: Linux
Importance: high urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Beth White
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks: 1935163
 
Reported: 2021-02-25 09:53 UTC by Ulrich Schlueter
Modified: 2021-07-27 22:48 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Baremetal IPI previously required that the installer was able to communicate with the provisioning network. This changes that communication to happen over the API VIP, enabling cases where the provisioning network is not routable and the installer is being run from a remote location, such as from Hive or ACM. Users may need to adjust their firewall rules to allow communication with TCP ports 6385 and 5050 on the API VIP.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:48:26 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 4692 0 None open Bug 1932799: baremetal: move ironic API's to use the API VIP 2021-02-25 14:52:03 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:48:51 UTC

Description Ulrich Schlueter 2021-02-25 09:53:53 UTC
Version:

$ openshift-install version
4.6.17

Platform:

baremetal

Please specify:
IPI

What happened?

In a lab setup we are using Hive to trigger a baremetal installation. The installation process creates the bootstrap VM but does not progress beyond 80%; it times out after 60 minutes and the baremetal servers remain powered off. Using the same install-config with openshift-install directly works and produces a working cluster.

#See the troubleshooting documentation (https://github.com/openshift/installer/blob/master/docs/user/troubleshooting.md) for ideas about what information to collect.

#For example, 

# If the installer fails to create resources (https://github.com/openshift/installer/blob/master/docs/user/troubleshooting.md#installer-fails-to-create-resources), attach the relevant portions of your `.openshift_install.log`.
# If the installer fails to bootstrap the cluster (https://github.com/openshift/installer/blob/master/docs/user/troubleshootingbootstrap.md), attach the bootstrap log bundle.
# If the installer fails to complete installation after bootstrapping completes (https://github.com/openshift/installer/blob/master/docs/user/troubleshooting.md#installer-fails-to-initialize-the-cluster), attach the must-gather log bundle using `oc adm must-gather`

# Always at least include the `.openshift_install.log`

What did you expect to happen?

Working cluster, servers powered on by the installation process


Anything else we need to know?

Comment 4 Stephen Benjamin 2021-02-25 12:10:26 UTC
Reading through the logs, the ironic API failed to come up. That's usually an indication that the OS images failed to download. If you can ssh to the bootstrap, can you run /usr/local/bin/installer-gather.sh on the bootstrap and attach the tarball to the BZ? That'll include all the logs I should need from the bootstrap.
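
A sketch of collecting that bundle, assuming the default `core` user on the bootstrap host; the IP below is a placeholder and the tarball name may vary by release:

```shell
# Placeholder address -- substitute your bootstrap VM's IP.
BOOTSTRAP_IP=192.168.111.10

# Run the gather script on the bootstrap host, then copy the
# resulting tarball back to attach to the BZ.
ssh "core@${BOOTSTRAP_IP}" sudo /usr/local/bin/installer-gather.sh
scp "core@${BOOTSTRAP_IP}:log-bundle-*.tar.gz" .
```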

Comment 6 Stephen Benjamin 2021-02-25 14:51:18 UTC
Back when we tested Hive and baremetal IPI, we were using a setup that included routable provisioning networks, and that was a constraint at the time. The installer needed to communicate with the provisioning (Ironic) APIs hosted on the bootstrap host using the bootstrap provisioning IP.

However, since 4.7 it should be possible to use the API VIP instead. In fact, when using virtual media with the provisioning network disabled, we already use the API VIP, so it probably makes sense to make the installer always do this regardless of whether there's a provisioning network or not.

I'll have some other people look at my proposal for this; I hope I'm not overlooking something.

For versions earlier than 4.7, I don't have any immediate ideas for how to work around this without making the provisioning network routable.
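
Per the eventual fix's doc text, moving this traffic to the API VIP means the host running the installer (e.g. Hive/ACM) needs TCP 6385 (Ironic API) and 5050 (Ironic Inspector) reachable on that VIP. A minimal sketch of opening those ports, assuming firewalld on the filtering host; adjust the zone and tooling to your environment:

```shell
# Allow the remote installer to reach the Ironic APIs on the API VIP.
# Assumes firewalld; ports are those listed in the bug's doc text.
firewall-cmd --permanent --add-port=6385/tcp   # Ironic API
firewall-cmd --permanent --add-port=5050/tcp   # Ironic Inspector
firewall-cmd --reload
```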

Comment 8 Raviv Bar-Tal 2021-03-04 06:34:19 UTC
Hey Ulrich,
Can you please re-test in our lab environment and update this BZ?

Thanks 
Raviv

Comment 9 Stephen Benjamin 2021-03-04 13:07:10 UTC
For now you'll need to use a recent 4.8 nightly. I am not entirely sure yet whether we can backport this to 4.7.z; it might be surprising to customers if we change which IP the installer talks to for provisioning in a z-stream.

Comment 11 Amit Ugol 2021-03-22 11:00:49 UTC

*** This bug has been marked as a duplicate of bug 1936443 ***

Comment 12 Stephen Benjamin 2021-05-10 20:58:37 UTC
Amit, why did you mark this as a duplicate of bug 1936443? They look like entirely different issues.

Comment 13 Raviv Bar-Tal 2021-06-14 11:25:41 UTC
The BZ was verified by the ACM team on baremetal.

Comment 16 errata-xmlrpc 2021-07-27 22:48:26 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

