Bug 1905577 - Control plane machines not adopted when provisioning network is disabled
Summary: Control plane machines not adopted when provisioning network is disabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Stephen Benjamin
QA Contact: Lubov
URL:
Whiteboard:
Depends On:
Blocks: 1932452
Reported: 2020-12-08 15:22 UTC by Stephen Benjamin
Modified: 2021-07-27 22:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Adoption of externally provisioned hosts was not retried upon failure. In some cases a race could occur where we try to adopt before the image cache is populated, resulting in permanent adoption failure. Consequence: Control plane bare metal hosts report "adoption failed." Fix: We now retry on adoption failure. Result: Control plane hosts are correctly adopted.
Clone Of:
Clones: 1932452
Environment:
Last Closed: 2021-07-27 22:34:40 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift baremetal-operator pull 125 | 0 | None | closed | Merge upstream 2021-02-11 | 2021-02-24 16:07:06 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:35:11 UTC

Description Stephen Benjamin 2020-12-08 15:22:59 UTC
openshift-machine-api   ostest-master-2   error    externally provisioned   ostest-7rtdq-master-2         redfish-virtualmedia+http://192.168.111.1:8000/redfish/v1/Systems/4f149ee8-7d13-483c-978a-81038234c5b5                      true     Host adoption failed: Error while attempting to adopt node 7de2e2ff-6984-4c4b-a127-bb5cf38037df: Validation of image href http://192.168.111.5:6181/images/rhcos-47.83.202012030221-0-openstack.x86_64.qcow2/rhcos-47.83.202012030221-0-compressed.x86_64.qcow2 failed, reason: HTTPConnectionPool(host='192.168.111.5', port=6181): Max retries exceeded with url: /images/rhcos-47.83.202012030221-0-openstack.x86_64.qcow2/rhcos-47.83.202012030221-0-compressed.x86_64.qcow2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0b9f799048>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',)).
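
The error above is a refused connection to the image cache at the moment adoption was attempted, and the Doc Text describes the fix as retrying on adoption failure. The following is a minimal sketch of that retry idea only, not the operator's actual code: `AdoptionError`, `adopt_with_retry`, and `make_flaky_adopt` are hypothetical names introduced for illustration, and the exponential backoff schedule is an assumption rather than something stated in this bug.

```python
import time


class AdoptionError(Exception):
    """Raised by the hypothetical adopt callable when adoption fails."""


def adopt_with_retry(adopt, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call adopt() until it succeeds or max_attempts is reached.

    Returns the number of attempts used. Re-raises the last AdoptionError
    if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            adopt()
            return attempt
        except AdoptionError:
            if attempt == max_attempts:
                raise
            # Assumed backoff schedule: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_adopt(failures_before_ready):
    """Simulate an image cache that refuses connections for the first N calls."""
    state = {"calls": 0}

    def adopt():
        state["calls"] += 1
        if state["calls"] <= failures_before_ready:
            raise AdoptionError("image cache not ready: connection refused")

    return adopt
```

With a simulated cache that becomes reachable after two refused attempts, `adopt_with_retry(make_flaky_adopt(2), sleep=lambda s: None)` succeeds on the third attempt. Without the retry, the first refusal would leave the host in the permanent "adoption failed" state that this bug describes.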

Comment 3 Stephen Benjamin 2021-02-01 14:00:36 UTC
This is already fixed upstream in https://github.com/metal3-io/baremetal-operator/pull/762. I'm working on putting together an OpenShift PR, but it's a bit challenging since BMO upstream has moved quite a bit ahead of OpenShift's 4.7 version.

Comment 4 Stephen Benjamin 2021-02-01 17:30:38 UTC
Tentatively, the plan is to get the fix into the first 4.7 z-stream, so we'll have more time to let the changes soak in CI.

Comment 7 Lubov 2021-02-25 18:02:47 UTC
Verified on 4.8.0-0.nightly-2021-02-25-112922, on a setup where the problem was reproducible 100% of the time.

Comment 10 errata-xmlrpc 2021-07-27 22:34:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

