Bug 1859999 - Baremetal operator is trying to validate image checksum for externally provisioned hosts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Egor Lunin
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-23 13:52 UTC by Nir
Modified: 2020-11-02 14:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-02 14:06:06 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | metal3-io baremetal-operator pull 659 | 0 | None | closed | Bug 1859999: Expand API documentation | 2020-11-18 17:54:13 UTC

Description Nir 2020-07-23 13:52:23 UTC
Description of problem:
While the baremetal operator (BMO) is adopting an externally provisioned BMH, it tries to validate the image against the image checksum.
BMO should not do this, as externally provisioned hosts should not be provisioned with any image.

Version-Release number of selected component (if applicable):
4.5


Steps to Reproduce:
1. Create a BMH CR with an image URL and image checksum URL that do not exist, and with externallyProvisioned: true
2. Wait until the error appears in oc get bmh
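
The reproduction steps above can be sketched as a small script that builds such a BareMetalHost definition. This is only an illustration: the host name, BMC address, secret name, MAC address and image URLs are hypothetical placeholders, not values from this report.

```python
import json


def build_bmh_manifest(name, image_url, checksum_url):
    """Return a BareMetalHost manifest as a plain dict.

    All concrete values passed in or hard-coded here are placeholders
    chosen for illustration only.
    """
    return {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        "metadata": {"name": name},
        "spec": {
            "online": True,
            # The host is declared as already provisioned by someone else.
            "externallyProvisioned": True,
            # Hypothetical BMC details; replace with real ones.
            "bmc": {
                "address": "ipmi://192.0.2.10",
                "credentialsName": f"{name}-bmc-secret",
            },
            "bootMACAddress": "00:00:5e:00:53:01",
            # Image and checksum URLs that are expected NOT to resolve.
            "image": {"url": image_url, "checksum": checksum_url},
        },
    }


manifest = build_bmh_manifest(
    "example-host",
    "http://192.0.2.1/images/missing.qcow2",
    "http://192.0.2.1/images/missing.qcow2.md5sum",
)
print(json.dumps(manifest, indent=2))
```

The JSON output can be applied with oc apply -f - in the namespace where the hosts are managed, after which oc get bmh should eventually show the adoption error.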

Actual results:
Error, e.g.:
Host adoption failed: Error while attempting to adopt node 64f7d62b-db72-431a-8f5e-2dfa2ce75553: Validation of image href http://172.22.0.3:6180/images/rhcos-44.81.202004250133-0-openstack.x86_64.qcow2/rhcos-44.81.202004250133-0-compressed.x86_64.qcow2 failed, reason: HTTPConnectionPool(host='172.22.0.3', port=6180): Max retries exceeded with url: /images/rhcos-44.81.202004250133-0-openstack.x86_64.qcow2/rhcos-44.81.202004250133-0-compressed.x86_64.qcow2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f15ddccb780>: Failed to establish a new connection: [Errno 110] ETIMEDOUT',)).

Expected results:
No error, and the host is in the externally provisioned state

Comment 1 Steven Hardy 2020-07-27 14:08:58 UTC
You shouldn't specify any image when using externallyProvisioned: true - it should be left empty, do we have some docs somewhere saying to add dummy data?

Perhaps the BMO should ignore the data (or, ideally reject the BMH definition as invalid), but I want to clarify if this is a corner case or if we have some docs/automation doing the wrong thing?

Comment 2 Nir 2020-07-28 08:51:28 UTC
Thanks. That makes sense.
Perhaps it is worth adding a note in the externallyProvisioned section or in the image section?
https://github.com/metal3-io/baremetal-operator/blob/master/docs/api.md

Comment 4 Doug Hellmann 2020-08-07 17:41:56 UTC
The image setting is necessary in order for Ironic to consider the host in a state where it will monitor the power state.

I have some work in progress to update the baremetal-operator so it waits to register the host with Ironic until after the image details are provided, so the error message explains what is wrong in terms that a metal3 user will understand. https://github.com/metal3-io/baremetal-operator/pull/609
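
A rough sketch of the idea of waiting for image details before registration, assuming the wait applies to externally provisioned hosts. The function name and message text are hypothetical and are not taken from the baremetal-operator code or from the pull request.

```python
def registration_blocker(spec):
    """Return a user-facing message if registration should wait, else None.

    `spec` is the BareMetalHost spec as a plain dict. This check is a
    hypothetical illustration, not the operator's actual logic.
    """
    image = spec.get("image") or {}
    if spec.get("externallyProvisioned") and not image.get("url"):
        return (
            "externallyProvisioned is true but spec.image.url is empty; "
            "waiting for image details before registering the host"
        )
    return None


# Externally provisioned host without image details: registration waits.
print(registration_blocker({"externallyProvisioned": True}))

# Same host once image details are present: nothing blocks registration.
print(registration_blocker(
    {"externallyProvisioned": True,
     "image": {"url": "http://192.0.2.1/images/example.qcow2"}}
))
```

The point of returning a message instead of letting the later validation fail is that the user sees the missing field named in BareMetalHost terms.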

It would also be useful to consider changes in Ironic to allow it to monitor the power of a host even if the host cannot be reprovisioned if there is some sort of failure.

Comment 5 Doug Hellmann 2020-08-07 17:43:39 UTC
That pull request is linked to https://bugzilla.redhat.com/show_bug.cgi?id=1864327

Comment 6 Doug Hellmann 2020-08-07 17:52:34 UTC
(In reply to Steven Hardy from comment #1)
> You shouldn't specify any image when using externallyProvisioned: true - it
> should be left empty, do we have some docs somewhere saying to add dummy
> data?
> 
> Perhaps the BMO should ignore the data (or, ideally reject the BMH
> definition as invalid), but I want to clarify if this is a corner case or if
> we have some docs/automation doing the wrong thing?

The IPI installer creates the control plane hosts without image details, but after the cluster forms they do have those settings. I think CAPBM is updating the image details when it links the host and machine resources for the control plane. As far as I can tell, we never see an error because it takes far longer for the metal3 pod to start than CAPBM, so CAPBM is always winning the race today.

