Bug 1972753
Summary: ironic hardware inspection failed due to NewConnectionError causes bm nodes stuck
Product: OpenShift Container Platform
Component: Bare Metal Hardware Provisioning
Bare Metal Hardware Provisioning sub component: ironic
Reporter: Nikita <nkononov>
Assignee: Derek Higgins <derekh>
QA Contact: elevin
Docs Contact: jfrye
Status: CLOSED ERRATA
Severity: high
Priority: low
CC: akaris, derekh, elevin, jfrye, lshilin
Version: 4.8
Keywords: Triaged
Target Milestone: ---
Target Release: 4.9.0
Hardware: All
OS: Linux
Doc Type: Bug Fix
Doc Text:
Previously, if provisioningHostIP was set, it was assigned to the metal3 pod even when the provisioning network was disabled. This no longer happens.
Bug Blocks: 1975711
Last Closed: 2021-10-18 17:34:35 UTC
Type: Bug

Description
Nikita
2021-06-16 14:53:41 UTC
Your baremetal operator is failing with this error:

2021-06-15T14:18:25.536730720Z {"level":"error","ts":1623766705.5358071,"logger":"controller-runtime.manager.controller.baremetalhost","msg":"Reconciler error","reconciler group":"metal3.io","reconciler kind":"BareMetalHost","name":"hlxcl2-worker-1","namespace":"openshift-machine-api","error":"action \"inspecting\" failed: hardware inspection failed: failed to update host boot mode settings in ironic: Internal Server Error","errorVerbose":"Internal Server Error\nfailed to update host boot mode settings in ironic

This corresponds with this error in your ironic-api log:

2021-06-15T14:18:25.534677648Z 2021-06-15 14:18:25.533 52 ERROR ironic.api.method [req-9e364490-e81c-4783-b495-a3b1dc14b39f ironic-user - - - -] Server-side error: "Unable to establish connection to https://10.46.55.124:8089: HTTPSConnectionPool(host='10.46.55.124', port=8089): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f52f1c1ee80>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',))". Detail:

I think the reason is that your metal3 pod has a container that sets the provisioning IP, "metal3-static-ip-set", but you're missing the container that ensures the IP isn't lost over time, "metal3-static-ip-manager".

Looking at the CBO code, "metal3-static-ip-set" is included if you have a provisioning IP set [1], but "metal3-static-ip-manager" is only added if you have a provisioning IP set and ProvisioningNetwork is not Disabled [2]. This seems inconsistent: if you need one, you need the other.

You have:

provisioningHostIP: 10.46.55.124
provisioningNetwork: Disabled

So when the pod starts, "metal3-static-ip-set" assigns an IP to the provisioning NIC, but you have no "metal3-static-ip-manager" to keep it there.

If you don't need it, I think unsetting "provisioningHostIP" in your install-config should allow your workers to deploy (the external IP will be used by ironic). Can you confirm whether this works? Then we can work on fixing the inconsistencies in the metal3 pod.

1 - https://github.com/openshift/cluster-baremetal-operator/blob/04a2ae2/provisioning/baremetal_pod.go#L238-L240
2 - https://github.com/openshift/cluster-baremetal-operator/blob/04a2ae2/provisioning/baremetal_pod.go#L344-L346

Hi Derek,

We tried to unset "provisioningHostIP" and it solved the issue. Without "provisioningHostIP", deployment completed successfully.

Verified on 4.8.0-rc.1

(In reply to elevin from comment #5)
> Verified on 4.8.0-rc.1

The fix for this hasn't yet merged into 4.8; it needs to be verified on 4.9. Can you verify there?

*** Bug 1991568 has been marked as a duplicate of this bug. ***

4.9.0-fc.0 deployed successfully

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
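
For readers following the analysis in this thread, the container mismatch can be sketched roughly as below. This is a simplified illustration and not the actual cluster-baremetal-operator code: the Config type and the functions containersBefore, needsStaticIP and containersAfter are hypothetical names, and containersAfter is only one plausible reading of the Doc Text (the merged patch is not shown in this report).

```go
package main

import "fmt"

// Config holds the two settings discussed in this bug.
// Hypothetical type for illustration; not the operator's real struct.
type Config struct {
	ProvisioningHostIP  string
	ProvisioningNetwork string // "Disabled" is the value from this report
}

// containersBefore models the behaviour described in the analysis:
// the two static-IP containers are gated on different conditions.
func containersBefore(c Config) []string {
	var out []string
	if c.ProvisioningHostIP != "" {
		out = append(out, "metal3-static-ip-set")
	}
	if c.ProvisioningHostIP != "" && c.ProvisioningNetwork != "Disabled" {
		out = append(out, "metal3-static-ip-manager")
	}
	return out
}

// needsStaticIP is an assumed single condition shared by both containers.
func needsStaticIP(c Config) bool {
	return c.ProvisioningHostIP != "" && c.ProvisioningNetwork != "Disabled"
}

// containersAfter gates both containers on the same condition, so the IP
// is never set without something to keep it in place.
func containersAfter(c Config) []string {
	if !needsStaticIP(c) {
		return nil
	}
	return []string{"metal3-static-ip-set", "metal3-static-ip-manager"}
}

func main() {
	// The configuration from this bug report.
	c := Config{ProvisioningHostIP: "10.46.55.124", ProvisioningNetwork: "Disabled"}
	fmt.Println("before:", containersBefore(c))
	fmt.Println("after: ", containersAfter(c))
}
```

With the reporter's settings, the first function returns only "metal3-static-ip-set", which matches the state described above: an IP gets assigned but nothing maintains it. The second returns no static-IP containers at all, consistent with the Doc Text statement that the IP is no longer assigned when the provisioning network is disabled.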