Created attachment 1791564
Must gather

Description of problem:
OCP 4.8.0-fc.9 deployment failed (iDRAC + virtual media) due to the following error:

{"level":"info","ts":1623681701.1185718,"logger":"controllers.BareMetalHost","msg":"inspecting hardware","baremetalhost":"openshift-machine-api/hlxcl2-worker-0","provisioningState":"inspecting"}
{"level":"info","ts":1623681701.118577,"logger":"controllers.BareMetalHost","msg":"inspecting hardware","baremetalhost":"openshift-machine-api/hlxcl2-worker-0","provisioningState":"inspecting"}
{"level":"info","ts":1623681701.1185808,"logger":"provisioner.ironic","msg":"inspecting hardware","host":"openshift-machine-api~hlxcl2-worker-0"}
{"level":"info","ts":1623681701.1474018,"logger":"provisioner.ironic","msg":"updating boot mode before hardware inspection","host":"openshift-machine-api~hlxcl2-worker-0"}
{"level":"error","ts":1623681702.8093436,"logger":"controller-runtime.manager.controller.baremetalhost","msg":"Reconciler error","reconciler group":"metal3.io","reconciler kind":"BareMetalHost","name":"hlxcl2-worker-0","namespace":"openshift-machine-api","error":"action \"inspecting\" failed: hardware inspection failed: failed to update host boot mode settings in ironic: Internal Server Error","errorVerbose":"Internal Server Error\nfailed to update host boot mode settings in 
ironic\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).InspectHardware\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:708\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionInspecting\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:671\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handleInspecting\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:360\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:199\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8
s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371\nhardware inspection failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionInspecting\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:678\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handleInspecting\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:360\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:199\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime
/pkg/internal/controller.(*Controller).Start.func1.1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371\naction \"inspecting\" 
failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:239\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/github
.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:267\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"} {"level":"info","ts":1623681703.8316412,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/hlxcl2-worker-1"} {"level":"info","ts":1623681703.9150536,"logger":"controllers.BareMetalHost","msg":"registering and validating access to management 
controller","baremetalhost":"openshift-machine-api/hlxcl2-worker-1","provisioningState":"inspecting","credentials":{"credentials":{"name":"hlxcl2-worker-1-bmc-secret","namespace":"openshift-machine-api"},"credentialsVersion":"17565"}}
{"level":"info","ts":1623681703.940474,"logger":"provisioner.ironic","msg":"current provision state","host":"openshift-machine-api~hlxcl2-worker-1","lastError":"Failed to inspect hardware. Reason: unable to start inspection: Failed to download image http://localhost:6181/images/ironic-python-agent.kernel, reason: HTTPConnectionPool(host='localhost', port=6181): Max retries exceeded with url: /images/ironic-python-agent.kernel (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c33396048>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))","current":"inspect failed","target":"manageable"}

Version-Release number of selected component (if applicable):
OCP 4.8.0-fc.9
3 VM masters
2 BM workers (Dell R740)

How reproducible:
Trigger an OCP 4.8.0-fc.9 installation.

Actual results:
3 masters are UP. 2 BM workers are stuck due to the ironic issue. OCP installation failed.

Expected results:
OCP installed successfully on the cluster.

Additional info:
There is a workaround. The only way to fix this connection error is to restart the metal3 pod. After metal3 restarted, the deployment started working as expected. Sometimes we need to configure the provisioning IP manually along with the metal3 pod restart. We noticed that the provisioning IP could disappear randomly.
Must-gather logs attached.
Your baremetal operator is failing with this error:

2021-06-15T14:18:25.536730720Z {"level":"error","ts":1623766705.5358071,"logger":"controller-runtime.manager.controller.baremetalhost","msg":"Reconciler error","reconciler group":"metal3.io","reconciler kind":"BareMetalHost","name":"hlxcl2-worker-1","namespace":"openshift-machine-api","error":"action \"inspecting\" failed: hardware inspection failed: failed to update host boot mode settings in ironic: Internal Server Error","errorVerbose":"Internal Server Error\nfailed to update host boot mode settings in ironic

This corresponds with this error in your ironic-api log:

2021-06-15T14:18:25.534677648Z 2021-06-15 14:18:25.533 52 ERROR ironic.api.method [req-9e364490-e81c-4783-b495-a3b1dc14b39f ironic-user - - - -] Server-side error: "Unable to establish connection to https://10.46.55.124:8089: HTTPSConnectionPool(host='10.46.55.124', port=8089): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f52f1c1ee80>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',))". Detail:

The reason, I think, is that your metal3 pod has a container to set the provisioning IP, "metal3-static-ip-set", but you're missing the container that ensures the IP isn't lost over time, "metal3-static-ip-manager".

Looking at the CBO code, "metal3-static-ip-set" is included if you have a provisioning IP set[1], but "metal3-static-ip-manager" is only added if you have a provisioning IP set and ProvisioningNetwork is not Disabled[2]. This seems inconsistent: if you need one, you need the other.

You have:

provisioningHostIP: 10.46.55.124
provisioningNetwork: Disabled

So when the pod starts, "metal3-static-ip-set" assigns an IP to the provisioning NIC, but you have no "metal3-static-ip-manager" to keep it there.

If you don't need it, I think unsetting "provisioningHostIP" in your install-config should allow your workers to deploy (the external IP will be used by ironic).

Can you confirm whether this works? Then we can work on fixing the inconsistencies in the metal3 pod.

1 - https://github.com/openshift/cluster-baremetal-operator/blob/04a2ae2/provisioning/baremetal_pod.go#L238-L240
2 - https://github.com/openshift/cluster-baremetal-operator/blob/04a2ae2/provisioning/baremetal_pod.go#L344-L346
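
For readers following along, the mismatch between the two container conditions described in this comment can be sketched roughly as below. This is only an illustration: the type ProvisioningConfig and the function reportedContainers are hypothetical names made up for this sketch and are not taken from the cluster-baremetal-operator source. The two conditions are written from the description in the comment, not copied from baremetal_pod.go.

```go
package main

import "fmt"

// ProvisioningConfig is a hypothetical, simplified stand-in for the two
// settings quoted in the comment. It is not the real CBO type.
type ProvisioningConfig struct {
	ProvisioningHostIP  string // empty when unset
	ProvisioningNetwork string // e.g. "Managed", "Unmanaged" or "Disabled"
}

// reportedContainers models the behaviour as described in the comment:
// the static-ip-set container depends only on the IP being set, while the
// static-ip-manager container also requires the network not to be Disabled.
func reportedContainers(c ProvisioningConfig) (ipSet, ipManager bool) {
	ipSet = c.ProvisioningHostIP != ""
	ipManager = c.ProvisioningHostIP != "" && c.ProvisioningNetwork != "Disabled"
	return ipSet, ipManager
}

// inconsistent reports the problematic combination: an IP is assigned once
// at pod start, but nothing keeps it on the interface afterwards.
func inconsistent(c ProvisioningConfig) bool {
	ipSet, ipManager := reportedContainers(c)
	return ipSet && !ipManager
}

func main() {
	// Settings from this bug report.
	cfg := ProvisioningConfig{
		ProvisioningHostIP:  "10.46.55.124",
		ProvisioningNetwork: "Disabled",
	}
	ipSet, ipManager := reportedContainers(cfg)
	fmt.Printf("static-ip-set=%v static-ip-manager=%v inconsistent=%v\n",
		ipSet, ipManager, inconsistent(cfg))

	// Suggested workaround: leave provisioningHostIP unset.
	cfg.ProvisioningHostIP = ""
	fmt.Printf("after unsetting the IP: inconsistent=%v\n", inconsistent(cfg))
}
```

With the settings from this report, the sketch flags the combination as inconsistent. With provisioningHostIP unset, neither container is requested, which matches the suggested workaround.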
Hi Derek,

We tried unsetting "provisioningHostIP" and it solved the issue. Without "provisioningHostIP", the deployment completed successfully.
Verified on 4.8.0-rc.1
(In reply to elevin from comment #5)
> Verified on 4.8.0-rc.1

The fix for this hasn't yet merged into 4.8; it needs to be verified on 4.9. Can you verify it there?
*** Bug 1991568 has been marked as a duplicate of this bug. ***
4.9.0-fc.0 deployed successfully
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days