Created attachment 1724026
metal3 log

Description of problem:
Deleting a node with

    oc delete bmh openshift-worker-0-2 -n openshift-machine-api

causes the session to hang.

To view the logs:

    oc get pods -A | grep metal
    oc logs metal3*** -n openshift-machine-api -c metal3-ironic-conductor

The logs show:

    2020-10-25 10:09:44.630 1 INFO ironic.conductor.manager [req-42f09b97-3b7e-4def-89d4-56e04392f7b0 ironic-user - - - -] Successfully deleted node b33c547f-adae-4b4e-9d10-7c5c54b84863.

Version-Release number of selected component (if applicable):
Client Version: 4.6.0-0.nightly-2020-10-03-051134
Server Version: 4.6.0-rc.4
Kubernetes Version: v1.19.0+d59ce34

How reproducible:
I noticed this problem while adding a new worker through procedure [1]. By mistake I left the stray characters "<" and ">" around the address [2]. To correct the mistake I tried to delete the new worker [3], and the session hung.

[1] https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-29807
[2] address: <redfish://192.168.123.1:8000/redfish/v1/Systems/e2e8a52d-1012-4eec-a22b-dfd57f0df50b>
[3] oc delete bmh openshift-worker-0-2 -n openshift-machine-api
    baremetalhost.metal3.io "openshift-worker-0-2" deleted

Steps to Reproduce:
1.
2.
3.

Actual results:
The session hangs and the node is not deleted (checked with: oc get bmh -n openshift-machine-api).

Expected results:
The node should be deleted and the session should not hang.

Additional info:
link to must-gather: https://drive.google.com/drive/folders/1pdoRJ92mcz7_TWBRKH_P-SkTPFQV9Frb?usp=sharing
The logs show it failing with this error:

2020-10-29T10:39:43.030 Reconciling BareMetalHost {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.030 Fetching Status from Annotation {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.030 No status cache found {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.030 adding finalizer {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2', existingFinalizers: [], newValue: 'baremetalhost.metal3.io'}
2020-10-29T10:39:43.042 Reconciling BareMetalHost {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.042 Fetching Status from Annotation {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.042 No status cache found {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.042 Reconciler error {controller: 'metal3-baremetalhost-controller', request: 'openshift-machine-api/openshift-worker-0-2', error: 'failed to create provisioner: failed to parse BMC address information: failed to parse BMC address information: parse "<redfish://192.168.123.1:8000/redfish/v1/Systems/e2e8a52d-1012-4eec-a22b-dfd57f0df50b>": first path segment in URL cannot contain colon'}

Because we can't figure out any valid driver for this URL, we fail and never get to run the code that would remove the finalizer. Reconciliation just keeps hitting this error.

In the medium term, we should add a webhook that prevents invalid values like this from being set. In the short term, we should probably just go ahead and remove the finalizer if we can't create a BMC and the DeletionTimestamp is set.
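To make the proposed short-term behaviour concrete, here is a minimal sketch in Python. It is not the operator's code (the operator is written in Go) and not the eventual fix. The Host class, parse_bmc_scheme, reconcile, and the validation rule are hypothetical names and simplifications used only for illustration; the finalizer name and the address come from the log.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit

FINALIZER = "baremetalhost.metal3.io"  # finalizer name as it appears in the log


class InvalidBMCAddress(ValueError):
    """Raised when no driver can be derived from the BMC address."""


@dataclass
class Host:
    # Hypothetical, heavily simplified stand-in for a BareMetalHost object.
    name: str
    bmc_address: str
    finalizers: List[str] = field(default_factory=list)
    deletion_timestamp: Optional[str] = None


def parse_bmc_scheme(address: str) -> str:
    """Return the URL scheme (e.g. 'redfish') or raise InvalidBMCAddress.

    Assumption: the check is simplified to 'no angle brackets or spaces,
    and a scheme must be present'. The real parser has its own rules.
    """
    if any(ch in address for ch in "<> "):
        raise InvalidBMCAddress(f"unexpected character in address: {address!r}")
    scheme = urlsplit(address).scheme
    if not scheme:
        raise InvalidBMCAddress(f"no scheme in address: {address!r}")
    return scheme


def reconcile(host: Host) -> str:
    """One reconcile pass covering only the decision discussed here."""
    try:
        parse_bmc_scheme(host.bmc_address)
    except InvalidBMCAddress:
        if host.deletion_timestamp is not None:
            # No provisioner can be created, so no clean-up is possible.
            # Release the finalizer so the pending delete can complete.
            host.finalizers = [f for f in host.finalizers if f != FINALIZER]
            return "finalizer removed"
        # Not being deleted: report the problem and keep the host around.
        return "registration error"
    return "provisioner created"
```

With the address from this report and a deletion timestamp set, reconcile drops the finalizer instead of failing forever; without a deletion timestamp the host simply stays in an error state. The same parse_bmc_scheme check is the kind of validation a webhook could apply at admission time.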
Workarounds would be to update the Host to have a correct address (possible since we haven't yet implemented a webhook to prevent changing the address either) or manually remove the finalizer.
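For anyone who needs the workarounds spelled out, the sketch below builds the two oc patch invocations as argument lists. It is an illustration under assumptions: the helper names are hypothetical, the address field is assumed to live at spec.bmc.address, and nothing is executed unless you pass the list to subprocess yourself. Note that setting metadata.finalizers to null clears every finalizer on the object, not only baremetalhost.metal3.io.

```python
import json
import shlex
from typing import List


def fix_address_cmd(name: str, namespace: str, address: str) -> List[str]:
    """Build an `oc patch` command that replaces the BMC address.

    Assumption: the address is stored at spec.bmc.address on the host.
    """
    patch = json.dumps({"spec": {"bmc": {"address": address}}})
    return ["oc", "patch", "bmh", name, "-n", namespace,
            "--type=merge", "-p", patch]


def remove_finalizers_cmd(name: str, namespace: str) -> List[str]:
    """Build an `oc patch` command that clears ALL finalizers on the host."""
    patch = json.dumps({"metadata": {"finalizers": None}})
    return ["oc", "patch", "bmh", name, "-n", namespace,
            "--type=merge", "-p", patch]


if __name__ == "__main__":
    # Print the commands in copy-pasteable form instead of running them.
    ns, host = "openshift-machine-api", "openshift-worker-0-2"
    addr = ("redfish://192.168.123.1:8000/redfish/v1/Systems/"
            "e2e8a52d-1012-4eec-a22b-dfd57f0df50b")
    print(shlex.join(fix_address_cmd(host, ns, addr)))
    print(shlex.join(remove_finalizers_cmd(host, ns)))
```

Correcting the address is the gentler option, since the operator can then reconcile the host and process the delete itself; clearing the finalizer skips whatever clean-up the operator would otherwise perform.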
Issue will be fixed by the upstream PR https://github.com/metal3-io/baremetal-operator/pull/838 in conjunction with @Zane's commit: https://github.com/metal3-io/baremetal-operator/commit/beea4d0ead807a8f19b38d538db3502ee3504b97. Changes will be ported downstream by PR https://github.com/openshift/baremetal-operator/pull/142
Verified on:
Client Version: 4.8.0-0.nightly-2021-06-13-101614
Server Version: 4.8.0-0.nightly-2021-07-09-181248
Kubernetes Version: v1.21.1+f36aa36

The bmh is created but shows a registration error:

openshift-machine-api openshift-worker-0-2 registering true registration error

The bmh deletes without a problem.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438