Bug 1891301

Summary: Deleting a bmh with "oc delete bmh" gets stuck
Product: OpenShift Container Platform
Reporter: Nataf Sharabi <nsharabi>
Component: Bare Metal Hardware Provisioning
Assignee: Andrea Fasano <afasano>
Sub component: baremetal-operator
QA Contact: Adina Wolff <awolff>
Status: CLOSED ERRATA
Severity: high
Priority: medium
CC: beth.white, kiran, lshilin, stbenjam, ykashtan, yprokule
Version: 4.8
Keywords: Triaged
Flags: nsharabi: needinfo+
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2021-07-27 22:33:58 UTC
Type: Bug
Attachments: metal3 log

Description Nataf Sharabi 2020-10-25 12:42:36 UTC
Created attachment 1724026: metal3 log

Description of problem:

When trying to delete a node with:

oc delete bmh openshift-worker-0-2 -n openshift-machine-api

The session hangs.


To see the logs:

  oc get pods -A | grep metal
  oc logs <metal3-pod-name> -n openshift-machine-api -c metal3-ironic-conductor


The logs show:
2020-10-25 10:09:44.630 1 INFO ironic.conductor.manager [req-42f09b97-3b7e-4def-89d4-56e04392f7b0 ironic-user - - - -] Successfully deleted node b33c547f-adae-4b4e-9d10-7c5c54b84863.





Version-Release number of selected component (if applicable):
Client Version: 4.6.0-0.nightly-2020-10-03-051134
Server Version: 4.6.0-rc.4
Kubernetes Version: v1.19.0+d59ce34

How reproducible:
I noticed this problem while adding a new worker by following procedure [1].

By mistake, I left the placeholder characters "<" and ">" around the BMC address [2].

To correct the mistake, I tried to delete the new worker [3], and the session hung.

[1] https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-29807

[2] address: <redfish://192.168.123.1:8000/redfish/v1/Systems/e2e8a52d-1012-4eec-a22b-dfd57f0df50b>

[3] oc delete bmh openshift-worker-0-2 -n openshift-machine-api
baremetalhost.metal3.io "openshift-worker-0-2" deleted


Steps to Reproduce:
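1. Add a new worker whose BMC address is wrapped in "<" and ">" [2].
2. Delete the new worker [3].
3. List the hosts: oc get bmh -n openshift-machine-api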

Actual results:

The session hangs and the node is not deleted
(it is still listed by oc get bmh -n openshift-machine-api).

Expected results:
The node should be deleted and the session should not hang.

Additional info:

Comment 3 Nataf Sharabi 2020-10-29 11:38:23 UTC
link to must-gather:
https://drive.google.com/drive/folders/1pdoRJ92mcz7_TWBRKH_P-SkTPFQV9Frb?usp=sharing

Comment 4 Zane Bitter 2020-11-03 18:45:56 UTC
Logs show it failing with this error:

2020-10-29T10:39:43.030 Reconciling BareMetalHost {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.030 Fetching Status from Annotation {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.030 No status cache found {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.030 adding finalizer {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2', existingFinalizers: [], newValue: 'baremetalhost.metal3.io'}
2020-10-29T10:39:43.042 Reconciling BareMetalHost {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.042 Fetching Status from Annotation {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.042 No status cache found {Request.Namespace: 'openshift-machine-api', Request.Name: 'openshift-worker-0-2'}
2020-10-29T10:39:43.042 Reconciler error {controller: 'metal3-baremetalhost-controller', request: 'openshift-machine-api/openshift-worker-0-2', error: 'failed to create provisioner: failed to parse BMC address information: failed to parse BMC address information: parse "<redfish://192.168.123.1:8000/redfish/v1/Systems/e2e8a52d-1012-4eec-a22b-dfd57f0df50b>": first path segment in URL cannot contain colon'}

Basically, because we can't figure out any valid driver for this URL, we fail and then never get to run the code that would remove the finalizer. It just keeps hitting this error.
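
The parse failure in the log above can be reproduced outside the operator. The following is a minimal sketch that assumes only Go's standard net/url package; checkAddress is a hypothetical helper for illustration, not the operator's actual BMC address parser.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// checkAddress is a hypothetical helper: it only calls url.Parse and
// returns the resulting error (nil when the address parses).
func checkAddress(addr string) error {
	_, err := url.Parse(addr)
	return err
}

func main() {
	// Address exactly as it was entered by mistake, including "<" and ">".
	bad := "<redfish://192.168.123.1:8000/redfish/v1/Systems/e2e8a52d-1012-4eec-a22b-dfd57f0df50b>"
	// Same address with the surrounding angle brackets stripped.
	good := strings.Trim(bad, "<>")

	fmt.Println("with brackets:   ", checkAddress(bad))
	fmt.Println("without brackets:", checkAddress(good))
}
```

With the brackets in place, url.Parse finds no valid scheme because the string starts with "<", treats the whole string as a relative path, and rejects it because the first path segment contains a colon.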

In the medium term, we should add a webhook that rejects invalid values like this. In the short term, we should probably just remove the finalizer if we can't create a BMC and the DeletionTimestamp is set.
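
The short-term idea could look roughly like the following. This is only a sketch of the suggestion in this comment, not the change that was eventually merged: the host type, newHost, errBadAddress, and reconcile are hypothetical stand-ins, and only the finalizer name is taken from the log above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Finalizer name as it appears in the operator log.
const finalizer = "baremetalhost.metal3.io"

// errBadAddress stands in for the provisioner creation failure.
var errBadAddress = errors.New("failed to parse BMC address information")

// host is a hypothetical, stripped-down stand-in for a BareMetalHost.
type host struct {
	DeletionTimestamp *time.Time
	Finalizers        []string
}

// newHost builds a host that carries the finalizer; when deleting is true
// the DeletionTimestamp is set, as it would be after "oc delete bmh".
func newHost(deleting bool) *host {
	h := &host{Finalizers: []string{finalizer}}
	if deleting {
		now := time.Now()
		h.DeletionTimestamp = &now
	}
	return h
}

// removeFinalizer drops every occurrence of name from the host's finalizers.
func removeFinalizer(h *host, name string) {
	kept := h.Finalizers[:0]
	for _, f := range h.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	h.Finalizers = kept
}

// reconcile sketches the proposed behaviour: if the provisioner cannot be
// created and the host is being deleted, remove the finalizer instead of
// returning the same error on every retry.
func reconcile(h *host, createProvisioner func() error) error {
	err := createProvisioner()
	if err == nil {
		return nil // normal reconciliation would continue here (omitted)
	}
	if h.DeletionTimestamp != nil {
		removeFinalizer(h, finalizer)
		return nil
	}
	return fmt.Errorf("failed to create provisioner: %w", err)
}

func main() {
	h := newHost(true)
	err := reconcile(h, func() error { return errBadAddress })
	fmt.Println("error:", err, "remaining finalizers:", len(h.Finalizers))
}
```

A host that is not being deleted still gets the error back, so the invalid address remains visible to the user.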

Workarounds would be to update the Host to have a correct address (possible since we haven't yet implemented a webhook to prevent changing the address either) or manually remove the finalizer.
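
For the manual workarounds, the request bodies might be built as follows. This sketch assumes JSON merge patches and assumes the address lives at spec.bmc.address on the BareMetalHost; addressPatch and finalizerPatch are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// addressPatch builds a merge patch that replaces the BMC address.
// The spec.bmc.address path is an assumption about the BareMetalHost schema.
func addressPatch(addr string) (string, error) {
	b, err := json.Marshal(map[string]interface{}{
		"spec": map[string]interface{}{
			"bmc": map[string]interface{}{"address": addr},
		},
	})
	return string(b), err
}

// finalizerPatch builds a merge patch that clears all finalizers,
// which lets a pending deletion complete.
func finalizerPatch() (string, error) {
	b, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{"finalizers": nil},
	})
	return string(b), err
}

func main() {
	fix, err := addressPatch("redfish://192.168.123.1:8000/redfish/v1/Systems/e2e8a52d-1012-4eec-a22b-dfd57f0df50b")
	if err != nil {
		panic(err)
	}
	drop, err := finalizerPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(fix)
	fmt.Println(drop)
}
```

Either string could then be supplied as a merge patch, for example with oc patch bmh openshift-worker-0-2 -n openshift-machine-api --type=merge -p '<patch>'.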

Comment 5 Andrea Fasano 2021-04-07 14:42:11 UTC
Issue will be fixed by the upstream PR https://github.com/metal3-io/baremetal-operator/pull/838 in conjunction with @Zane's commit: https://github.com/metal3-io/baremetal-operator/commit/beea4d0ead807a8f19b38d538db3502ee3504b97.

Changes will be ported downstream by PR https://github.com/openshift/baremetal-operator/pull/142

Comment 9 Adina Wolff 2021-07-12 07:57:10 UTC
Verified on:
Client Version: 4.8.0-0.nightly-2021-06-13-101614
Server Version: 4.8.0-0.nightly-2021-07-09-181248
Kubernetes Version: v1.21.1+f36aa36


The bmh is created but shows a registration error:

openshift-machine-api   openshift-worker-0-2   registering                                                        true     registration error


The bmh is then deleted without a problem.

Comment 12 errata-xmlrpc 2021-07-27 22:33:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438